Storage Optimization

Compression – A Matter of Life and Death

Posted in Featured,Storage by storageoptimization on February 9, 2009


Nice piece today in Bioinform about our compression solution for genomics data. Carter George of Ocarina spoke to the author of the piece, Vivien Marx, last week, as did Dave Lifka at Cornell. The article details the work we’re doing with Cornell University’s Center for Advanced Computing (CAC) in partnership with DataDirect to increase their capacity by up to 90 percent.

Gene sequencing has opened up new vistas in medical research that could lead to a completely new era of “personalized medicine,” with targeted treatments and few or no side effects from medications. Momentum for this type of medicine is building–the FDA announced today that it has created a new position dedicated to “coordinating and upgrading” the agency’s involvement in genomics and other elements of personalized medicine.

The potential is huge, and it’s truly horrifying to think that all this progress could be slowed or stopped due to the cost of storage. Thus, freeing up disk space truly can be a matter of life and death.

We’ve addressed this by developing compression solutions specifically designed for sequencing technologies such as those from Illumina and Affymetrix.  The Bioinform article offers significant detail on the types of files we compress as well as the checksums Ocarina performs on each before any shadow files are deleted. We hope you’ll take a look at the piece.


Astonishing Capacity Gains

Posted in Analyst,Blogroll,Storage by storageoptimization on February 6, 2009

Stephen Foskett had a nice post on his Packrat blog today that delves into the question of whether encryption can be done in such a way that it doesn’t interfere with compression. The whole post is worth a read. We were also pleased to see him describe Ocarina in the following manner:

“The software from Ocarina, for example, actually decompresses jpg and pdf files before recompressing them, resulting in astonishing capacity gains!”

The Packrat blog is on our RSS and Stephen is one of those bloggers who seems to have a grasp of just about everything that’s happening in storage–always adding his own fresh twist to the conversation. He’s also got a Twitter feed worth following, @sfoskett.

Test Your Storage Optimization IQ

Posted in Storage by storageoptimization on February 5, 2009

Here’s a quick quiz to see how smart you are about primary storage optimization:

1. True or false: the only type of deduplication on the market today is block level deduplication–the type that looks at the zeros and ones on disk, and removes the duplicates.

2. Content aware deduplication is:

a) More effective than other types of optimization for primary storage;

b) The best approach to optimizing online files, such as photos, PDFs, and other already compressed files because it extracts them and reads them in their non-compressed format before optimizing them;

c) Only available from Ocarina Networks;

d) All of the above.

3. True or false: dedupe gets 20:1 data reduction results the first time it passes through your data.

4. With online data sets, block level dedupe and content aware dedupe get:

a) About the same results;

b) Different results–block level is better;

c) Radically different results–Ocarina’s content aware deduplication solution gets 5x or better results than block level dedupe.


1. FALSE. There’s a new type of dedupe on the storage scene–content aware dedupe. This works in part by analyzing the ones and zeros in files that have been extracted out of their compressed format–a far more effective approach for the types of files that are driving storage growth, such as images, PDFs, and Windows files. More info. at:

2. d) All of the above.

3. FALSE: Block level dedupe gets its results because of the repetitive nature of backups – daily backups create dupes, dedupe takes them back out. For online data sets, you won’t get those results, because it’s not a repetitive data set.  You need a different approach that can find the dedupe and compression opportunities in a single online set of files.

4. c–see the chart below for a comparison of results.
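As a rough illustration of answer 3, here is a minimal Python sketch of fixed-size block-level dedupe. The 4 KB block size, the SHA-256 fingerprint, and the random test data are all assumptions made for the example; this is not how any particular product works. It only shows why twenty near-identical backups dedupe about 20:1 while a single online data set with no repeated blocks gets nothing.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def dedupe_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Return logical blocks divided by unique blocks (1.0 means no savings)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


# Hypothetical "online" data set: 100 blocks with no repetition between them.
rng = random.Random(0)
size = BLOCK_SIZE * 100
online_set = rng.getrandbits(8 * size).to_bytes(size, "big")

# Hypothetical backup stream: the same data set backed up 20 times in a row.
backups = online_set * 20

print("single online set:", dedupe_ratio(online_set))
print("twenty backups:   ", dedupe_ratio(backups))
```

The backup stream only looks good because the same blocks keep arriving again. The single copy has no duplicate blocks to remove, which is why it needs a different approach.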


How to Cut Storage Costs – Taneja

The explosive growth of data is threatening to overwhelm any number of industries. Whether we’re talking about an online photo sharing site or high throughput gene sequencing lab, the pain is the same. There’s too much data and not enough space to store it on, with the result that costs are spiraling out of control. A recent white paper from the Taneja Group: “Extending the Vision for Primary Storage Optimization: Ocarina Networks” takes a look at the emerging capacity optimization technologies to handle this influx of data. It comes to the conclusion that ours is one of the most compelling technologies, being the only content-aware primary storage optimization (PSO) on the market today.

In its conclusion, the report states: “If you’re looking at PSO technology, Ocarina needs to be on your short list.”

Click here to access this report.

Storage Optimization – The Trend Picks Up

Posted in Storage by storageoptimization on January 26, 2009
Several news articles in the past week are responding to reports about the continued skyrocketing growth of unstructured data, and the technologies that are coming up to meet this new set of demands under today’s economic circumstances. 
Here are a few of the articles that jumped out at us:
As we’ve often mentioned, a combination of solutions is called for when it comes to capacity optimization, one of which is content aware compression, such as that offered by my company, Ocarina Networks. Given the state of the economy and everyone’s focus on cost savings, we have no doubt that this trend will pick up in 2009. Dealing with the costs of growing data by having it take 90% less space to store is a win-win all around.

Can You Compress Already Compressed Files? Part II

Posted in Featured,File Systems,Storage by storageoptimization on May 6, 2008

In my last post I discussed the fact that most files that are used are already compressed. And up to now, there were no algorithms to further compress them. Yet, it’s obvious that there needs to be a new solution.

On the cutting edge, there are some new innovations in file-aware optimization that allow companies to reduce their storage footprint and get more from the storage they already have. The key to this is understanding specific file types, their formats, and how the applications that created those files use and save data. Most existing compression tools are generic. To get better results than you can get with a generic compressor, you need to go to file-type-aware compressors.

There’s another problem. Let’s say you just created a way better tool for compressing photographs than JPEG. That doesn’t mean your tool can compress already-compressed JPEGs, it means that if you were given the same original photo in the first place, you could do a better job. So the first step in moving towards compressing already-compressed files is what we call Extraction – you have to extract the original full information from the file. In most cases, that’s going to involve de-compressing the file first, getting back to the uncompressed original, and then applying your better tools.
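The Extraction idea can be sketched in a few lines of Python. Everything here is a stand-in: zlib at its fastest setting plays the role of the compression an application applied when it saved the file, and lzma plays the role of the "better tool." Neither is Ocarina's actual algorithm, and the sample data is made up. The point is only the order of operations: decompress first, then recompress.

```python
import lzma
import random
import zlib

# Made-up "original" content: a long run of words, repeated three times.
rng = random.Random(0)
vocab = ["storage", "photo", "genome", "backup", "email", "slide", "report", "archive"]
paragraph = " ".join(rng.choice(vocab) for _ in range(10_000)).encode("ascii")
original = paragraph * 3

# The file as it sits on disk: already compressed by the (assumed) weaker tool.
stored = zlib.compress(original, 1)

# Approach 1: point the better tool straight at the already-compressed bytes.
direct = lzma.compress(stored)

# Approach 2: extract first (undo the original compression), then recompress.
extracted = zlib.decompress(stored)
via_extraction = lzma.compress(extracted)

print("as stored:          ", len(stored))
print("better tool, direct:", len(direct))
print("extract, then better tool:", len(via_extraction))
```

Because the extraction step is lossless, the original stored file can be rebuilt later by decompressing the new copy and reapplying the original compressor, assuming that compressor is deterministic.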

Extraction may seem simple enough – just reverse whatever was done to a file in the first place. But it’s not always quite that easy. Many files are compound documents, with multiple sections or objects of different data types. A PowerPoint presentation, for example, may have text sections, graphics sections, some photos pasted in, etc. The same is true for PDFs, email folders with attachments, and a lot of the other file types that are driving storage growth. So to really extract all the original information from these files, you may need to not only be able to decompress files, but to look inside them, understand how they are structured, break them apart into their separate pieces, and then do different things to each different piece.
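Here is a hedged sketch of what "break them apart and do different things to each piece" might look like, using Python's standard zipfile module and an in-memory ZIP as a stand-in for a compound document. The member names, the random bytes standing in for a photo, and the per-type rules in optimize_part are all hypothetical. A real system would also need to record enough metadata to reassemble the original file byte for byte, which this sketch does not attempt.

```python
import io
import lzma
import random
import zipfile

rng = random.Random(0)
vocab = ["storage", "photo", "genome", "backup", "email", "slide", "report", "archive"]
paragraph = " ".join(rng.choice(vocab) for _ in range(10_000)).encode("ascii")
text_part = paragraph * 3  # stand-in for the text sections of a document

media_size = 50_000
# Random bytes stand in for a photo that is already compressed.
media_part = rng.getrandbits(8 * media_size).to_bytes(media_size, "big")

# Build the hypothetical compound document: a ZIP container with two parts.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as container:
    container.writestr("slides/text.xml", text_part)
    container.writestr("media/photo.bin", media_part)
container_bytes = buffer.getvalue()


def optimize_part(name: str, payload: bytes) -> bytes:
    """Pick a treatment per part type (illustrative rules only)."""
    if name.endswith(".xml"):
        return lzma.compress(payload)  # text: try a stronger general compressor
    return payload  # opaque media: left alone in this sketch


# Look inside the container, pull out each part, and treat it on its own.
optimized = {}
with zipfile.ZipFile(io.BytesIO(container_bytes)) as container:
    for info in container.infolist():
        payload = container.read(info.filename)
        optimized[info.filename] = optimize_part(info.filename, payload)

total_optimized = sum(len(part) for part in optimized.values())
print("container as saved:", len(container_bytes))
print("sum of optimized parts:", total_optimized)
```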

The two things to take away from this discussion are: 1) you won’t get much benefit from applying generic compression to already-compressed file types, which are the file types that are driving most of your storage growth and 2) it is possible to compress already-compressed files, but to do so, you have to first extract all the original information from them, which may involve decoding and unraveling complex compound documents and then decompressing all the different parts. Once you’ve gotten to that point, you’re at the starting line: this is where online data reduction can really begin for today’s file types.

Can you compress an already compressed file? Part I

Posted in Featured,File Systems,Storage by storageoptimization on May 1, 2008

We can all recognize the amount of data we generate. And just like we keep telling ourselves we’ll clean out the garage “one of these days” most of us rarely bother to clean out our email or photo sharing accounts.

As a result, enterprise and internet data centers have to buy hundreds of thousands of petabytes of disk every year to handle all the data in those files. It all has to be stored somewhere.

One way to reduce the amount of storage growth is to compress files. Compression techniques have been around forever, and are built into many operating systems (like Windows) and storage platforms (such as file servers).

Here’s the problem: most modern file formats, the formats driving all this storage growth, are already compressed.
· The most common format for photos is JPEG – that’s a compressed image format.
· The most common format for most documents at work is Microsoft Office, and in Office 2007, all Office documents are compressed as they are saved.
· Music (mp3) and video (MPEG-2 and MPEG-4) are highly compressed.

The mathematics of compression are such that once you compress a file and reduce its size, you can’t expect to compress it again and get even more size reduction. Compression works by looking for patterns in the data and replacing them with more efficient codes. So if you’ve compressed something once, the compressed file should have few, if any, patterns left for a second pass to find.

Of course, some compression algorithms are better than others, and you might see some small benefits by trying to compress something that has already been compressed with a lesser tool, but for the most part, you’re not going to see a big win by doing that. In fact, in a lot of cases, trying to compress an already compressed file will make it bigger!
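You can try this yourself with a few lines of Python. This is a minimal sketch using the standard zlib module on made-up sample text; exact byte counts will vary with the data and the zlib version, so none are quoted here. The first pass should shrink the text a lot. The second pass, run on the already-compressed bytes, typically gains nothing and may come out slightly bigger.

```python
import random
import zlib

# Made-up sample text with plenty of repeated words for a compressor to find.
rng = random.Random(0)
vocab = ["storage", "photo", "genome", "backup", "email", "slide", "report", "archive"]
text = " ".join(rng.choice(vocab) for _ in range(20_000)).encode("ascii")

once = zlib.compress(text, 9)   # first pass: patterns get replaced with codes
twice = zlib.compress(once, 9)  # second pass: very little left to find

print("original:        ", len(text))
print("compressed once: ", len(once))
print("compressed twice:", len(twice))
```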
Conventional wisdom dictates that once files are compressed via commonly used technologies, the ability to further limit their size and consumption of expensive resources is nearly impossible. So, what can be done about this?