Storage Optimization

Economic Woes and Storage

Posted in Storage by storageoptimization on October 6, 2008

Every major business magazine has a cover story this week on the economic turmoil that’s gripping the credit markets, Wall Street, and the rest of us. Every one, that is, except Forbes, which chose to put John Chambers, CEO of Cisco, on its cover this week. No doubt some editors over there are wishing right now that they’d made a different choice–but leaving that aside, in some ways this story says more about the economy than any of the others.

Cisco, the article demonstrates, has jumped with both feet into the area with the greatest promise: data centers. They are the unglamorous chunk of reality that underlies all the fun and fancy Web 2.0 that, for now, is keeping Silicon Valley from tanking along with the rest of the economy. (Unless you believe the NYT, of course.)

To quote the Forbes article: “This is what the online computing revolution has become, a giant electricity hog of Internet searches, phone calls, blog posts, wireless downloads, bank transactions and office documents. And video, lots and lots of video.” The article also includes a chart comparing new server spending vs. power and cooling costs.

All of which leads us to the inexorable conclusion–which TechTarget’s Dave Raffo refers to in a recent post–that one of the few places that is sheltered from the current storm is anything that reduces the cost of storage. So yes, storage optimization is the place to be in today’s tough economic climate. But the main point is that it could help keep lots of companies afloat that might otherwise crumple under the weight of their storage costs.

Are you Content Aware?

Posted in Analyst,Storage by storageoptimization on October 2, 2008
Storage analyst Robin Harris commented on the storage story of the week–NetApp’s guarantee that virtualization will mean a 50% gain in storage capacity for its customers.
Harris’s take on the announcement is that dedupe for primary storage could be “the next big win for IT shops.” Perhaps, but let’s keep in mind that NetApp dedupe is very simple. It only finds duplicate blocks at NetApp WAFL 4K block boundaries. The reason they are positioning it as a big win for VMware users is that virtual machines (static images of whole operating systems) are one of the few places where you’ll find lots of dupes in primary storage on block-aligned boundaries.
Here is my take: The best results in dedupe for primary storage are going to be from applications that can recognize file types and understand how to find the duplicate information in them. That is where the big wins in dedupe for primary storage are going to be.
Consider this typical scenario: I create a PowerPoint and email it to someone else. They save it, open it, and make an edit – add a slide, or even just edit a bullet or two. That small edit will mean that none of the redundant content of that file falls on the same NetApp WAFL block boundaries. So although the two files are almost entirely the same, you won’t see good dedupe results on them.
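To make the alignment problem concrete, here is a minimal Python sketch of fixed-block dedupe. It is an illustration only, not NetApp's implementation: the 4K block size comes from the figure above, while `block_hashes`, `shared_block_fraction`, and the synthetic random "file" are assumptions made for the example.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # 4K blocks, the WAFL block size cited in the post


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size, block-aligned chunk of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_block_fraction(old: bytes, new: bytes) -> float:
    """Fraction of blocks in `new` whose hash already exists in `old`."""
    known = set(block_hashes(old))
    candidates = block_hashes(new)
    if not candidates:
        return 0.0
    return sum(h in known for h in candidates) / len(candidates)


# Synthetic stand-in for a saved file (hypothetical data, 64 blocks long).
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * BLOCK_SIZE))

# A small edit near the start shifts every later byte off its old boundary.
edited = original[:100] + b"one new bullet" + original[100:]

# The same bytes appended at the end leave the existing blocks aligned.
appended = original + b"one new bullet"

print("edit near start:", shared_block_fraction(original, edited))
print("append at end:  ", shared_block_fraction(original, appended))
```

In this sketch the 14-byte insertion near the start leaves no matching blocks at all, while the same 14 bytes appended at the end leave 64 of the 65 blocks matching. The content is nearly identical in both cases; only the alignment differs.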
A content-aware solution – which combines information-level dedupe with content-aware compression – should be able to get 10:1 compression on most typical file mixes (especially Office and engineering ones). A 10:1 ratio is the same as a 90% reduction, so if you can shrink 80% of your data by 90%, you can get a pretty good handle on how big the win could be. And by the way, it’s not necessarily a bad thing for the guys who sell disks, either, because what happens when you can get that kind of win is that you start to think differently about what you can store, and how long you can store it for. For example, at my company, Ocarina Networks, we have a customer that plans to store a snapshot a day online for every day’s data for 10 years. That wouldn’t be possible without some drastic deduplication.
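The arithmetic behind "shrink 80% of your data by 90%" can be worked through in a few lines of Python. The 80% and 10:1 figures are the ones quoted in this post; the function name `overall_reduction` is just a label chosen for this sketch.

```python
def overall_reduction(reducible_fraction: float, ratio: float) -> float:
    """Fraction of total capacity saved when only part of the data shrinks.

    reducible_fraction: share of the data that can be reduced (0.8 = 80%)
    ratio: reduction ratio on that share (10 means 10:1, i.e. 90% smaller)
    """
    shrunk = reducible_fraction / ratio      # what the reducible part becomes
    untouched = 1.0 - reducible_fraction     # data that stays full size
    return 1.0 - (shrunk + untouched)


saved = overall_reduction(0.8, 10)
remaining = 1.0 - saved

print(f"capacity saved:  {saved:.0%}")
print(f"effective ratio: {1 / remaining:.2f}:1")
```

Under those assumptions the 80% that is reducible shrinks to 8% of the original footprint, the other 20% stays as it is, and the total drops to 28% of what it was. That is roughly a 72% saving overall, or a bit better than 3.5:1 across the whole store.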
Block level dedupe – whether simple block-aligned like NetApp or sliding window like market leader Data Domain – is only going to find a small subset of the duplicate or redundant information in primary storage. That’s because most file types that drive storage growth in primary (or nearline) storage are compressed. Compression will cause the contents of a file to be recomputed – and to look random – every time a file is changed. So if I store a photo, then open it and edit one pixel and save the new version as a new file, there won’t be a single duplicate block at the disk level. On the other hand, almost the entire file is duplicate information.    
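A rough way to see this for yourself is to compare block-level matches before and after compression. The Python sketch below uses `zlib` as a stand-in for whatever compression a real file format applies, and a made-up, text-like payload in place of a photo; the helper names are assumptions for the example. How many compressed blocks still match depends on the data and the compressor, so the script measures it rather than promising a number.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096  # same 4K block size used for block-aligned dedupe


def block_hashes(data: bytes) -> list:
    """Hash each fixed-size, block-aligned chunk of data."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def shared_block_fraction(old: bytes, new: bytes) -> float:
    """Fraction of blocks in `new` whose hash already exists in `old`."""
    known = set(block_hashes(old))
    candidates = block_hashes(new)
    if not candidates:
        return 0.0
    return sum(h in known for h in candidates) / len(candidates)


# Hypothetical compressible payload standing in for uncompressed file content.
rng = random.Random(0)
words = ["storage", "block", "dedupe", "photo", "pixel", "file", "disk", "data"]
text = " ".join(rng.choice(words) for _ in range(60000)).encode()
original = text[:64 * BLOCK_SIZE]

# "Edit one pixel": change a single byte in place, keeping the length the same.
edited = original[:10] + b"#" + original[11:]

packed_original = zlib.compress(original)
packed_edited = zlib.compress(edited)

print("raw blocks shared:       ", shared_block_fraction(original, edited))
print("compressed blocks shared:", shared_block_fraction(packed_original, packed_edited))
```

Before compression, the one-byte edit touches only the first block, so 63 of the 64 raw blocks still match. The second number shows how much of that overlap survives once each version has been compressed separately.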
Can you find a duplicate graphic that was used in a PowerPoint, a Word document, and a PDF? PowerPoint and Word both compress with a variant of zip; PDF compresses with deflate. Even if the graphic is identical, block level dedupe won’t find the duplicate graphics because they are not stored identically on disk. You need something that can find duplicate data at the information level. Finally, there are pretty concrete data that say that about 80% of the file data on NAS is a candidate for deduplication.
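The idea of matching at the information level can be sketched in Python as follows. This is only the principle, not how any particular product works: the "graphic" is synthetic, the two containers are imitated with two `zlib` compression levels, and `disk_fingerprint` and `content_fingerprint` are names invented for the example. Real Office and PDF files would need format-specific parsing to get at the embedded objects.

```python
import hashlib
import zlib

# Synthetic stand-in for one graphic embedded in two different documents.
graphic = bytes(range(256)) * 64

# Assumption: each container compresses the same graphic with its own settings.
stored_in_deck = zlib.compress(graphic, 1)
stored_in_pdf = zlib.compress(graphic, 9)


def disk_fingerprint(stored: bytes) -> str:
    """Hash of the bytes as they sit on disk (what block-level dedupe sees)."""
    return hashlib.sha256(stored).hexdigest()


def content_fingerprint(stored: bytes) -> str:
    """Hash of the decoded payload (what content-aware dedupe compares)."""
    return hashlib.sha256(zlib.decompress(stored)).hexdigest()


same_on_disk = disk_fingerprint(stored_in_deck) == disk_fingerprint(stored_in_pdf)
same_content = content_fingerprint(stored_in_deck) == content_fingerprint(stored_in_pdf)

print("identical on disk:   ", same_on_disk)
print("identical as content:", same_content)
```

The two stored copies differ byte for byte, so a comparison of on-disk bytes reports no match, while the fingerprints of the decoded payload are equal. Decoding costs CPU time, which is the trade-off a content-aware approach makes to find duplicates that the on-disk view hides.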
With all that in mind, don’t you think content aware optimization is going to be the next truly big win?