Storage Optimization


Astonishing Capacity Gains

Posted in Analyst,Blogroll,Storage by storageoptimization on February 6, 2009

Stephen Foskett had a nice post on his Packrat blog today that delves into the question of whether encryption can be done in such a way that it doesn’t interfere with compression. The whole post is worth a read. We were also pleased to see him describe Ocarina in the following manner:

“The software from Ocarina, for example, actually decompresses jpg and pdf files before recompressing them, resulting in astonishing capacity gains!”

The Packrat blog is in our RSS reader, and Stephen is one of those bloggers who seems to have a grasp of just about everything that’s happening in storage, always adding his own fresh twist to the conversation. He’s also got a Twitter feed worth following, @sfoskett.

How to Cut Storage Costs – Taneja

The explosive growth of data is threatening to overwhelm any number of industries. Whether we’re talking about an online photo-sharing site or a high-throughput gene-sequencing lab, the pain is the same: there’s too much data and not enough capacity to store it, and costs are spiraling out of control. A recent white paper from the Taneja Group, “Extending the Vision for Primary Storage Optimization: Ocarina Networks,” takes a look at the emerging capacity optimization technologies for handling this influx of data. It concludes that ours is one of the most compelling technologies, being the only content-aware primary storage optimization (PSO) solution on the market today.

In its conclusion, the report states: “If you’re looking at PSO technology, Ocarina needs to be on your short list.”

Click here to access this report.

2009–the Year of Storage Optimization

Posted in Analyst by storageoptimization on January 28, 2009
Storage consultant Tony Asaro cut straight to the chase on his HDS blog with his top prediction for 2009: “IT professionals will focus on optimization. I should end my blog right here. Nothing is more important this year.”

We couldn’t agree more, Tony. As data volumes grow and budgets shrink, doing more with less is going to be the most important theme in storage for 2009 and the foreseeable future.

HDS is already recognized as the leader in many of the most important optimizations available in block storage. The next frontier is optimization in file storage. This includes content-aware compression and content-aware dedupe for online NAS, active archives, and content depots.

Being able to store two, ten, or twenty times more file data on a given amount of high-performance virtualized HDS physical storage is now not only possible but also an example of vendor technology and user need intersecting at just the right time.

Are you Content Aware?

Posted in Analyst,Storage by storageoptimization on October 2, 2008
Storage analyst Robin Harris commented on the storage story of the week: NetApp’s guarantee that virtualization will mean a 50% gain in storage capacity for its customers.
Harris’s take on the announcement is that dedupe for primary storage could be “the next big win for IT shops.” Perhaps, but let’s keep in mind that NetApp dedupe is very simple: it only finds duplicate blocks at NetApp WAFL 4K block boundaries. The reason they are positioning it as a big win for VMware users is that virtual machine images (static copies of whole operating systems) are one of the few places in primary storage where you’ll find lots of duplicates on block-aligned boundaries.
Here is my take: the best results in dedupe for primary storage are going to come from applications that can recognize file types and understand how to find the duplicate information inside them. That is where the big wins are going to be.
Consider this typical scenario: I create a PowerPoint and email it to someone else. They save it, open it, and make an edit, perhaps adding a slide or just tweaking a bullet or two. That small edit shifts the file’s contents so that the redundant data no longer falls on the same NetApp WAFL block boundaries. So although the two files are almost entirely the same, you won’t see good dedupe results on them.
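
To make the block-boundary problem concrete, here is a minimal sketch (hypothetical data, not NetApp’s or Ocarina’s code) that hashes fixed 4K blocks of a file before and after a small edit near the front, then counts how many block hashes still match:

import hashlib
import os

BLOCK = 4096  # fixed 4K blocks, as with block-aligned dedupe

def block_hashes(data):
    # Hash every fixed-size block of the byte stream.
    return {hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)}

original = os.urandom(1024 * 1024)            # stand-in for the original file's bytes
edited = b"one new bullet point" + original   # a small edit near the front shifts everything

shared = block_hashes(original) & block_hashes(edited)
print(len(shared), "blocks still match after the edit")  # almost certainly 0

Nearly all of the data is still duplicated between the two files, but because it no longer lines up on the same block offsets, a block-aligned approach sees none of it.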
A content-aware solution, one that combines information-level dedupe with content-aware compression, should be able to get 10:1 compression on most typical file mixes (especially Office and engineering data). A 10:1 ratio is the same as a 90% reduction, so if you can shrink 80% of your data by 90%, you can get a pretty good handle on how big the win could be. And by the way, it’s not necessarily a bad thing for the guys who sell disks, either, because when you can get that kind of win you start to think differently about what you can store and how long you can store it. For example, at my company Ocarina Networks (www.ocarinanetworks.com), we have a customer that plans to keep a snapshot of every day’s data online for 10 years. That wouldn’t be possible without some drastic deduplication.
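
For a rough sense of the overall win, here is the back-of-the-envelope math behind those two numbers, using an illustrative 100 TB starting point:

total_tb = 100.0                                         # pick any starting point
optimizable = 0.80 * total_tb                            # ~80% of NAS data is a candidate
remaining = optimizable / 10 + (total_tb - optimizable)  # 10:1 on the candidates, rest untouched
print(f"{total_tb:.0f} TB -> {remaining:.0f} TB "
      f"(~{total_tb / remaining:.1f}:1 overall, {100 * (1 - remaining / total_tb):.0f}% saved)")
# 100 TB -> 28 TB (~3.6:1 overall, 72% saved)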
Block-level dedupe, whether simple block-aligned like NetApp or sliding-window like market leader Data Domain, is only going to find a small subset of the duplicate or redundant information in primary storage. That’s because most of the file types that drive storage growth in primary (or nearline) storage are compressed. Compression causes the on-disk representation of a file to be recomputed, and to look essentially random, every time the file is changed. So if I store a photo, then open it, edit one pixel, and save the new version as a new file, there won’t be a single duplicate block at the disk level. On the other hand, almost the entire file is duplicate information.
Can you find a duplicate graphic that was used in a PowerPoint, a Word document, and a PDF? PowerPoint and Word both compress with a variant of zip; PDF compresses with deflate. Even if the graphic is identical, block-level dedupe won’t find the duplicate because the copies are not stored identically on disk. You need something that can find duplicate data at the information level. Finally, there is pretty concrete data showing that about 80% of the file data on NAS is a candidate for deduplication.
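
Here is a minimal sketch of that container effect (illustrative layouts, not Ocarina’s actual decoders): the same graphic is wrapped in a zip archive and in a raw deflate stream, the on-disk bytes share nothing, yet decoding each container recovers an identical payload that information-level dedupe could match.

import hashlib
import io
import os
import zipfile
import zlib

graphic = os.urandom(64 * 1024)   # stand-in for the embedded image's bytes

# Container 1: the graphic stored inside a zip archive (roughly how .pptx/.docx store parts).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("media/image1.png", graphic)
pptx_like = buf.getvalue()

# Container 2: the same graphic as a raw deflate stream (roughly how a PDF stores it).
pdf_like = zlib.compress(graphic)

# On disk the two containers look nothing alike, so block-level dedupe finds nothing...
print(pptx_like[:16] == pdf_like[:16])    # False: the byte streams differ from the start

# ...but decoding each container recovers the identical graphic at the information level.
with zipfile.ZipFile(io.BytesIO(pptx_like)) as zf:
    from_pptx = zf.read("media/image1.png")
from_pdf = zlib.decompress(pdf_like)
print(hashlib.sha256(from_pptx).digest() == hashlib.sha256(from_pdf).digest())  # True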
With all that in mind, don’t you think content aware optimization is going to be the next truly big win?

Nice to be quoted/mentioned

Posted in Analyst,Storage by storageoptimization on June 30, 2008

It’s not always clear whether I’m making an impact, but this past week was one of those times when I realized that others are taking note of the excitement around storage optimization and the importance I attach to it. In a June 27 editorial article in Processor, “Doing More With Less,” I was quoted in a section on “Saving Space” as follows:

“Carter George notes that, ‘… storage optimization is the key technology for utilization of the space needed for data storage. By using this technology, users can shrink existing files by as much as 90%, thus enabling the storage of up to 10 times more data on disks already owned by the enterprise.'” 

That same day, my company Ocarina Networks earned a mention in a post on Jon Toigo’s excellent Drunken Data blog. In a post recalling a conversation with Chris Santilli of Copan Systems, he writes: 

“Chris noted that de-duplication technology was past the hype stage (not sure about that one) but that the technology was still undergoing substantial development — rather like compression in its early days:  a lot of variations, no standards.  He further noted that some interesting work was being done by companies such as Ocarina on improved file type awareness that might help mitigate some nagging technical issues involving de-dupe of data on disks that had been defragged.  (Lot’s of “D’s” in that sentence.)”

Thanks guys. Good to know I can get the word out to the very folks who really know and understand what’s going on.

 

What to do about the coming video explosion

Posted in Analyst,Storage,Video by storageoptimization on June 4, 2008

Pete Steege’s Storage Effect is commenting today on an ABI report that highlights the explosion of video content on the web, which is expected to grow to one billion viewers by 2013. Steege’s response is that the report ignores the “digital home,” which will no doubt become ubiquitous in the coming years.

I agree, and would add that there are still other things driving video storage growth as well, such as the drastic increase in the number and resolution of video surveillance cameras. But mainly, what I see is that the storage problem itself could be solved to a great extent with the proper optimization. Since video files are already compressed for transmission, the proper storage optimization for video has to include both video-specific recompression and video-specific deduplication.

For video on the internet, you have two related but different problems. One is to store the vast amount of content that is being generated. The second is to provide the bandwidth needed for high-definition viewing of hot content.

Most video content is not hot. People upload thousands of hours of video per day to popular sites like YouTube, but only a small fraction of that gets wide viewership. It all needs to be stored, but the key thing for most of it is to store it cheaply. That’s going to mean not just cheap disks, but video-specific storage optimization that greatly reduces the size of the video files.     

The relatively few videos (meaning a couple hundred a day) that do become popular won’t be so aggressively compressed, or they’ll be compressed for bandwidth rather than for storage optimization. Solving the speed problem for the hot stuff that everyone is watching is easy: it will be replicated and cached, and people will get access to their hot shows and user-contributed videos. Solving the “store 900 petabytes of user-generated video really cheaply” problem is much harder.

Another major opportunity in video storage optimization is that most of the video people want access to is duplicated across many homes. Today, a blockbuster movie, a hit TV show, a TiVo recording of the big game: these are all stored hundreds of thousands of times across millions of households.

As video storage moves to cloud storage services, a lot of that can be deduplicated. For entire licensed content (e.g., a studio movie) that’s relatively easy – you’d say, here are 10,000,000 users uploading their copy of the Lion King…let’s just save one.  But to get real optimization, cloud storage providers are going to want to be able to find and compress video at finer granularity than that.  Let’s say there’s a football game broadcast on ABC in some markets, and carried by ESPN (with different commercials) in another market.  User A records it in standard def.  User B records it in high def.  The user in Atlanta records it from ABC.  The user in Portland records in from ESPN. To be efficient, you’ll want storage optimization that recognizes that those users are all uploading versions of the same thing, and takes out the redundant information as part of the compression / deduplication process.   
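
As a minimal sketch of that easy whole-file case (hypothetical names, nothing like a real cloud back end), content-addressed storage keeps one copy per unique upload no matter how many users upload it:

import hashlib

object_store = {}   # content hash -> the single stored copy
user_index = {}     # (user, filename) -> content hash of what they "uploaded"

def upload(user, filename, data):
    digest = hashlib.sha256(data).hexdigest()
    object_store.setdefault(digest, data)   # keep the bytes only once
    user_index[(user, filename)] = digest   # every user still sees their own file

movie = b"...stand-in for the same licensed movie file..."
for n in range(10_000):                     # many users upload identical copies
    upload(f"user{n}", "lion_king.mp4", movie)

print(len(user_index), "uploads,", len(object_store), "copy actually stored")
# 10000 uploads, 1 copy actually stored

The harder case described above, the same game recorded at different resolutions from different broadcasts, cannot be caught by whole-file hashes like these; it needs video-aware analysis inside the compression and deduplication process.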

Without aggressive storage optimization – including video-specific compression and dedupe – the explosive growth of video content is going to overwhelm storage capability.

Saturated: The Cloud’s Storage Dilemma

Posted in Analyst,Featured,Storage by storageoptimization on May 22, 2008

Yesterday’s Mashable post looking at online file storage providers caught my eye. Right now, online “cloud” storage providers are all targeting different markets, but the competition is fierce in every segment. Some are going after the consumer (such as AOL’s X-Drive), some are going after online backup, and some are going after web site data. That article doesn’t even mention Amazon’s S3, for example, which is a huge online repository.

The obvious benefits are basically twofold: ease of use and, for most of them, the fact that they manage your data for you in terms of backing it up, replicating it, and so on. The biggest drawback is that you have to be connected to a network to get to your files.


Most customers will still look at cost per gigabyte as the main motivator to use a service like this, and at the right price and benefit point, people will put their files online. Since all these storage service providers buy their disks from the same small number of companies that actually make disk drives, the cost of the physical infrastructure needed to build a competitive online storage service is roughly the same for everyone.

I think the real way for anyone to break through and get some separation from the crowd is to incorporate breakthrough storage optimization into their offering, and to do so in a way that’s transparent to the end user. That could be dedupe, that could be compression, or it could be something more sophisticated like Ocarina. The main thing is that if you can get a 5:1 or 10:1 ratio between the logical space you provide via the cloud and the physical space you, as a provider, have to buy, you have a compelling proposition. To grow and thrive in any business that offers online storage, providers are going to have to significantly increase the capacity they can offer without increasing cost and overhead in step.
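
To see why those ratios matter so much to a provider’s economics, here is a toy calculation with a hypothetical (assumed, not sourced) cost per physical gigabyte:

raw_cost_per_gb = 0.20   # hypothetical fully burdened cost of physical storage, $/GB
for ratio in (1, 5, 10):
    print(f"{ratio:>2}:1 optimization -> ${raw_cost_per_gb / ratio:.3f} per logical GB offered")
#  1:1 optimization -> $0.200 per logical GB offered
#  5:1 optimization -> $0.040 per logical GB offered
# 10:1 optimization -> $0.020 per logical GB offered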

Less is More

Posted in Analyst,Featured,File Systems,Storage by storageoptimization on April 24, 2008

Less is more … or is it? Part One

I recently returned from Storage Networking World in Orlando. As everyone knows, the conference is mainly a place for storage vendors to meet each other, tout their wares, and nose around in their competitors’ booths pretending to be potential customers. There are some good sessions, however, and one of the best was IDC analyst Noemi Greyzdorf’s presentation on the future of file systems.

Her smart and interesting talk was on the evolution of clustered, distributed, and grid file systems. As I listened, it occurred to me that I’m seeing a big split in the file system world, especially at the high end, where really large amounts of data are stored.

One of Noemi’s key points is that more and more functionality is being packed into file systems. As she puts it, file systems are the natural place for value-add knowledge about storage to be kept. That’s certainly true, and there are a number of advanced file systems that are becoming richer and richer in terms of integrated features.

At the same time, there is definitely a “less is more” crowd emerging, where many of the most basic features of file systems are being left out of some of the newest large-scale file systems around. This group includes file systems like GoogleFS, Hadoop’s HDFS, MogileFS, Amazon’s S3 simple storage service, and the in-house developments at a couple of other very large web 2.0 shops.

Are these two trends in file systems headed for a collision? I don’t think so. But what I do see is that neither approach is nailing the growing problem posed by the exploding amount of internet data that needs to be managed and stored. In other words, there are issues with both approaches. In my next entry, I will discuss what those issues are and how we might solve them.