Storage Optimization


Nice to be quoted/mentioned

Posted in Analyst, Storage by storageoptimization on June 30, 2008

It’s not always clear whether I’m making an impact, but this past week was one of those times when I realized that others are taking note of the excitement I feel about, and the importance I attach to, storage optimization. In a June 27 editorial article in Processor, “Doing More With Less,” I was quoted in a section on “Saving Space” as follows:

“Carter George notes that, ‘… storage optimization is the key technology for utilization of the space needed for data storage. By using this technology, users can shrink existing files by as much as 90%, thus enabling the storage of up to 10 times more data on disks already owned by the enterprise.’” 
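
The arithmetic behind the quoted figures is simple. If optimization removes a fraction r of a file's size, each file occupies (1 - r) of its original space, so the same disks hold 1/(1 - r) times as much data. With the quote's best case of r = 0.90, that gives the "up to 10 times" figure; actual savings vary by file type.

```latex
\text{capacity multiplier} = \frac{1}{1 - r}, \qquad r = 0.90 \;\Rightarrow\; \frac{1}{1 - 0.90} = \frac{1}{0.10} = 10
```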

That same day, my company Ocarina Networks earned a mention in a post on Jon Toigo’s excellent Drunken Data blog. In a post recalling a conversation with Chris Santilli of Copan Systems, he writes: 

“Chris noted that de-duplication technology was past the hype stage (not sure about that one) but that the technology was still undergoing substantial development — rather like compression in its early days:  a lot of variations, no standards.  He further noted that some interesting work was being done by companies such as Ocarina on improved file type awareness that might help mitigate some nagging technical issues involving de-dupe of data on disks that had been defragged.  (Lot’s of “D’s” in that sentence.)”

Thanks, guys. It’s good to know I can get the word out to the folks who really know and understand what’s going on.

Commenting on Compression

Posted in Uncategorized by storageoptimization on June 23, 2008

On Storage Soup, the TechTarget Storage Blog, Tory Skyers wrote a really interesting post on “Compression, Dedupe and the Law” last week that I felt compelled to comment on. He raised a question about what dedupe could mean from a legal standpoint, considering that the data is altered when it goes through this process. 

My response, which you are welcome to read in full on the site, points out one issue Tory missed: in-band compression is scary for the reasons he outlines, but fortunately it’s not the only option these days.

The other comments posted are well worth the read as well. Good to hear people debating these issues.

What to do about the coming video explosion

Posted in Analyst, Storage, Video by storageoptimization on June 4, 2008

Pete Steege’s Storage Effect is commenting today on an ABI report that highlights the explosion of video content on the web, whose audience is expected to grow to one billion viewers by 2013. Steege’s response is that the report ignores the “digital home,” which will no doubt become ubiquitous in the coming years.

I agree, and would add that there are still other things driving video storage growth as well, such as a drastic increase in the number of video surveillance cameras and their resolution. But mainly, what I see is that the storage problem itself could actually be solved to a great extent with the proper optimization. For video, since video files are already compressed for transmission, the proper storage optimization has to include both video-specific recompression and video-specific deduplication.

For video on the internet, you have two related but different problems. One is to store the vast amount of content that is being generated. The second is to provide the bandwidth needed for high-definition viewing of hot content.

Most video content is not hot. People upload thousands of hours of video per day to popular sites like YouTube, but only a small fraction of that gets wide viewership. It all needs to be stored, but the key thing for most of it is to store it cheaply. That’s going to mean not just cheap disks, but video-specific storage optimization that greatly reduces the size of the video files.     

The relatively few videos (meaning a couple hundred a day) that do become popular won’t be so aggressively compressed, or they’ll be compressed for bandwidth rather than for storage optimization. Solving the speed problem for the hot stuff that everyone is watching is easy: it will be replicated and cached, and people will get access to their hot shows and user-contributed videos. The “store 900 Petabytes of user-generated video really cheaply” problem is not so easy to solve.

Another major opportunity for optimizing video storage is that most of the videos people want access to are duplicated across many homes. Today, a blockbuster movie, a hit TV show, a TiVo of the big game: each of these is stored hundreds of thousands of times across millions of households.

As video storage moves to cloud storage services, a lot of that can be deduplicated. For whole pieces of licensed content (e.g., a studio movie) that’s relatively easy: you’d say, here are 10,000,000 users uploading their copy of the Lion King…let’s just save one. But to get real optimization, cloud storage providers are going to want to be able to find and compress video at a finer granularity than that. Let’s say there’s a football game broadcast on ABC in some markets, and carried by ESPN (with different commercials) in another market. User A records it in standard def. User B records it in high def. The user in Atlanta records it from ABC. The user in Portland records it from ESPN. To be efficient, you’ll want storage optimization that recognizes that those users are all uploading versions of the same thing, and takes out the redundant information as part of the compression / deduplication process.
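
The "let’s just save one" case is whole-file deduplication. A minimal Python sketch of it follows, assuming a simple content-addressed design in which each file is keyed by its SHA-256 digest; the `DedupStore` class and its methods are hypothetical, not any provider’s actual API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    """Hypothetical whole-file dedupe store: byte-identical uploads are kept once."""
    blobs: dict = field(default_factory=dict)    # SHA-256 hex digest -> file bytes
    catalog: dict = field(default_factory=dict)  # (user, filename) -> digest

    def upload(self, user: str, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Only the first copy of any given content is physically stored.
        self.blobs.setdefault(digest, data)
        # Every user still gets their own entry pointing at the shared copy.
        self.catalog[(user, name)] = digest
        return digest

    def read(self, user: str, name: str) -> bytes:
        return self.blobs[self.catalog[(user, name)]]

    def logical_bytes(self) -> int:
        """Space users think they are using."""
        return sum(len(self.blobs[d]) for d in self.catalog.values())

    def physical_bytes(self) -> int:
        """Space actually consumed on disk."""
        return sum(len(b) for b in self.blobs.values())


# Three users upload byte-identical copies of the same licensed movie,
# and a fourth uploads an unrelated recording.
store = DedupStore()
movie = b"licensed-movie-bytes" * 50_000
for user in ("alice", "bob", "carol"):
    store.upload(user, "lion_king.mp4", movie)
store.upload("dave", "game_sd.ts", b"sd-broadcast-bytes" * 10_000)

# logical 3,180,000 bytes, physical 1,180,000 bytes
print(f"logical {store.logical_bytes():,} bytes, physical {store.physical_bytes():,} bytes")
```

Note what this sketch cannot do: the SD and HD recordings, or the ABC and ESPN versions with different commercials, differ byte for byte, so their digests never match. Catching those cases takes the finer-grained, video-aware deduplication described above.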

Without aggressive storage optimization – including video-specific compression and dedupe – the explosive growth of video content is going to overwhelm storage capability.

Saturated: The Cloud’s Storage Dilemma

Posted in Analyst, Featured, Storage by storageoptimization on May 22, 2008

Yesterday’s Mashable post looking at online file storage providers caught my eye. Right now, online “cloud” storage providers are all targeting different markets, but the competition is fierce in all segments. Some are going after consumers (such as AOL X-Drive), some are going after online backup, and some are going after web site data. That article doesn’t even mention Amazon’s S3, for example, which is a huge online repository.

The obvious benefits are basically twofold: ease-of-use and – for most of them – the fact that they manage your data for you in terms of backing it up, replicating it, etc. The biggest drawback here is that you have to be connected to a network to get to your files.


Most customers will still look at cost/Gigabyte as the main motivator to use a service like this, and at the right price and benefit point, people will put their files online. Since these storage service providers all buy their disks from the same small number of companies that actually make disk drives, the physical infrastructure needed to build an online storage service and compete costs roughly the same for all of them.

I think the real solution is that, for anyone to break through and separate themselves from the crowd, they are going to have to incorporate breakthrough storage optimization into their offering, and do so in a way that’s transparent to the end user. That could be dedupe, that could be compression, or it could be something more sophisticated like Ocarina. The main thing is that if you can get a 5:1 or 10:1 ratio between the logical space you provide via the cloud and the physical space you, as a provider, have to buy, then you have a compelling proposition. The competition in this market is fierce, and to grow and thrive in any business that offers online storage, providers are going to need a strategy to significantly increase their online storage capacity without increasing cost and overhead in step.
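
To make those ratios concrete, here is a small Python sketch of the logical-versus-physical arithmetic. The `Offering` class, the 1,000 TB of disk, and the 2,000 TB of customer data are made-up illustration values, and the reduction ratio is an assumed average; real ratios depend heavily on the mix of file types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    """Hypothetical model of an online storage service's capacity."""
    physical_tb: float      # raw disk the provider actually buys
    reduction_ratio: float  # assumed average data reduction, e.g. 5.0 for 5:1

    def __post_init__(self) -> None:
        if self.reduction_ratio < 1:
            raise ValueError("reduction_ratio must be >= 1 (1.0 means no reduction)")

    def logical_tb(self) -> float:
        """Customer-visible capacity the same disks can back."""
        return self.physical_tb * self.reduction_ratio

    def physical_tb_for(self, logical_demand_tb: float) -> float:
        """Raw disk needed to hold a given amount of customer data."""
        return logical_demand_tb / self.reduction_ratio


for ratio in (1.0, 5.0, 10.0):
    svc = Offering(physical_tb=1_000, reduction_ratio=ratio)
    print(f"{ratio:>4.0f}:1  sells {svc.logical_tb():>8,.0f} TB  "
          f"needs {svc.physical_tb_for(2_000):>6,.0f} TB of disk for 2,000 TB of user data")
```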

Who’s Really Melting the Ice Cap?

Posted in Featured, Storage by storageoptimization on May 21, 2008
Jon William Toigo’s blog “Drunken Data,” which has fun with its headlines, has a post titled “Climate Change or Silly Season?” In it, he references an article that ran in Macworld UK stating that Apple Computer, that darling of uberyuppies and designers, has been rated as a contributor to global warming.
Toigo’s response: “While I agree that the company generates a lot of hot air, the truth is that storage hardware, not PCs/MACs/servers, is the big power pig. Behind it all is a total mismanagement of data. Think about naming your files better and deploying archive technology the next time you see that video of a chunk of ice breaking off from a glacier.”
In truth, servers (and computers in general) give off a lot more heat per unit of rack space than storage. Processors that are running full out generate a lot of heat, and consume a lot of power. At the same time, both individuals and corporations have a high ratio of storage to servers, so if you add it all up, it might be the case that a data center uses as much power for storage as it does for servers.
That being said, I don’t think “naming your files better” is going to turn out to be the answer. Some combination of thin provisioning (wasting less free space) and storage optimization (storing things efficiently, much as virtual machines use CPU efficiently) is the direction things are headed.
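
Thin provisioning is easy to illustrate: a volume advertises a large size but consumes backing space only for blocks that have actually been written. The Python sketch below is a hypothetical in-memory model with an assumed 4 KiB block size, not any array’s implementation. It also shows why the two techniques are complementary: thin provisioning stops paying for unused free space, but does nothing to shrink the data that is written.

```python
class ThinVolume:
    """Hypothetical thin-provisioned volume: advertises a large size but only
    consumes backing space for blocks that have actually been written."""

    def __init__(self, advertised_blocks: int, block_size: int = 4096) -> None:
        self.advertised_blocks = advertised_blocks
        self.block_size = block_size
        self._blocks = {}  # block number -> data; allocated on first write

    def _check(self, block_no: int) -> None:
        if not 0 <= block_no < self.advertised_blocks:
            raise IndexError("block outside advertised volume size")

    def write(self, block_no: int, data: bytes) -> None:
        self._check(block_no)
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        self._blocks[block_no] = data.ljust(self.block_size, b"\0")

    def read(self, block_no: int) -> bytes:
        self._check(block_no)
        # Never-written blocks read back as zeros without consuming any space.
        return self._blocks.get(block_no, bytes(self.block_size))

    def advertised_bytes(self) -> int:
        return self.advertised_blocks * self.block_size

    def allocated_bytes(self) -> int:
        return len(self._blocks) * self.block_size


vol = ThinVolume(advertised_blocks=262_144)  # 1 GiB at 4 KiB per block
for n in range(10):
    vol.write(n * 1_000, b"hello")
print(f"advertised {vol.advertised_bytes():,} bytes, allocated {vol.allocated_bytes():,} bytes")
```
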
The key question to keep in mind is: are the servers and storage being used efficiently? In the server arena, virtualization has turned out to be the magic answer, allowing data centers to consolidate multiple logical servers onto one physical server so that each physical machine is used efficiently and idle servers aren’t wasting power, rack space, and cooling.
In short, I think that “storage optimization” is to making storage more efficient what “server virtualization” was to making servers more efficient.

Information Week on Storage Optimization

Posted in Uncategorized by storageoptimization on May 19, 2008

Last week, Storage Switzerland Analyst George Crump penned a guest column for Information Week about the optimization of primary storage, a topic close to our hearts.

The article looks at the various approaches to online storage optimization and how these solutions can help reduce the footprint of online data and help companies effectively respond to the massive increase in information that lives and is accessed via the Internet.

George is working on a series of articles looking at vendors in the space; the first installment covers Ocarina Networks.

Disclosure: I am a co-founder of Ocarina.

The New Storage Cost Metric: Petabytes/Admin

Posted in HP, Storage by storageoptimization on May 13, 2008

Some Thoughts on HP’s New Massively Scalable NAS

Last week HP announced a new massively scalable NAS solution they called Extreme Storage (ExDS).

As Judy Mottl at InternetNews.com notes, HP’s offering is interesting in several ways, including a very aggressive list price of under $2 per Gigabyte, and the fact that performance and amount of storage scale separately.
One of the most interesting things about ExDS, though, is that HP is talking about a metric for measuring storage cost that I haven’t seen much discussion of to date – Petabytes/admin.
In other words, HP is putting a focus on making it easy to deploy and manage a huge amount of storage with one person.

Now, in some ways, this is just a twist on ease of use, but ease of use at scale is different from ease of use on a small filer. For example, Network Appliance has always done a great job of making it easy to configure and deploy a filer. However, making a 5-Terabyte filer easy to deploy doesn’t make it easy to deploy and manage 10 Petabytes of storage, which is the situation a lot of customers are finding themselves in, especially in key growth areas like web 2.0, social media, online email, and rich media.
If you have to deploy 50 or 100 filers to get to scale, the fact that each one was easy pales in comparison to having to do it 100 times, and then monitor and manage 100 standalone puddles of storage.
At the rate unstructured storage is growing, managing massive amounts of storage with a small number of storage admins is going to be increasingly important.

It looks like HP has done a good job of this, simplifying the whole stack for the storage admin, from the lowest hardware level all the way up through provisioning, deployment, and what users see. If so, Petabytes/Admin will become one of the most important metrics for comparing scalable storage solutions. Of course, when you add storage optimization to the mix, $/Gigabyte goes down and Petabytes/Admin goes up.
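
One way to see why optimization moves both metrics at once is the rough Python arithmetic below. It assumes that admin effort scales with the physical capacity being managed, which is a simplification, and it treats the “under $2 per Gigabyte” list price as an upper-bound raw figure; the 1 PB-per-admin starting point and the reduction ratios are purely illustrative.

```python
def effective_metrics(raw_price_per_gb: float, raw_pb_per_admin: float,
                      reduction_ratio: float) -> tuple:
    """Return ($ per logical GB, logical PB per admin).

    Assumes each admin's workload tracks *physical* capacity, so every
    physical PB an admin manages holds `reduction_ratio` PB of user data.
    """
    if reduction_ratio < 1:
        raise ValueError("reduction_ratio must be >= 1 (1.0 means no optimization)")
    return raw_price_per_gb / reduction_ratio, raw_pb_per_admin * reduction_ratio


# Illustrative inputs: $2/GB raw (upper bound of the quoted list price),
# a hypothetical 1 PB managed per admin, and assumed reduction ratios.
for ratio in (1.0, 3.0, 5.0):
    price, pb_per_admin = effective_metrics(2.0, 1.0, ratio)
    print(f"{ratio:.0f}:1  ${price:.2f} per logical GB, {pb_per_admin:.0f} logical PB/admin")
```
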
As Mary Jander at Byte and Switch comments, with EMC’s forthcoming Hulk/Maui combination and IBM’s purchase of scalable storage solution XIV, massively scalable storage is shaping up to be a major battlefront. As that battle takes shape, look for the P/A ratio to be a key measure to watch. Kudos to HP for bringing it to the forefront.

Finding us just got easier

Posted in Storage by storageoptimization on May 7, 2008

It’s always hard to know how to find the best and most interesting blogs. When it comes to those that, like this one, are tech focused, the possibilities are nearly endless. That’s why we’re glad to announce that Storage Optimization is now part of FindTechBlogs, a blog aggregator that provides an easy way to browse for quality tech blog content.

Several of this blog’s recent posts are featured on the front page of the site, and we’re also listed on the blogroll. We join such blogs as Kevin Epstein’s Scalent Systems - Next Generation Data Center Virtualization blog and Ken Oestreich’s Fountainhead blog, which has some relevant posts this week about the Uptime Institute’s Green Enterprise Computing Symposium. Here’s hoping this is one more way we can join the community of bloggers talking about what matters in the IT and storage worlds.

Can You Compress Already Compressed Files? Part II

Posted in Featured, File Systems, Storage by storageoptimization on May 6, 2008

In my last post I discussed the fact that most of the files people use today are already compressed, and that until now there have been no algorithms to compress them further. Clearly, a new solution is needed.

On the cutting edge, there are some new innovations in file-aware optimization that allow companies to reduce their storage footprint and get more from the storage they already have. The key to this is understanding specific file types, their formats, and how the applications that created those files use and save data. Most existing compression tools are generic. To get better results than you can get with a generic compressor, you need to go to file-type-aware compressors.
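
The first step in being file-type aware is simply recognizing what a file is. A common approach is to check its leading "magic" bytes rather than trusting the file extension, and the Python sketch below does that for a few of the formats mentioned in these posts. The signature bytes are the standard ones for these formats, but the `sniff_format` name and the strategy strings are hypothetical placeholders for whatever a real optimizer would do.

```python
# (offset, magic bytes, format name, hypothetical handling strategy)
SIGNATURES = [
    (0, b"\xff\xd8\xff", "JPEG", "photo-aware recompression"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG", "image-aware recompression"),
    (0, b"PK\x03\x04", "ZIP container (e.g. Office 2007)", "unpack and handle each part"),
    (0, b"%PDF-", "PDF", "parse and handle each embedded stream"),
    (0, b"ID3", "MP3 with ID3 tag", "audio-aware recompression"),
    (4, b"ftyp", "MP4 / ISO media", "video-aware recompression"),
]


def sniff_format(header: bytes) -> tuple:
    """Return (format, strategy) for the first signature that matches `header`."""
    for offset, magic, fmt, strategy in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return fmt, strategy
    # Anything unrecognized falls back to the generic path.
    return "unknown", "generic compressor"


for sample in (b"\xff\xd8\xff\xe0\x00\x10JFIF", b"PK\x03\x04\x14\x00",
               b"\x00\x00\x00\x18ftypmp42", b"plain text"):
    print(sniff_format(sample))
```

Magic bytes only identify the outer container: any ZIP file matches the same signature as an Office 2007 document, so a real tool would have to look inside before choosing a strategy.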

There’s another problem. Let’s say you just created a way better tool for compressing photographs than JPEG. That doesn’t mean your tool can compress already-compressed JPEGs, it means that if you were given the same original photo in the first place, you could do a better job. So the first step in moving towards compressing already-compressed files is what we call Extraction – you have to extract the original full information from the file. In most cases, that’s going to involve de-compressing the file first, getting back to the uncompressed original, and then applying your better tools.
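
Here is a small Python illustration of why extraction matters. Because JPEG-specific tools are beyond a short example, it uses two standard-library codecs as stand-ins: zlib at its fastest setting plays the "lesser" compressor a file was originally saved with, and lzma plays the better tool. The input is synthetic, seeded pseudo-random text, so exact sizes will vary with the data, but the pattern is the point: the better tool gains little on the already-compressed bytes and a lot on the extracted original.

```python
import lzma
import random
import zlib

# Stand-in data: seeded pseudo-random "text" built from a small vocabulary.
rng = random.Random(42)
vocab = [f"word{i}" for i in range(50)]
original = " ".join(rng.choice(vocab) for _ in range(100_000)).encode()

# The file as it arrives: already compressed by a weaker, fast tool.
stored = zlib.compress(original, 1)

# Naive approach: run the stronger compressor over the compressed bytes.
naive = lzma.compress(stored)

# Extraction approach: recover the original information first,
# then apply the stronger compressor to it.
extracted = zlib.decompress(stored)
better = lzma.compress(extracted)

print(f"original {len(original):,}  zlib-1 {len(stored):,}  "
      f"lzma(zlib-1) {len(naive):,}  lzma(extracted) {len(better):,}")
```

A production system would also need to hand back the original compressed file exactly as it was saved, which this sketch does not attempt.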

Extraction may seem simple enough: just reverse whatever was done to a file in the first place. But it’s not always quite that easy. Many files are compound documents, with multiple sections or objects of different data types. A PowerPoint presentation, for example, may have text sections, graphics sections, some photos pasted in, etc. The same is true for PDFs, email folders with attachments, and a lot of the other file types that are driving storage growth. So to really extract all the original information from these files, you may need to not only be able to decompress files, but to look inside them, understand how they are structured, break them apart into their separate pieces, and then do different things to each different piece.
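
Office 2007 files are a convenient example of a compound document, since they are ZIP containers holding separate XML, media, and embedded-object parts. The Python sketch below uses the standard `zipfile` module to break such a container apart, decompress each member, and pick a per-part strategy by extension. The handler table and the small stand-in container are hypothetical; the part names follow the usual PowerPoint layout, but the image bytes are placeholders, not a real JPEG.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical per-part strategies a file-aware optimizer might choose.
HANDLERS = {
    ".xml": "markup: apply a text-aware compressor",
    ".jpeg": "photo: hand to an image-specific recompressor",
    ".jpg": "photo: hand to an image-specific recompressor",
    ".png": "image: hand to an image-specific recompressor",
}


def split_container(blob: bytes) -> dict:
    """Open a ZIP-based compound document, decompress every member, and pair
    it with the strategy chosen by its extension: {name: (strategy, bytes)}."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            suffix = PurePosixPath(info.filename).suffix.lower()
            strategy = HANDLERS.get(suffix, "unknown: fall back to a generic compressor")
            parts[info.filename] = (strategy, zf.read(info))  # read() returns decompressed bytes
    return parts


# Build a small stand-in container; a real .pptx has many more parts.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("ppt/slides/slide1.xml",
                "<p:sld>" + "<a:t>Quarterly results</a:t>" * 200 + "</p:sld>")
    zf.writestr("ppt/media/image1.jpeg", bytes(range(256)) * 16)  # placeholder bytes
    zf.writestr("ppt/embeddings/oleObject1.bin", b"\x00\x01" * 64)

parts = split_container(buf.getvalue())
for name, (strategy, data) in parts.items():
    print(f"{name}: {len(data)} bytes -> {strategy}")
```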

The two things to take away from this discussion are: 1) you won’t get much benefit from applying generic compression to already-compressed file types, which are the file types driving most of your storage growth, and 2) it is possible to compress already-compressed files, but to do so you first have to extract all the original information from them, which may involve decoding and unraveling complex compound documents and then decompressing all the different parts. Even then, you’ve only reached the point where online data reduction for today’s file types can really get started.

Can you compress an already compressed file? Part I

Posted in Featured, File Systems, Storage by storageoptimization on May 1, 2008

We all know how much data we generate. And just as we keep telling ourselves we’ll clean out the garage “one of these days,” most of us rarely bother to clean out our email or photo-sharing accounts.

As a result, enterprise and internet data centers have to buy hundreds of thousands of petabytes of disk every year to handle all the data in those files. It all has to be stored somewhere.

One way to reduce storage growth is to compress files. Compression techniques have been around forever, and are built into many operating systems (like Windows) and storage platforms (such as file servers).

Here’s the problem: most modern file formats, the formats driving all this storage growth, are already compressed.
· The most common format for photos is JPEG – that’s a compressed image format.
· The most common format for most documents at work is Microsoft Office, and in Office 2007, all Office documents are compressed as they are saved.
· Music (mp3) and video (MPEG-2 and MPEG-4) are highly compressed.

The mathematics of compression say that once you compress a file and reduce its size, you can’t expect to compress it again and get much more size reduction. Compression works by looking for patterns in the data and replacing them with more efficient codes. So if something has already been compressed well, the compressed file shouldn’t have many patterns left in it.

Of course, some compression algorithms are better than others, and you might see some small benefits by trying to compress something that has already been compressed with a lesser tool, but for the most part, you’re not going to see a big win by doing that. In fact, in a lot of cases, trying to compress an already compressed file will make it bigger!
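
Both effects are easy to reproduce with Python’s standard `zlib` module. In the sketch below, the text is synthetic, seeded pseudo-random words (so it contains plenty of repeated patterns), and `os.urandom` stands in for data with no patterns at all; exact sizes depend on the data and the zlib version.

```python
import os
import random
import zlib

# Synthetic text with lots of repetition: seeded random picks from 50 words.
rng = random.Random(7)
vocab = [f"term{i}" for i in range(50)]
text = " ".join(rng.choice(vocab) for _ in range(50_000)).encode()

once = zlib.compress(text, 9)   # first pass replaces repeated patterns with short codes
twice = zlib.compress(once, 9)  # second pass: little pattern is left to find

# Random bytes stand in for data with no patterns at all.
noise = os.urandom(100_000)
noise_z = zlib.compress(noise, 9)

print(f"text: {len(text):,} -> once: {len(once):,} -> twice: {len(twice):,} bytes")
print(f"random: {len(noise):,} -> compressed: {len(noise_z):,} bytes")
```
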

Conventional wisdom holds that once files have been compressed with commonly used technologies, further reducing their size, and with it their consumption of expensive resources, is nearly impossible. So, what can be done about this?
