Storage Optimization


Saturated: The Cloud’s Storage Dilemma

Posted in Analyst,Featured,Storage by storageoptimization on May 22, 2008

Yesterday’s Mashable post looking at online file storage providers caught my eye. Right now, online “cloud” storage providers are all targeting different markets, but the competition is fierce in every segment. Some are going after the consumer (AOL X-Drive, for example), some are going after online backup, and some are going after web site data. The Mashable article doesn’t even mention Amazon’s S3, which is a huge online repository.

The obvious benefits are basically twofold: ease of use and – for most of them – the fact that they manage your data for you, backing it up, replicating it, and so on. The biggest drawback is that you have to be connected to a network to get to your files.


Most customers will still look at cost per Gigabyte as the main motivator to use a service like this, and at the right price and benefit point, people will put their files online. Since all of these storage service providers buy their disks from the same small number of companies that actually make disk drives, the cost of the physical infrastructure needed to build a competitive online storage service is roughly the same for everyone.

I think the real solution here is that, for anyone to break through and get some separation from the crowd, they are going to have to incorporate breakthrough storage optimization in their offering – and do so in a way that’s transparent to the end user. That could be dedupe, it could be compression, or it could be something more sophisticated like Ocarina. The main thing is that if you can deliver a 5:1 or 10:1 ratio of logical space offered via the cloud to physical space you, as a provider, have to buy, you have a compelling proposition. To grow and thrive in any business that offers online storage, providers are going to have to develop a strategy for significantly increasing the capacity they offer without increasing cost and overhead in step.
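A back-of-the-envelope sketch (in Python, with made-up numbers rather than any provider’s real costs) of why that ratio matters: the optimization ratio divides directly into the physical cost behind every logical gigabyte sold.

    # Illustrative only: how an optimization ratio changes the effective
    # cost per logical gigabyte a provider can offer.

    def effective_cost_per_gb(raw_cost_per_gb: float, optimization_ratio: float) -> float:
        """Cost of the physical disk behind each logical GB sold."""
        return raw_cost_per_gb / optimization_ratio

    raw_cost = 0.25  # hypothetical fully-burdened $/GB for raw disk
    for ratio in (1, 5, 10):
        print(f"{ratio}:1 optimization -> ${effective_cost_per_gb(raw_cost, ratio):.3f} per logical GB")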

Who’s Really Melting the Ice Cap?

Posted in Featured,Storage by storageoptimization on May 21, 2008
Jon William Toigo’s blog “Drunken Data,” which has fun with its headlines, has a post titled “Climate Change or Silly Season?” In it, he references an article that ran in Macworld UK stating that Apple Computer–that darling of uberyuppies and designers–has been rated as a contributor to global warming.
Credit: Flickr user Tom’s Caps
Toigo’s response: “While I agree that the company generates a lot of hot air, the truth is that storage hardware, not PCs/MACs/servers, is the big power pig. Behind it all is a total mismanagement of data. Think about naming your files better and deploying archive technology the next time you see that video of a chunk of ice breaking off from a glacier.”
In truth, servers (and computers in general) give off a lot more heat per unit of rack space than storage. Processors that are running full out generate a lot of heat, and consume a lot of power. At the same time, both individuals and corporations have a high ratio of storage to servers, so if you add it all up, it might be the case that a data center uses as much power for storage as it does for servers.
That being said, I don’t think “naming your files better” is going to turn out to be the answer. Some combination of thin provisioning (waste less free space) and storage optimization (store data more efficiently, much as virtual machines use CPUs more efficiently) is the direction things are headed.
The key question to keep in mind is: are the servers and storage being used efficiently? In the server arena, virtualization has turned out to be the magic answer – allowing data centers to consolidate multiple logical servers onto one physical one so that each physical server is used efficiently and idle servers aren’t wasting power, rack space, and cooling.
In short, I think that “storage optimization” is to making storage more efficient what “server virtualization” was to making servers more efficient.

Information Week on Storage Optimization

Posted in Uncategorized by storageoptimization on May 19, 2008

Last week, Storage Switzerland Analyst George Crump penned a guest column for Information Week about the optimization of primary storage, a topic close to our hearts.

The article looks at the various approaches to online storage optimization and how these solutions can help reduce the footprint of online data and help companies effectively respond to the massive increase in information that lives and is accessed via the Internet.

George is working on a series of articles looking at vendors in the space; for the first installment, on Ocarina Networks, click here.

Disclosure: I am a co-founder of Ocarina.

The New Storage Cost Metric: Petabytes/Admin

Posted in HP,Storage by storageoptimization on May 13, 2008

Some Thoughts on HP’s New Massively Scalable NAS

Last week, HP announced a new massively scalable NAS solution it calls Extreme Storage (ExDS).

As Judy Mottl at InternetNews.com notes, HP’s offering is interesting in several ways, including a very aggressive list price of under $2 per Gigabyte, and the fact that performance and amount of storage scale separately.
One of the most interesting things about ExDS, though, is that HP is talking about a metric for measuring storage cost that I haven’t seen much discussion of to date – Petabytes/admin.
In other words, HP is putting a focus on making it easy to deploy and manage a huge amount of storage with one person.

Now, in some ways, this is just a twist on ease of use, but ease of use at scale is different from ease of use on a small filer. For example, Network Appliance has always done a great job of making it easy to configure and deploy a filer. However, making a 5 Terabyte filer easy to deploy doesn’t make it easy to deploy and manage 10 Petabytes of storage – which is the situation a lot of customers are finding themselves in, especially in key growth areas like web 2.0, social media, online email, and rich media.
If you have to deploy 50 or 100 filers to get to scale, the fact that each one was easy pales in comparison to having to do it 100 times, and then monitor and manage 100 standalone puddles of storage.
At the rate unstructured storage is growing, managing massive amounts of it with a small number of storage admins is going to be increasingly important.

It looks like HP has done a good job of this – simplifying the whole stack for the storage admin, from the lowest hardware level all the way up through provisioning, deployment, and what users see. If so, then Petabytes/Admin will become one of the most important metrics for comparing scalable storage solutions. Of course, when you add storage optimization to the mix, the $/Gigabyte goes down and the Petabytes/Admin goes up.
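To make the metric concrete, here is a small illustrative calculation; the deployment sizes, admin counts, and optimization ratio are invented for the example and are not HP’s numbers.

    # Petabytes/admin for two hypothetical deployments.

    def petabytes_per_admin(total_pb: float, admins: int) -> float:
        return total_pb / admins

    deployments = {
        "100 standalone 5 TB filers": (100 * 5 / 1000, 4),  # ~0.5 PB, 4 admins
        "one scalable NAS":           (10.0, 2),            # 10 PB, 2 admins
    }

    for name, (pb, admins) in deployments.items():
        print(f"{name}: {petabytes_per_admin(pb, admins):.2f} PB/admin")

    # Storage optimization multiplies the logical capacity each admin manages.
    optimization_ratio = 5
    print(f"with 5:1 optimization: {petabytes_per_admin(10.0 * optimization_ratio, 2):.1f} logical PB/admin")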
As Mary Jander at Byte and Switch comments, with EMC’s forthcoming Hulk/Maui combination and IBM’s purchase of scalable storage vendor XIV, massively scalable storage is shaping up to be a major battlefront. As that battle takes shape, look for the Petabytes/Admin ratio to be a key measure to watch. Kudos to HP for bringing it to the forefront.

Finding us just got easier

Posted in Storage by storageoptimization on May 7, 2008

It’s always hard to know how to find the best and most interesting blogs. When it comes to those that–like this one–are tech focused, the possibilities are nearly endless. That’s why we’re glad to announce that Storage Optimization is now part of FindTechBlogs, a blog aggregator that provides an easy way to browse quality tech blog content.

Several of this blog’s recent posts are featured on the front page of the site, and we’re also listed on the blogroll. We join bloggers such as Kevin Epstein at the Scalent Systems – Next Generation Data Center Virtualization blog and Ken Oestreich’s Fountainhead blog, which has some relevant posts this week about the Uptime Institute’s Green Enterprise Computing Symposium. Here’s hoping this is one more way we can join the community of bloggers talking about what matters in the IT and storage worlds.

Can You Compress Already Compressed Files? Part II

Posted in Featured,File Systems,Storage by storageoptimization on May 6, 2008

In my last post I discussed the fact that most of the files in use today are already compressed, and that up to now there have been no algorithms that could compress them further. Yet it’s obvious that a new solution is needed.

On the cutting edge, there are new innovations in file-aware optimization that allow companies to reduce their storage footprint and get more from the storage they already have. The key is understanding specific file types, their formats, and how the applications that created those files use and save data. Most existing compression tools are generic; to get better results than a generic compressor can deliver, you need to go to file-type-aware compressors.
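As a rough sketch of what “file-type-aware” means in practice, imagine a dispatch table that routes each file to a compressor chosen for its type and falls back to a generic one. The type-specific functions here are stand-ins, not real products; only the routing logic is the point.

    import zlib
    from pathlib import Path

    def generic_compress(data: bytes) -> bytes:
        # Generic, format-blind compression (the status quo).
        return zlib.compress(data)

    def photo_compress(data: bytes) -> bytes:
        # Stand-in: a real photo-aware compressor would decode the JPEG
        # first and re-encode the pixels with a better algorithm.
        return zlib.compress(data)

    def office_compress(data: bytes) -> bytes:
        # Stand-in: a real Office-aware compressor would open the container
        # and treat text, graphics, and embedded photos separately.
        return zlib.compress(data)

    # Route by file extension, falling back to the generic compressor.
    COMPRESSORS = {
        ".jpg": photo_compress, ".jpeg": photo_compress,
        ".docx": office_compress, ".pptx": office_compress,
    }

    def compress_file(path: Path) -> bytes:
        data = path.read_bytes()
        codec = COMPRESSORS.get(path.suffix.lower(), generic_compress)
        return codec(data)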

There’s another problem. Let’s say you just created a much better tool for compressing photographs than JPEG. That doesn’t mean your tool can compress already-compressed JPEGs; it means that if you were given the same original photo in the first place, you could do a better job. So the first step in compressing already-compressed files is what we call Extraction – you have to extract the original, full information from the file. In most cases, that means decompressing the file first, getting back to the uncompressed original, and then applying your better tools.
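A minimal sketch of that Extraction step for a single photo, using the Pillow imaging library. The function better_photo_codec is hypothetical – it stands in for whichever improved encoder you have – and the point is simply that it is handed decoded pixels, not the JPEG byte stream.

    import zlib
    from PIL import Image  # Pillow

    def better_photo_codec(pixels: bytes, size: tuple, mode: str) -> bytes:
        # Hypothetical stand-in for an encoder that beats JPEG when given the
        # original, uncompressed image data; plain zlib is used here only so
        # the sketch runs end to end.
        return zlib.compress(pixels)

    def recompress_photo(path: str) -> bytes:
        img = Image.open(path)   # step 1: Extraction -- decode the JPEG
        pixels = img.tobytes()   # raw, uncompressed pixel data
        # step 2: apply the better tool to the extracted original
        return better_photo_codec(pixels, img.size, img.mode)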

Extraction may seem simple enough – just reverse whatever was done to the file in the first place. But it’s not always that easy. Many files are compound documents, with multiple sections or objects of different data types. A PowerPoint presentation, for example, may have text sections, graphics sections, some pasted-in photos, and so on. The same is true for PDFs, email folders with attachments, and a lot of the other file types that are driving storage growth. So to really extract all the original information from these files, you may need not only to decompress them, but to look inside them, understand how they are structured, break them apart into their separate pieces, and then do different things to each piece.
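Office 2007 files happen to be ZIP containers, so Python’s standard library is enough to sketch the “break it apart” step; a real optimizer would then hand each kind of part to a different codec. The filename and the example output are hypothetical.

    import zipfile
    from collections import Counter

    def inventory_parts(path: str) -> Counter:
        """Count the kinds of parts inside an Office 2007 container (.pptx, .docx).

        These documents are ZIP archives holding XML, images, and other embedded
        objects, so each part can be routed to a different optimizer.
        """
        kinds = Counter()
        with zipfile.ZipFile(path) as container:
            for name in container.namelist():
                suffix = name.rsplit(".", 1)[-1].lower() if "." in name else "(none)"
                kinds[suffix] += 1
        return kinds

    # e.g. inventory_parts("quarterly_review.pptx") might report something like
    # Counter({'xml': 40, 'rels': 12, 'jpeg': 6, 'png': 3})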

The two things to take away from this discussion are: 1) you won’t get much benefit from applying generic compression to already-compressed file types, which are exactly the file types driving most of your storage growth, and 2) it is possible to compress already-compressed files, but to do so you have to first extract all the original information from them, which may mean decoding and unraveling complex compound documents and then decompressing all the different parts. Only then are you at the point where online data reduction for today’s file types can really begin.

Can you compress an already compressed file? Part I

Posted in Featured,File Systems,Storage by storageoptimization on May 1, 2008

We all recognize how much data we generate. And just as we keep telling ourselves we’ll clean out the garage “one of these days,” most of us rarely bother to clean out our email or photo-sharing accounts.

As a result, enterprise and internet data centers have to buy hundreds of thousands of petabytes of disk every year to handle all the data in those files. It all has to be stored somewhere.

One way to reduce the rate of storage growth is to compress files. Compression techniques have been around forever, and are built into many operating systems (like Windows) and storage platforms (such as file servers).

Here’s the problem: most modern file formats, the formats driving all this storage growth, are already compressed.
· The most common format for photos is JPEG – that’s a compressed image format.
· The most common format for most documents at work is Microsoft Office, and in Office 2007, all Office documents are compressed as they are saved.
· Music (mp3) and video (MPEG-2 and MPEG-4) are highly compressed.

The mathematics of compression say that once you compress a file and reduce its size, you can’t expect to compress it again and get even more size reduction. Compression works by looking for patterns in the data and, when it finds them, replacing them with more efficient codes. So if you’ve compressed something once, the compressed output shouldn’t have many exploitable patterns left in it.

Of course, some compression algorithms are better than others, and you might see some small benefits by trying to compress something that has already been compressed with a lesser tool, but for the most part, you’re not going to see a big win by doing that. In fact, in a lot of cases, trying to compress an already compressed file will make it bigger!
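You can see this for yourself with nothing but the standard library. The exact byte counts will vary from run to run, but the second pass reliably fails to shrink the data and usually adds a little overhead.

    import os
    import zlib

    # Half highly compressible (zeros), half incompressible (random bytes).
    original = bytes(512 * 1024) + os.urandom(512 * 1024)

    once = zlib.compress(original)
    twice = zlib.compress(once)

    print(f"original:         {len(original):>9,} bytes")
    print(f"compressed once:  {len(once):>9,} bytes")   # the zero half collapses
    print(f"compressed twice: {len(twice):>9,} bytes")  # no further gain; slightly larger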
Conventional wisdom dictates that once files are compressed with commonly used technologies, further reducing their size – and their consumption of expensive resources – is nearly impossible. So, what can be done about this?

Greening storage

Posted in File Systems,Storage by storageoptimization on May 1, 2008

The New York Times Bits blog has a post on the need to green Internet and other data centers, “Data Centers are Becoming Big Polluters.” Citing a study by McKinsey & Company, Bits’ Steve Lohr states that data centers are “projected to surpass the airline industry as a greenhouse gas polluter by 2020.”

He goes on to sum up the report, which “also lists 10 ‘game-changing improvements’ intended to double data center efficiency, ranging from using virtualization software to integrated control of cooling units.”

Many of us are aware that server virtualization is the path to increasing server utilization. But servers are only half of the data center picture. The other half is storage. The solution for that? Storage optimization.

Just as server virtualization lets you turn 10 physical servers into 10 virtual servers and consolidate them onto one physical machine, storage optimization lets you store 10 times more files on a given disk than you can today. The heat, cooling, rack space, and power benefits are obvious.

Update: Ben Worthen at the Wall Street Journal is also discussing this on the Business Technology Blog. His post, “Can the Tech Guy Afford to Care about Pollution?” also talks about how the problem will only get worse in the future. Worthen’s take: “Given that most of the tech departments we talk to are looking to cut costs, they’re not likely to invest in new technology that will cut emissions, unless it cuts short-term costs at the same time.”