Storage Optimization

Why Storage Is So Inefficient: The Huge Gulf Between Applications Development and Storage Platforms

Posted in Featured by storageoptimization on April 14, 2008

Most of what is driving storage growth is files created by applications. The big ones are email, Microsoft Office documents and office formats like PDF, and rich media files like photos, music, and videos. There's a lot of inefficiency in how the data in these application files gets stored.

If you stop and think about it, there’s a simple explanation for this. Applications are written by developers. These folks are trying to solve an application problem, not a storage problem.

Application developers are working with logical files – you have a file name, you read and write bytes at the start, the middle, or the end of those files. They are not thinking, “OK, this is going to go at sector x on cylinder y on platter z on some disk drive.” In their minds, that’s the job of the storage system.
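To make the abstraction concrete, here's a minimal Python sketch of the logical-file view a developer actually works with: a name, plus reads and writes at the start, middle, or end. Everything below the `seek`/`write` calls, from sectors to cylinders, is the storage system's problem. (The file name and contents are made up for illustration.)

```python
import os
import tempfile

# A developer's view of a file: a name and byte offsets, no disks in sight.
path = os.path.join(tempfile.mkdtemp(), "report.txt")

with open(path, "wb") as f:
    f.write(b"HEADER--")           # write bytes at the start
    f.seek(100)                    # jump to an arbitrary logical offset
    f.write(b"MIDDLE--")           # write bytes in the middle
    f.seek(0, os.SEEK_END)         # jump to the current end
    f.write(b"FOOTER--")           # append bytes at the end

with open(path, "rb") as f:
    assert f.read(8) == b"HEADER--"
    f.seek(100)
    assert f.read(8) == b"MIDDLE--"
```

Where those bytes actually land on disk is entirely up to the file system; the application never sees it.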

On the other side of this, you have the storage developers. They make systems to store files. But they don’t know what’s in the files, how the files are being used, or even what the data in them is for.

If you are a file system vendor, or a file server or NAS vendor, you create a storage solution where applications can write files, and you figure out how to lay out and organize those files on volumes and disks – with RAID levels, mirroring, snapshots, and all sorts of other cool storage features.  But you don’t know – or care – what is inside the files.  That’s up to the application.

So, as you can see, there's a gulf here. You could see it as a problem, or you could see it the way we do: as an opportunity. There is a clear need for a better solution.

If you were a storage expert and you did look inside each file, understanding how that file's data was laid out, why, and how it was being used, you could probably figure out much more efficient ways of storing it.

In the old days, this would have been too much work to contemplate: most applications were custom, every file format was different, and no one could have kept up or figured it all out. That's not true anymore. Today, the vast majority of the world's file data is in about two dozen fundamental file formats, many of which we already listed: Word, Excel, JPEG, MPEG, PDF, PowerPoint, MP3, and maybe 20 others. It's no longer an insurmountable task to figure out how to optimize most of what a data center has to store. That's true for an internet data center, and it's true for a corporate data center too.

In other words, there is a ton of efficiency to be gained from bridging the gap between how applications write data and how storage systems store it. You can get a huge amount of space savings just by dealing with the top 25 file types; you don't have to get them all. If I can drastically shrink the space taken by 80% of all your files, that's still a big win, even if I never figure out the other 20%.
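The 80/20 claim is easy to check with back-of-the-envelope arithmetic. Assuming, purely for illustration, a 10:1 reduction on the 80% of data you can handle and no change on the rest:

```python
# Back-of-the-envelope: overall savings when only the top file types
# are optimized. Both numbers below are illustrative assumptions,
# not measured figures.
covered = 0.80   # fraction of data in the formats you handle
ratio = 10.0     # assumed reduction ratio on that covered fraction

remaining = covered / ratio + (1 - covered)  # fraction of space still used
overall = 1 / remaining                      # effective overall ratio

print(f"space still used: {remaining:.0%}")   # 28%
print(f"overall ratio: {overall:.1f}:1")      # 3.6:1
```

Even with the untouched 20% dominating what's left, the data center as a whole shrinks to roughly a quarter of its original footprint.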

At Ocarina, our ECOsystem (Extract, Correlate, Optimize) starts by identifying each file by type, understanding what's inside it, and then taking a set of steps to store every bit of information in those files while using a much smaller amount of disk space for each one.
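The first of those steps, identifying each file by type, can be done from the file's leading "magic" bytes rather than trusting its extension. This is a sketch of that idea only, not Ocarina's actual implementation, and the signature table is a tiny assumed subset of the formats mentioned above:

```python
# Identify a file's type from its leading magic bytes.
# Illustrative sketch: real systems use much larger signature databases.
MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip-based office (docx/xlsx/pptx)",
    b"\xd0\xcf\x11\xe0": "legacy office (doc/xls/ppt)",
    b"ID3": "mp3",
}

def identify(header: bytes) -> str:
    """Match the file's first bytes against known format signatures."""
    for magic, kind in MAGIC.items():
        if header.startswith(magic):
            return kind
    return "unknown"

print(identify(b"%PDF-1.4 ..."))       # pdf
print(identify(b"\xff\xd8\xff\xe0"))   # jpeg
```

Once the type is known, a type-aware optimizer can pick a strategy suited to that format's internal layout instead of applying one generic compressor to everything.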

Storage optimization for online storage is going to be about being file-type aware. Without bridging the gap between traditional storage technology and how the application sees its data, online storage optimization won't get any further than what generic compression or dedupe has already achieved.

