Less is More
Less is more … or is it? Part One
I recently returned from Storage Networking World in Orlando. As everyone knows, the conference is mainly a place for storage vendors to meet each other, tout their wares, and nose around in their competitors’ booths pretending to be potential customers. There are some good sessions, however, and one of the best was IDC analyst Noemi Greyzdorf’s presentation on the future of file systems.
Her smart and interesting talk was on the evolution of clustered, distributed, and grid file systems. As I listened, it occurred to me that I’m seeing a big split in the file system world, especially at the high end, where really large amounts of data are stored.
One of Noemi’s key points is that more and more functionality is being packed into file systems. As she puts it, file systems are the natural place for value-add knowledge about storage to be kept. That’s certainly true, and there are a number of advanced file systems that are becoming richer and richer in terms of integrated features.
At the same time, a “less is more” crowd is definitely emerging: some of the newest large-scale file systems leave out many of the most basic file system features. This group includes GoogleFS, Hadoop’s HDFS, MogileFS, Amazon’s S3 (Simple Storage Service), and the in-house developments at a couple of other very large Web 2.0 shops.
Are these two trends in file systems on a collision course? I don’t think so. But what I do see is that neither approach is nailing the growing problem posed by the exploding amount of internet data that needs to be managed and stored. In other words, both approaches have issues. In my next entry, I will discuss what those issues are and how we might solve them.
on April 30, 2008 at 4:56 am
Wouldn’t it be best for enterprises to store their internet data on virtual storage devices and implement an archiving system to sift out non-essential data? This would also help make backups more efficient.
on May 5, 2008 at 4:31 pm
Yes, sifting data and getting files to the right tier of storage is key. Archive may be one of those tiers. But all the tiers will have file systems, whether they are internet services, archive products, or just low-cost storage tiers.
One of the most important things is how you get files to the right tier without disrupting users or applications. A good approach for that is a file virtualization layer like F5’s Acopia product. But however you move files around, archive them, or optimize them (a la Ocarina), you still need a file system to store them on.