Storage Optimization

Compression – A Matter of Life and Death

Posted in Featured, Storage by storageoptimization on February 9, 2009
Nice piece today in BioInform about our compression solution for genomics data. Carter George of Ocarina spoke to the author of the piece, Vivien Marx, last week, as did Dave Lifka at Cornell. The article details the work we’re doing with Cornell University’s Center for Advanced Computing (CAC) in partnership with DataDirect to increase their capacity by up to 90 percent.

Gene sequencing has opened up new vistas in medical research that could lead to a completely new era of “personalized medicine,” with targeted treatments and few or no side effects from medications. Momentum for this type of medicine is building: the FDA announced today that it has created a new position dedicated to “coordinating and upgrading” the agency’s involvement in genomics and other elements of personalized medicine.

The potential is huge, and it’s horrifying to think that all this progress could be slowed or stopped by the cost of storage. Freeing up disk space truly can be a matter of life and death.

We’ve addressed this by developing compression solutions specifically designed for sequencing technologies such as those from Illumina and Affymetrix. The BioInform article offers significant detail on the types of files we compress, as well as the checksums Ocarina performs on each file before any shadow files are deleted. We hope you’ll take a look at the piece.
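For readers curious what "checksum before delete" looks like in practice, here is a minimal sketch of the general idea. It is not Ocarina's implementation: gzip stands in for the format-specific compressors, SHA-256 is an assumed choice of checksum, the function names are hypothetical, and we are assuming that the "shadow file" is the uncompressed copy kept around until the compressed version has been verified.

```python
import gzip
import hashlib
import os
import shutil


def sha256_of(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def compress_and_verify(path):
    """Compress `path` to `path + '.gz'` and delete the original only
    after the decompressed data matches the original checksum.

    Hypothetical helper: gzip is a stand-in compressor, and the original
    file plays the role of the retained (shadow) copy.
    """
    compressed_path = path + ".gz"

    # Checksum the original before touching anything.
    with open(path, "rb") as src:
        original_sum = sha256_of(src)

    # Write the compressed copy alongside the original.
    with open(path, "rb") as src, gzip.open(compressed_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Decompress what was just written and checksum the result.
    with gzip.open(compressed_path, "rb") as packed:
        restored_sum = sha256_of(packed)

    if restored_sum != original_sum:
        # Keep the original; discard the bad compressed copy.
        os.remove(compressed_path)
        raise ValueError("checksum mismatch for " + path)

    # Only now is it safe to remove the uncompressed copy.
    os.remove(path)
    return compressed_path
```

The point of the ordering is that the uncompressed copy is never removed until a full round trip has been checked, so a failed compression costs time but never data.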

Solving Cornell’s Storage Problems

Posted in Storage by storageoptimization on February 5, 2009
Great news for Ocarina Networks today: we’re working with the Cornell Center for Advanced Computing (CAC) and DataDirect Networks (DDN) to perform extensive data compression testing on a diverse array of research applications. The goal here is to address a problem they (and so many other research institutions) are facing: the exponential growth of online data and complex file types that have to be stored in such a way that they’re readily accessible.

For Cornell, one of the biggest pain points was storage of genomics files. Genomics research is accelerating at a dizzying rate, and it turns out to be a very image-intensive research area. Here’s one way to think about it: when J. Craig Venter and his team first sequenced the human genome, it took up 2GB of storage. Nowadays, a single sequencer can generate 100GB of data per HOUR. That’s not to say that all of it needs to be stored every time, but you get the idea. In fact, all around the world, genome sequencers are spitting out files as researchers race to find cures for life-threatening diseases such as cancer, Alzheimer’s and heart disease. If they don’t get storage under control, the pace of genomic research could actually be slowed.

For more on our work with Cornell and DataDirect Networks, go here. And if you’re interested in getting the full download on Ocarina’s work with life sciences storage, go to this page for access to a white paper, “Coping with the Explosion of Data in Life Sciences Research.”