The capacity for generating genomic data is increasing more rapidly than computing power. The rate at which genomes can be sequenced doubles every four months or so, whereas computing power doubles only every 18 months.
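To get a feel for how quickly these two doubling rates diverge, the arithmetic can be sketched as below. The doubling periods are the ones quoted above; the 36-month horizon and the function name `growth_factor` are illustrative assumptions, not figures from the source.

```python
def growth_factor(months, doubling_period_months):
    """Return how many times a quantity multiplies after `months`,
    given that it doubles once every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


# Illustrative horizon (an assumption): three years.
months = 36

sequencing = growth_factor(months, 4)   # sequencing rate doubles every ~4 months
computing = growth_factor(months, 18)   # computing power doubles every ~18 months
gap = sequencing / computing            # how far sequencing outpaces computing

print(f"sequencing x{sequencing:.0f}, computing x{computing:.0f}, gap x{gap:.0f}")
```

Under these assumptions, sequencing capacity grows 512-fold in three years while computing power grows 4-fold, so the gap widens by a factor of 128.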
Searching genomic data faster: A data-compression algorithm drastically reduces the time it takes to find a particular gene sequence in a database of genomes. The more genomes it searches, the greater the speedup it affords. Because the data is compressed in the right way, the analysis can be done directly on the compressed data, which increases speed while maintaining the accuracy of the analysis.
Exploiting redundancy: The compression scheme exploits the fact that evolution is stingy with good designs: there is a great deal of overlap in the genomes of closely related species, and some overlap even in the genomes of distantly related species. The algorithm mathematically represents the genomes of different species, or of different individuals within a species, such that the overlapping data is stored only once. A search of multiple genomes can thus concentrate on their differences, saving time. The algorithm will be useful in any application where the central question is: given a sequence, what is it similar to? Extending the technique to information on proteins and RNA sequences may pay even bigger dividends. Now that the genomes of the major organisms, including humans, have been mapped, the major questions in biology are which genes are active when, and how the proteins they code for interact [http://www.biocompare.com/Life-Science-News/116969-Searching-Genomic-Data-Faster/; http://www.nature.com/nbt/journal/v30/n7/full/nbt.2241.html].
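The idea of storing overlapping data once and searching only the distinct parts can be illustrated with a deliberately simplified toy. This is not the published algorithm: it swaps in exact, fixed-length block deduplication, and the block length, the function names `build_index` and `search`, and the two sequences are all hypothetical. It also finds only the first match inside each block and misses matches that straddle a block boundary.

```python
from collections import defaultdict

BLOCK = 8  # hypothetical block length, chosen only for this toy example


def build_index(genomes, block=BLOCK):
    """Split each genome into fixed-length blocks and store each
    distinct block once, remembering where every copy came from."""
    unique_blocks = []                # each distinct block, stored once
    block_id = {}                     # block text -> index in unique_blocks
    occurrences = defaultdict(list)   # block index -> [(genome name, offset)]
    for name, seq in genomes.items():
        for start in range(0, len(seq), block):
            chunk = seq[start:start + block]
            if chunk not in block_id:
                block_id[chunk] = len(unique_blocks)
                unique_blocks.append(chunk)
            occurrences[block_id[chunk]].append((name, start))
    return unique_blocks, occurrences


def search(query, unique_blocks, occurrences):
    """Scan each distinct block once, then expand every hit to all
    genomes that share that block. Reports only the first match per
    block and ignores matches that cross block boundaries."""
    hits = []
    for i, chunk in enumerate(unique_blocks):
        pos = chunk.find(query)
        if pos != -1:
            hits.extend((name, start + pos) for name, start in occurrences[i])
    return sorted(hits)


# Made-up sequences that differ only in their middle block.
genomes = {
    "species_a": "ACGTACGGTTGACCAATTGGCCAA",
    "species_b": "ACGTACGGTTGACCTTTTGGCCAA",
}

unique_blocks, occurrences = build_index(genomes)
hits = search("GGCC", unique_blocks, occurrences)
print(len(unique_blocks), hits)
```

Here the two genomes contain six blocks in total but only four distinct ones, so the search scans four blocks instead of six and still reports the match in both genomes. The more the genomes overlap, the smaller the share of data that has to be scanned, which is the intuition behind the speedup growing with the number of genomes.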