Compression-accelerated BLAST and BLAT

 

August 2014 CAST v2.2 (currently under repair)

By Andrew Haskell, Noah Daniels, and Bonnie Berger

This version of CaBLAST is currently under repair. Stay tuned.

A new release of CaBLAST and CaBLAT is now available. It is much easier to use: it accelerates the existing nucleotide BLAST and BLAT programs through a plugin architecture and accepts BLAST and BLAT arguments.

A version of CaBLAST that works with proteins can be found here.


Illustration by Steven H. Lee. Thanks also to Leslie Gaffney, Broad Institute.

The past two decades have seen an exponential increase in sequencing capabilities, outstripping advances in computing power. Extracting new insights from the data sets currently being generated will require not only faster computers but also smarter algorithms. However, most genomes currently sequenced are highly similar to ones already collected; thus, the amount of novel sequence information is growing much more slowly than the raw volume of data.

We show that this redundancy can be exploited by compressing data in a way that allows direct computation on the compressed data. This approach reduces the computational task of operating on many highly similar genomes to only slightly more than that of operating on just one. We demonstrate this compressive architecture by implementing accelerated versions of both BLAST and BLAT, and emphasize how compressive genomics, more generally, will enable biologists to keep pace with current data.
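The idea of computing directly on compressed data can be illustrated with a toy sketch. This is not the CaBLAST or CaBLAT implementation: it assumes exact substring matching instead of BLAST-style alignment, splits sequences into fixed-size blocks, and ignores matches that span a block boundary. The names `compress`, `search`, `unique`, and `links` are hypothetical and chosen only for this example. The point it shows is that the search scans each distinct block once and then expands hits to every genome that shares the block.

```python
from collections import defaultdict

def compress(genomes, block=8):
    """Split each genome into fixed-size blocks and store each distinct block once."""
    unique = {}                # block text -> id of its single stored copy
    links = defaultdict(list)  # block id -> [(genome name, offset), ...]
    for name, seq in genomes.items():
        for off in range(0, len(seq), block):
            chunk = seq[off:off + block]
            uid = unique.setdefault(chunk, len(unique))
            links[uid].append((name, off))
    return unique, links

def search(query, unique, links):
    """Scan only the unique blocks, then expand each hit to all linked genomes."""
    hits = []
    for chunk, uid in unique.items():
        pos = chunk.find(query)
        while pos != -1:
            hits.extend((name, off + pos) for name, off in links[uid])
            pos = chunk.find(query, pos + 1)
    return sorted(hits)

# Two toy "genomes" that differ only in their last base (hypothetical data).
genomes = {
    "g1": "ACGTACGTTTGACCAAGGCTAGCT",
    "g2": "ACGTACGTTTGACCAAGGCTAGCA",
}
unique, links = compress(genomes)
hits = search("GACC", unique, links)
print(len(unique), hits)
```

Here the two genomes contain six blocks in total, but only four distinct blocks are stored and scanned; the shared hit is reported for both genomes through the link table. As more near-identical genomes are added, the scanned data grows only with the novel blocks.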

Source Code

We have implemented two prototype algorithms that demonstrate the compressive genomics paradigm: Compression-accelerated BLAST (CaBLAST) and Compression-accelerated BLAT (CaBLAT). These algorithms serve as a proof of concept that computationally aware compression not only reduces storage space but also accelerates analysis (in this case, sequence search).

Our source code can be downloaded here for academic and non-profit use:

For a detailed description of the algorithms and discussion of relevant implementation trade-offs, please see the Supplementary Methods of our article "Compressive genomics" in Nature Biotechnology, July 2012.

Contact

We welcome feedback, questions and suggestions. Contact information is available at the authors' websites: Po-Ru Loh, Michael Baym, Bonnie Berger.

Referencing CaBLAST/CaBLAT

If you use CaBLAST or CaBLAT, please reference the following: