Software

Currently, this page only includes software I am familiar with. Most of these programs are designed to align next-generation sequencing (NGS) data and have been developed since 2007. I may extend the list when I have time. Several notes:


Indexing Reads with Hash Tables


Indexing Genome with Hash Tables


Merge Sorting


Indexing Genome with Suffix Array/BWT


Recommendation

First of all, as I am the key developer of two short read aligners (BWA and MAQ), it is really hard for me to give an unbiased evaluation. Please bear this fact in mind when reading through my comments below.

For Illumina reads, I would recommend my program BWA. BWA implements most of the major features of a practical aligner. It has a relatively small memory footprint and is highly efficient, with little tradeoff in accuracy. BWA outputs alignments in the SAM format. Users may use SAMtools to sort and merge alignments and to make variant calls. One potential concern is that BWA has not been widely used so far, so it may be less robust than aligners already proven in publications, such as Eland and MAQ.
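Since BWA writes SAM, it may help to see what one alignment line looks like to a downstream script. The following is a minimal sketch, not part of BWA or SAMtools: the names `SamRecord`, `parse_sam_line` and `is_unmapped` are hypothetical, and the example read is made up. It relies only on the SAM convention of tab-separated lines with eleven mandatory fields, of which the first six are kept here.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """First six mandatory SAM fields (hypothetical helper type)."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string


def parse_sam_line(line: str) -> SamRecord:
    """Parse one SAM alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
    )


def is_unmapped(rec: SamRecord) -> bool:
    """Flag bit 0x4 marks a read that is itself unmapped."""
    return bool(rec.flag & 0x4)


# Made-up example line: a 4 bp read mapped at chr1:100.
example = "read1\t99\tchr1\t100\t37\t4M\t=\t300\t204\tACGT\tIIII"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, rec.mapq, is_unmapped(rec))
```

For real work, SAMtools or an established SAM/BAM library is the better choice. The sketch only shows that the format is simple enough to inspect by hand.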

[Update: With the help of paired-end reads, MAQ is able to find some SNPs at the edge of highly repetitive regions, whereas BWA cannot. Nonetheless, I still prefer BWA given its speed and the fact that SNPs that can be called from repeats are rare and more likely to be false positives.]

For PET-based structural variation detection, where alignment accuracy is the leading factor in reducing false positive calls, mapping inconsistent read pairs with NovoAlign is recommended. NovoAlign is the most accurate aligner to date.
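One possible way to pick out inconsistent read pairs from a first-pass alignment is to look at the SAM flag. The sketch below assumes that "inconsistent" corresponds to a pair in which both ends are mapped but the aligner did not set the "properly paired" bit (0x2). That mapping is an assumption made for illustration, and the function name `is_inconsistent_pair` is hypothetical. What counts as a proper pair is decided by each aligner.

```python
# SAM flag bits used below (defined by the SAM format).
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # aligner considers the pair properly aligned
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped


def is_inconsistent_pair(flag: int) -> bool:
    """Return True for a paired read whose two ends are both mapped
    but which the aligner did not mark as a proper pair.

    Assumption: this is how 'inconsistent' is interpreted here.
    """
    if not flag & PAIRED:
        return False
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return False
    return not flag & PROPER_PAIR


# 97 = paired + mate reverse + first in pair, proper-pair bit not set.
print(is_inconsistent_pair(97))
# 99 = the same flags with the proper-pair bit set.
print(is_inconsistent_pair(99))
```

Reads selected this way could then be realigned with a more accurate aligner before calling structural variants.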