The ROC curves

Program    Version        Options                     100k 100bp SE (CPU sec)  100k 2x100bp PE (CPU sec)
bowtie2    2.0.0-beta4    -X 650; mapQ>1              78.1                     154.0 (to be updated)
bwa        0.5.9-r26-dev  (default); mapQ>0           106.5                    230.1
bwa-sw     0.5.9-r26-dev  (default); mapQ>0           237.4                    502.0
bwa-sw64   0.6.0-r79-dev  (default); mapQ>0           139.4                    286.5
gsnap      2011-10-16     (default); mapQ>3           98.9                     538.9
novoalign  2.05.33        -k14 -s3 -i 500 50; mapQ>3  359.7                    349.5
smalt      ~2011-10-17    -k20 -s13 -i 650; mapQ>0    468.8                    640.2

Simulated data

100,000 reads (or read pairs) are simulated from the human genome with wgsim. In this simulation, we first simulate a diploid genome containing about 28.6 million SNPs and short INDELs and then simulate error-free reads from the diploid genome. Although the reads are error-free, many of them cannot be perfectly mapped to the reference genome because of these variants. The exact wgsim command lines are (the first for single-end and the second for paired-end):


Evaluated programs and command lines

Default configurations are used unless the default certainly fails or the documentation explicitly suggests a better configuration for 100bp HiSeq reads. The options in use are probably suboptimal for bowtie-v1 and soap2. The detailed command lines are:

Paired-end reads are mapped with the following command lines:

Evaluation

The accuracy of each mapper is evaluated with the wgsim_eval.pl script, which regards a mapping as correct if it is within 50bp of the simulated position. The command line is:
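
Separately from that command, the correctness criterion and the way it feeds a ROC curve can be illustrated with a minimal Python sketch. This is not the wgsim_eval.pl implementation. The function names, the tuple layout of the input, and the choice to treat a distance of exactly 50bp as correct are all assumptions made for illustration. Each mapping is classified as correct or wrong, and the counts are then accumulated from the highest mapping quality downwards, giving one (mapped, wrong) point per mapQ threshold.

```python
from collections import Counter

def is_correct(true_chrom, true_pos, map_chrom, map_pos, max_dist=50):
    """Hypothetical helper: a mapping counts as correct if it is on the
    simulated chromosome and within max_dist bp of the simulated position
    (inclusive bound is an assumption)."""
    return true_chrom == map_chrom and abs(true_pos - map_pos) <= max_dist

def roc_points(alignments, max_dist=50):
    """alignments: iterable of (true_chrom, true_pos, map_chrom, map_pos, mapq).
    Returns a list of (mapq, n_mapped, n_wrong), where the counts cover all
    mappings with quality >= mapq, ordered from highest to lowest mapq."""
    mapped = Counter()
    wrong = Counter()
    for true_chrom, true_pos, map_chrom, map_pos, mapq in alignments:
        mapped[mapq] += 1
        if not is_correct(true_chrom, true_pos, map_chrom, map_pos, max_dist):
            wrong[mapq] += 1

    points = []
    n_mapped = n_wrong = 0
    for mapq in sorted(mapped, reverse=True):
        n_mapped += mapped[mapq]
        n_wrong += wrong[mapq]
        points.append((mapq, n_mapped, n_wrong))
    return points

# Made-up records, for illustration only (not data from this evaluation).
example = [
    ("chr1", 1000, "chr1", 1010, 60),  # 10bp away: correct
    ("chr1", 5000, "chr2", 5000, 60),  # wrong chromosome
    ("chr2", 300, "chr2", 400, 10),    # 100bp away: wrong
    ("chr2", 800, "chr2", 850, 10),    # exactly 50bp away: correct
]
for mapq, n_mapped, n_wrong in roc_points(example):
    print(mapq, n_mapped, n_wrong)
# prints:
# 60 2 1
# 10 4 2
```

Plotting n_wrong against n_mapped over all thresholds gives one curve per mapper. Note that the thresholds in the table above are strict (for example mapQ>0), whereas this sketch accumulates over mapq >= threshold.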