Bioinformatics

Sequencing related

Read alignment for NextGen sequencing data. Briefly review several aligners for new-generation sequencing reads, indicating their advantages, features and potential limitations.
Read alignment/assembly viewer. Briefly review the features of alignment/assembly viewers that are capable of handling large (over 10GB) alignments.
Being practical aligners. Discuss how a bioinformatic tool can become practical and popular.
Flawed benchmarks. Discuss why they are flawed and how to improve them.
Theoretical PCR duplicate rate (PDF). Derive the formula of the theoretical PCR duplicate rate and the probability of two ends in a read pair having the same coordinates.
Find unique regions. Present a set of programs for calculating the uniqueness of a region and show what fraction of the human genome is unique under different thresholds.
Mapping uniqueness. Discuss the definition of unique alignment and point out its weakness.
Theory on multi-sample SNP calling and allele frequency estimate (PDF). A simplified version has been implemented in SAMtools.
A practical guide to the human reference genome sequence.

Sequence analysis

FASTA/FASTQ parser in C. Present a small and versatile FASTA/FASTQ parser contained in a single C header file. This parser works with all known FASTA/FASTQ variants and seamlessly adapts to gzipped files and to both FASTA and FASTQ.
Multiple alignment programs. Give a brief overview of several multiple alignment programs, comparing their advantages and performance, summarized from publications.

Data processing and data retrieval

Unix commands for Bioinformaticians. Explain several convenient but unfortunately not well-known Unix commands that may greatly speed up your analyses of biological data.
Connect UCSC MySQL with Perl. Show a Perl script that quickly retrieves various UCSC data in a specified genomic region by connecting to the public UCSC MySQL server.