Sequencing related
- Reads alignment for the NextGen sequencing data. Briefly review several aligners for next-generation sequencing reads, indicating their advantages, features and potential limitations.
- Read alignment/assembly viewer. Briefly review the features of alignment/assembly viewers that are capable of handling large (over 10GB) alignments.
- Being practical aligners. Discuss how a bioinformatic tool can become practical and popular.
- Flawed benchmarks. Discuss why they are flawed and how to improve them.
- Theoretical PCR duplicate rate (PDF). Derive the formula of the theoretical PCR duplicate rate and the probability of two ends in a read pair having the same coordinates.
- Find unique regions. Present a set of programs for calculating the uniqueness of a region and show the fraction of the human genome that is unique under different thresholds.
- Mapping uniqueness. Discuss the definition of unique alignment and point out its weakness.
- Theory on multi-sample SNP calling and allele frequency estimate (PDF). A simplified version has been implemented in SAMtools.
- A practical guide to the human reference genome sequence.
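
To make the "Find unique regions" item more concrete, here is a toy Python sketch of one possible uniqueness measure. It assumes, purely for illustration, that a position counts as unique when the k-mer starting there occurs exactly once on the forward strand, with k playing the role of the threshold. The function name `unique_fraction` is hypothetical, and the programs described in that note may use a different definition (for example, allowing mismatches or considering both strands).

```python
from collections import Counter


def unique_fraction(seq, k):
    """Fraction of k-mer start positions whose k-mer occurs exactly once in seq.

    Assumption for illustration: exact matches on the forward strand only.
    """
    seq = seq.upper()
    n = len(seq) - k + 1  # number of k-mer start positions
    if k <= 0 or n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    unique = sum(1 for i in range(n) if counts[seq[i:i + k]] == 1)
    return unique / n


if __name__ == "__main__":
    toy = "ACGTACGTTT"
    for k in (1, 4, 8):
        print(k, unique_fraction(toy, k))
```

Longer k-mers are more likely to be unique, so the reported fraction generally rises with k. A genome-scale version would need a memory-efficient index rather than an in-memory counter.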
Sequence analysis
- FASTA/FASTQ parser in C. Present a small and versatile FASTA/FASTQ parser contained in a single C header file. This parser works with all known FASTA/FASTQ variants and seamlessly adapts to gzipped files and to both FASTA and FASTQ.
- Multiple alignment programs. Give a brief overview of several multiple alignment programs, comparing their advantages and performance, summarized from publications.
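
The idea behind the FASTA/FASTQ parser item, one reader that accepts either format and either plain or gzipped input, can be sketched in Python. This is not the C header described above. The names `open_maybe_gzip` and `read_fastx` are hypothetical, and the sketch assumes multi-line sequences and qualities are allowed and that gzip input can be detected from its two magic bytes.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip (detected by magic bytes)."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fastx(handle):
    """Yield (name, sequence, quality); quality is None for FASTA records."""
    line = handle.readline()
    while line:
        line = line.rstrip("\r\n")
        if not line or line[0] not in ">@":
            line = handle.readline()  # skip blank or unexpected lines
            continue
        is_fastq = line[0] == "@"
        fields = line[1:].split()
        name = fields[0] if fields else ""
        # The sequence may span several lines in both formats.
        seq_parts = []
        line = handle.readline()
        while line and line[0] not in ">+@":
            seq_parts.append(line.strip())
            line = handle.readline()
        seq = "".join(seq_parts)
        qual = None
        if is_fastq and line.startswith("+"):
            # Read quality by length, because a quality line may start with '@'.
            qual_parts, qual_len = [], 0
            while qual_len < len(seq):
                line = handle.readline()
                if not line:
                    break
                q = line.rstrip("\r\n")
                qual_parts.append(q)
                qual_len += len(q)
            qual = "".join(qual_parts)
            line = handle.readline()
        yield name, seq, qual
```

A typical call would be `for name, seq, qual in read_fastx(open_maybe_gzip(path)): ...`. Reading the quality string by length rather than by line markers is what lets a single loop cope with quality lines that begin with `@`.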
Data processing and data retrieval
- Unix commands for Bioinformaticians. Explain several convenient but unfortunately not well-known Unix commands which may greatly speed up your analyses of biological data.
- Connect UCSC MySQL with Perl. Show a Perl script that quickly retrieves various UCSC data in a specified genomic region by connecting to the public UCSC MySQL server.