FASTQ Format

The FASTQ format is documented in several places, including Wikipedia, the MAQ website and UCR. These pages differ slightly from each other, but they largely describe the same format.

The FASTQ format is frequently confused with a similar but distinct format, the Illumina read sequence format, which has the same layout but uses a different scale in the quality string.
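The difference in the quality string can be made concrete with a short sketch. It assumes that standard FASTQ stores each score as an ASCII character with offset 33 and that the Illumina read sequence format uses offset 64; early Illumina pipelines also computed the score itself on a different (Solexa) scale, which this sketch does not convert. The names decode_qual, OFFSET_SANGER and OFFSET_ILLUMINA are hypothetical and are not part of kseq.h.

```c
#include <stddef.h>
#include <string.h>

/* ASCII offsets; these values are assumptions stated in the text above. */
enum { OFFSET_SANGER = 33, OFFSET_ILLUMINA = 64 };

/* Decode a quality string into integer scores by subtracting `offset`.
 * Returns the number of scores written, or -1 if the string contains a
 * non-printable character or does not fit into `out`. Scores may come
 * out negative when the offset is 64, which is left to the caller. */
static int decode_qual(const char *qual, int offset, int *out, size_t out_cap)
{
    size_t n = strlen(qual);
    if (n > out_cap) return -1;
    for (size_t i = 0; i < n; ++i) {
        unsigned char c = (unsigned char)qual[i];
        if (c < 33 || c > 126) return -1; /* outside printable ASCII */
        out[i] = (int)c - offset;
    }
    return (int)n;
}
```

Decoding the same string with the wrong offset shifts every score by 31, which is why mixing up the two formats silently distorts quality values instead of producing an obvious error.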


FASTA/FASTQ Parser in C

The C header file kseq.h is a small library for parsing the FASTA/FASTQ format. It is compatible with all the FASTQ variants to date. As this parser does not interpret quality strings, it also works with the Illumina read sequence format. Important features include:

This parser is based on a generic stream buffer, which complicates the API a bit but makes the parser work with various types of files and even C strings. For ordinary file I/O, you can use KSEQ_INIT(gzFile, gzread) to set the type of the file handle and the read() function. Function kseq_init() initializes the parser and kseq_destroy() destroys it. Function kseq_read() reads one sequence and fills the kseq_t struct, in which kseq_t::name, kseq_t::comment, kseq_t::seq and kseq_t::qual give the sequence name, comment, sequence and quality, respectively. Other fields are for internal use only. The following shows an example. This file can also be acquired from the complete tar-ball.
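The calls described above can be sketched roughly as follows. This is a minimal illustration and not necessarily the example file shipped in the tar-ball. It assumes that kseq.h is on the include path, that zlib is installed, that kseq_read() returns a non-negative value for each record and a negative value at end of file or on error, and that each field of kseq_t is a string struct with members s (the characters) and l (the length).

```c
#include <stdio.h>
#include <zlib.h>
#include "kseq.h"

/* Instantiate the parser for gzFile handles read with gzread(). */
KSEQ_INIT(gzFile, gzread)

int main(int argc, char *argv[])
{
    gzFile fp;
    kseq_t *seq;
    int ret;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <in.fastq>\n", argv[0]);
        return 1;
    }
    fp = gzopen(argv[1], "r");   /* also reads uncompressed files */
    if (fp == NULL) {
        fprintf(stderr, "Cannot open %s\n", argv[1]);
        return 1;
    }
    seq = kseq_init(fp);         /* initialize the parser */
    while ((ret = kseq_read(seq)) >= 0) {   /* one record per call */
        printf("name: %s\n", seq->name.s);
        if (seq->comment.l) printf("comment: %s\n", seq->comment.s);
        printf("seq: %s\n", seq->seq.s);
        if (seq->qual.l) printf("qual: %s\n", seq->qual.s); /* empty for FASTA */
    }
    printf("return value: %d\n", ret);
    kseq_destroy(seq);           /* free the parser */
    gzclose(fp);
    return 0;
}
```

Assuming the file is saved under the hypothetical name example.c next to kseq.h, it should build with a command along the lines of: gcc example.c -lz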