- Bray, N., Pimentel, H., Melsted, P., Pachter, L.: Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology Vol. 34, pages 525–527, 2016
Abstract: We present kallisto, an RNA-seq quantification program that is two orders of magnitude faster than previous approaches and achieves similar accuracy. Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases. We use kallisto to analyze 30 million unaligned paired-end RNA-seq reads in <10 min on a standard laptop computer. This removes a major computational bottleneck in RNA-seq analysis.
- Addresses the problem that the first two steps of a typical transcript-level RNA-seq processing workflow, namely alignment to a transcriptome or reference genome and estimation of transcript abundances, can be time consuming.
- Draws comparisons to the widely used program TopHat2 followed by quantification with its companion program Cufflinks.
- Notes that sequence data are being generated from increasing numbers of samples.
- States that the quantification of aligned reads can be sped up with streaming algorithms or by naive counting of reads, but at the cost of decreased quantification accuracy.
- Mentions that the direct use of k-mers is inadequate for accurate quantification, but that the hash-based approach provides a basis for speeding up RNA-seq processing.
- Paper investigates whether information from k-mers within reads can be combined to maintain the accuracy of alignment-based quantification.
- Paper examines the central difficulty in, and key requirement for, accurate quantification: the assignment of reads that cannot be uniquely aligned.
- Paper proposes a method based on pseudoalignment of reads and fragments that focuses only on identifying the transcripts from which the reads could have originated and does not try to pinpoint exactly how the sequences of the reads and transcripts align.
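
The pseudoalignment idea in the notes above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not kallisto's implementation: the index is a plain dictionary from k-mer to transcript names, the compatible set is the intersection over all of a read's indexed k-mers, and k-mers missing from the index are simply skipped. The function names, the toy sequences, and `k = 5` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with every indexed k-mer of the read.

    No base-level alignment is computed: only the set of transcripts
    the read could have originated from is reported.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # k-mer not in the index: skipped (an assumption of this toy)
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Toy transcriptome (hypothetical): both transcripts share the prefix "ATGGCGTAC".
transcripts = {
    "t1": "ATGGCGTACGTTAGC",
    "t2": "ATGGCGTACCCTAGA",
}
k = 5
index = build_kmer_index(transcripts, k)

shared = pseudoalign("GGCGTAC", index, k)    # lies entirely in the shared prefix
unique = pseudoalign("GTACGTTAG", index, k)  # extends into the t1-specific region
nothing = pseudoalign("TTTTTTT", index, k)   # no k-mer occurs in the index
```

Here `shared` contains both transcripts, which is exactly the case of a read that cannot be uniquely assigned, while `unique` contains only `t1`. A quantification step would then have to distribute such ambiguous reads across their compatible transcripts; that step is not shown here.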