Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets.
New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm.
The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently.
MicroRNAs (miRNAs) are short abundant non-coding RNAs critical for many cellular processes. Deep sequencing (next-generation sequencing) technologies are being readily used to receive a more accurate depiction of miRNA expression profiles in living cells. This type of analysis is a key step towards improving our understanding of the complexity and mode of miRNA regulation.
Standard DNA alignment programs are inadequate to manage the data produced by new generation DNA sequencers. To answer this problem, we developed PASS with the objective of improving execution time and sensitivity when compared with other available programs. PASS performs fast gapped and ungapped alignments of short DNA sequences onto a reference DNA, typically a genomic sequence. It is designed to handle a huge amount of reads such as those generated by Solexa, SOLiD or 454 technologies.
With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very different error models apply to the novel high-throughput sequencing methods. While the most frequent error-type in Illumina reads are mismatches, reads produced by 454's GS FLX predominantly contain insertions and deletions (indels).
We present a tool suited for searching for many short nucleotide sequences in large databases, allowing for a predefined number of gaps and mismatches. The commandline-driven program implements a non-deterministic automata matching algorithm on a keyword tree of the search strings. Both queries with and without ambiguity codes can be searched. Search time is short for perfect matches, and retrieval time rises exponentially with the number of edits allowed.
The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25-70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP - the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism.
Small RNA sequencing allows genome-wide discovery, categorization, and quantification of genes producing regulatory small RNAs. Many tools have been described for annotation and quantification of microRNA loci (MIRNAs) from small RNA-seq data. However, in many organisms and tissue types, MIRNA genes comprise only a small fraction of all small RNA-producing genes.