Semantic Scholar · Open Access · 2017 · 2382 citations

MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets

Martin Steinegger, J. Söding


Abstract

Nature Biotechnology, Volume 35, Number 11, November 2017.

… performance was to combine the double-match criterion with making k-mers as long as possible, which required finding similar, and not just exact, k-mers. This effectively bases our decision on up to 2 × 7 = 14 residues, instead of just 2 × 3 = 6 in BLAST or 12 letters on a size-11 alphabet in DIAMOND. MMseqs2 is parallelized on three levels: time-critical parts are manually vectorized, queries can be distributed to multiple cores, and the target database can be split into chunks distributed to multiple servers. Because MMseqs2 needs no random memory access in its innermost loop, its runtime scales almost inversely with the number of cores used (Supplementary Fig. 2). MMseqs2 requires 13.4 GB plus 7 bytes per amino acid to store the database in memory, …
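The excerpt names two ingredients of the MMseqs2 prefilter, the double-match criterion and matching of similar rather than exact k-mers, without defining either. The Python sketch below is a toy illustration under stated assumptions, not the MMseqs2 implementation. It assumes the double-match criterion means a target passes once two k-mer matches fall on the same diagonal i − j (query position minus target position). It also swaps in a hypothetical +2/−1 identity scoring scheme, a 4-letter alphabet, and brute-force enumeration of similar k-mers in place of a real substitution matrix and MMseqs2's own k-mer generation. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def score(a: str, b: str) -> int:
    # Hypothetical toy scoring (assumption): +2 per identical residue, -1 per
    # mismatch. MMseqs2 itself scores k-mers with a substitution matrix.
    return sum(2 if x == y else -1 for x, y in zip(a, b))


def similar_kmers(kmer: str, alphabet: str, threshold: int) -> set[str]:
    # Brute-force enumeration of every k-mer scoring >= threshold against
    # `kmer`; only feasible for a tiny toy alphabet and short k.
    return {
        "".join(c)
        for c in product(alphabet, repeat=len(kmer))
        if score(kmer, "".join(c)) >= threshold
    }


def build_index(targets: list[str], k: int) -> dict[str, list[tuple[int, int]]]:
    # Map each k-mer to the (target id, start position) pairs where it occurs.
    index = defaultdict(list)
    for tid, seq in enumerate(targets):
        for j in range(len(seq) - k + 1):
            index[seq[j:j + k]].append((tid, j))
    return index


def double_match_prefilter(query: str, targets: list[str], k: int,
                           alphabet: str, threshold: int) -> list[int]:
    # Return ids of targets with at least two similar-k-mer matches on the
    # same diagonal (assumed reading of the double-match criterion).
    index = build_index(targets, k)
    seen = defaultdict(set)  # target id -> diagonals (i - j) matched so far
    hits = set()
    for i in range(len(query) - k + 1):
        for sk in similar_kmers(query[i:i + k], alphabet, threshold):
            for tid, j in index.get(sk, ()):
                diag = i - j
                if diag in seen[tid]:
                    hits.add(tid)  # second match on this diagonal: target passes
                seen[tid].add(diag)
    return sorted(hits)


targets = ["ACDEAC", "EEEEEE", "ACDDDD"]
print(double_match_prefilter("ACDEAC", targets, 3, "ACDE", 6))  # exact 3-mers only
print(double_match_prefilter("ACDEAC", targets, 3, "ACDE", 3))  # one mismatch allowed
```

With a threshold of 6 only exact 3-mers count, so target 2 (`ACDDDD`) gets a single match on diagonal 0 and is rejected. Lowering the threshold to 3 admits one-mismatch neighbors such as `CDD` for the query 3-mer `CDE`, which gives that target a second match on the same diagonal. This is a miniature version of why, in the excerpt, making k-mers long required finding similar and not just exact k-mers.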

Topics & Keywords

Computer Science, Medicine

Authors (2)

Martin Steinegger

J. Söding

Citation (APA)

Steinegger, M., & Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35(11). https://doi.org/10.1038/nbt.3988


Journal Information

Journal: Nature Biotechnology, 35(11)
Publication Year: 2017
Language: en
Total Citations: 2382
Source Database: Semantic Scholar
DOI: 10.1038/nbt.3988
Access: Open Access