Semantic Scholar Open Access 2017 2382 citations

MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets

Martin Steinegger J. Söding

Abstract

Nature Biotechnology, volume 35, number 11, November 2017

[...] performance was to combine the double-match criterion with making k-mers as long as possible, which required finding similar and not just exact k-mers. This effectively bases our decision on up to 2 × 7 = 14 residues instead of just 2 × 3 in BLAST or 12 letters on a size-11 alphabet in DIAMOND. MMseqs2 is parallelized on three levels: time-critical parts are manually vectorized, queries can be distributed to multiple cores, and the target database can be split into chunks distributed to multiple servers. Because MMseqs2 needs no random memory access in its innermost loop, its runtime scales almost inversely with the number of cores used (Supplementary Fig. 2). MMseqs2 requires 13.4 GB plus 7 bytes per amino acid to store the database in memory [...]
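The memory figure quoted in the excerpt (13.4 GB plus 7 bytes per amino acid) can be turned into a quick back-of-the-envelope estimate. The following is a minimal sketch, not part of MMseqs2 itself: the function name `estimate_memory_gb` is hypothetical, the example database size is made up for illustration, and it assumes decimal gigabytes (1 GB = 10^9 bytes), which the excerpt does not specify.

```python
FIXED_OVERHEAD_GB = 13.4   # fixed cost quoted in the excerpt
BYTES_PER_RESIDUE = 7      # per-amino-acid cost quoted in the excerpt
BYTES_PER_GB = 1e9         # assumption: decimal gigabytes


def estimate_memory_gb(num_residues: int) -> float:
    """Estimate the in-memory database size in GB (hypothetical helper)."""
    if num_residues < 0:
        raise ValueError("num_residues must be non-negative")
    return FIXED_OVERHEAD_GB + num_residues * BYTES_PER_RESIDUE / BYTES_PER_GB


if __name__ == "__main__":
    # Hypothetical database containing one billion amino acids.
    print(f"{estimate_memory_gb(1_000_000_000):.1f} GB")
```

Under these assumptions, the fixed 13.4 GB term dominates for small databases, while the per-residue term dominates once the database grows past roughly two billion amino acids.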

Authors (2)

Martin Steinegger

J. Söding

Citation Format

Steinegger, M., Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. https://doi.org/10.1038/nbt.3988

Quick Access

View at Source doi.org/10.1038/nbt.3988
Journal Information
Publication Year
2017
Language
en
Total Citations
2382×
Database Source
Semantic Scholar
DOI
10.1038/nbt.3988
Access
Open Access ✓