Metagenome Assembly: Explanation, Challenges and Future Trends

Publication year: 1396 (Solar Hijri calendar)
Document type: Conference paper
Language: English

National scientific document ID:

IBIS07_154

Indexing date: 29 Farvardin 1397 (Solar Hijri calendar)

Abstract:

For many years, retrieving the genomic sequence of undiscovered species was a complicated task, mainly because sequencing the DNA as a whole was not possible. Therefore, the original genome sequence had to be assembled de novo from a huge number of short, overlapping reads taken from different copies of the genome. When working on a microbiome sample, many of the constituent species cannot be cultured in the laboratory, so the genomic fragments of all species must be read together and assembled later without knowing which organism each fragment came from. The set of all these reads is called a metagenome. These circumstances make metagenome assembly even harder than single-genome assembly. The initial assembly of both genomic and metagenomic data is based on graph algorithms, especially those using de Bruijn graphs. In this paper, we introduce the different stages of metagenome assembly, the algorithms and time complexity of each stage, and the influence of each technique on the final assembly. Various challenges arise in this process, such as detecting and correcting sequencing errors, grouping the reads of each genome, finding shared reads and repeated regions, resolving differences between strains, and the time/memory complexity of the algorithms, which determines whether they can feasibly be run on big data. We list significant metagenome assembly tools and, as an example, briefly introduce metaSPAdes [1] (an extension of the SPAdes assembler [2] for metagenomic data). Finally, we mention new trends and promising approaches in the sequencing and assembly of both genomes and metagenomes that can alleviate current difficulties and bring revolutionary improvements in the length and accuracy of assembled sequences.
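The de Bruijn graph step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the metaSPAdes or SPAdes implementation; it assumes a single fixed k, considers only the forward strand (no reverse complements), and uses a simple k-mer abundance threshold (the hypothetical `min_count` parameter) as a crude stand-in for sequencing-error correction. The function names `count_kmers`, `build_de_bruijn` and `unitigs` are illustrative only.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, for simplicity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(kmer_counts, min_count=1):
    """Nodes are (k-1)-mers; each k-mer seen at least min_count times
    becomes an edge from its prefix to its suffix. Rarer k-mers are
    treated as likely sequencing errors and dropped (a crude filter)."""
    graph = defaultdict(list)
    for kmer, count in kmer_counts.items():
        if count >= min_count:
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths as contig strings.
    Isolated cycles (every node 1-in/1-out) are ignored in this sketch."""
    indegree = Counter()
    for targets in graph.values():
        for v in targets:
            indegree[v] += 1
    nodes = set(graph) | set(indegree)

    def is_one_in_one_out(node):
        return indegree[node] == 1 and len(graph.get(node, [])) == 1

    contigs = []
    for start in nodes:
        if is_one_in_one_out(start):
            continue  # interior of a path; it is reached from a branch point
        for v in graph.get(start, []):
            contig = start + v[-1]
            # Extend the contig while the path stays non-branching.
            while is_one_in_one_out(v):
                v = graph[v][0]
                contig += v[-1]
            contigs.append(contig)
    return sorted(contigs)


reads = ["ACGTTGC", "CGTTGCA", "GTTGCAT", "TTGCATG"]
print(unitigs(build_de_bruijn(count_kmers(reads, 4))))  # ['ACGTTGCATG']
```

With k = 4, the four overlapping reads collapse into the single contig ACGTTGCATG. Adding an erroneous read such as GTTGAAT creates a branch at the node TTG and splits the output into three contigs; raising `min_count` to 2 removes the erroneous k-mers but also drops the low-coverage k-mers at both ends, leaving CGTTGCAT. This trade-off between removing errors and losing genuine low-abundance sequence is sharper in metagenomes, where species abundances can differ widely.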

Authors

S Momken

Department of Algorithms and Computation, University of Tehran, Tehran, Postal code 1417466191, Iran

K Kavousi

Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Postal code 1417614411, Iran

A Banaei-Moghaddam

Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Postal code 1417614411, Iran

D Moazzami

Department of Algorithms and Computation, University of Tehran, Tehran, Postal code 1417466191, Iran