Publication:

Accuracy of de novo assembly of DNA sequences from double-digest libraries varies substantially among software

Date

2020

Authors

LaCava, Melanie E. F.
Aikens, Ellen O.
Megna, Libby C.
Randolph, Gregg
Hubbard, Charley
Buerkle, C. Alex

Publication type
Journal article
Publication status
Published

Published in

Molecular Ecology Resources. Wiley-Blackwell. 2020, 20(2), pp. 360-370. ISSN 1755-098X. eISSN 1755-0998. Available under: doi: 10.1111/1755-0998.13108

Abstract

Advances in DNA sequencing have made it feasible to gather genomic data for non-model organisms and large sets of individuals, often using methods for sequencing subsets of the genome. Several of these methods sequence DNA associated with endonuclease restriction sites (various RAD and GBS methods). For use in taxa without a reference genome, these methods rely on de novo assembly of fragments in the sequencing library. Many of the software options available for this application were originally developed for other assembly types and we do not know their accuracy for reduced representation libraries. To address this important knowledge gap, we simulated data from the Arabidopsis thaliana and Homo sapiens genomes and compared de novo assemblies by six software programs that are commonly used or promising for this purpose (ABySS, CD-HIT, Stacks, Stacks2, Velvet and VSEARCH). We simulated different mutation rates and types of mutations, and then applied the six assemblers to the simulated data sets, varying assembly parameters. We found substantial variation in software performance across simulations and parameter settings. ABySS failed to recover any true genome fragments, and Velvet and VSEARCH performed poorly for most simulations. Stacks and Stacks2 produced accurate assemblies of simulations containing SNPs, but the addition of insertion and deletion mutations decreased their performance. CD-HIT was the only assembler that consistently recovered a high proportion of true genome fragments. Here, we demonstrate the substantial difference in the accuracy of assemblies from different software programs and the importance of comparing assemblies that result from different parameter settings.

Subject (DDC)
570 Life sciences, biology

Keywords

GBS, genomics, indels, paralogs, population, RAD, reference genome

Cite

ISO 690
LACAVA, Melanie E. F., Ellen O. AIKENS, Libby C. MEGNA, Gregg RANDOLPH, Charley HUBBARD, C. Alex BUERKLE, 2020. Accuracy of de novo assembly of DNA sequences from double-digest libraries varies substantially among software. In: Molecular Ecology Resources. Wiley-Blackwell. 2020, 20(2), pp. 360-370. ISSN 1755-098X. eISSN 1755-0998. Available under: doi: 10.1111/1755-0998.13108
BibTeX
@article{LaCava2020-03Accur-52504,
  year={2020},
  doi={10.1111/1755-0998.13108},
  title={Accuracy of de novo assembly of DNA sequences from double-digest libraries varies substantially among software},
  number={2},
  volume={20},
  issn={1755-098X},
  journal={Molecular Ecology Resources},
  pages={360--370},
  author={LaCava, Melanie E. F. and Aikens, Ellen O. and Megna, Libby C. and Randolph, Gregg and Hubbard, Charley and Buerkle, C. Alex}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/52504">
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-01-19T14:23:16Z</dc:date>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:creator>Aikens, Ellen O.</dc:creator>
    <dc:contributor>Randolph, Gregg</dc:contributor>
    <dc:language>eng</dc:language>
    <dc:creator>Buerkle, C. Alex</dc:creator>
    <dc:creator>Randolph, Gregg</dc:creator>
    <dc:contributor>LaCava, Melanie E. F.</dc:contributor>
    <dcterms:issued>2020-03</dcterms:issued>
    <dc:creator>Megna, Libby C.</dc:creator>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/52504"/>
    <dc:contributor>Hubbard, Charley</dc:contributor>
    <dcterms:title>Accuracy of de novo assembly of DNA sequences from double-digest libraries varies substantially among software</dcterms:title>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-01-19T14:23:16Z</dcterms:available>
    <dc:creator>Hubbard, Charley</dc:creator>
    <dc:contributor>Aikens, Ellen O.</dc:contributor>
    <dc:contributor>Buerkle, C. Alex</dc:contributor>
    <dc:contributor>Megna, Libby C.</dc:contributor>
    <dc:creator>LaCava, Melanie E. F.</dc:creator>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:abstract xml:lang="eng">Advances in DNA sequencing have made it feasible to gather genomic data for non-model organisms and large sets of individuals, often using methods for sequencing subsets of the genome. Several of these methods sequence DNA associated with endonuclease restriction sites (various RAD and GBS methods). For use in taxa without a reference genome, these methods rely on de novo assembly of fragments in the sequencing library. Many of the software options available for this application were originally developed for other assembly types and we do not know their accuracy for reduced representation libraries. To address this important knowledge gap, we simulated data from the Arabidopsis thaliana and Homo sapiens genomes and compared de novo assemblies by six software programs that are commonly used or promising for this purpose (ABySS, CD-HIT, Stacks, Stacks2, Velvet and VSEARCH). We simulated different mutation rates and types of mutations, and then applied the six assemblers to the simulated data sets, varying assembly parameters. We found substantial variation in software performance across simulations and parameter settings. ABySS failed to recover any true genome fragments, and Velvet and VSEARCH performed poorly for most simulations. Stacks and Stacks2 produced accurate assemblies of simulations containing SNPs, but the addition of insertion and deletion mutations decreased their performance. CD-HIT was the only assembler that consistently recovered a high proportion of true genome fragments. Here, we demonstrate the substantial difference in the accuracy of assemblies from different software programs and the importance of comparing assemblies that result from different parameter settings.</dcterms:abstract>
  </rdf:Description>
</rdf:RDF>

University bibliography
Yes
Peer-reviewed
Yes