Publication: Repositories for Taxonomic Data : where We Are and What is Missing
Abstract
Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata, DNA barcodes), thereby transforming taxonomy into a big data science. Yet, little effort has been directed towards safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of naming 15,000-20,000 species every year. From the perspective of alpha-taxonomists, we provide a review of the properties and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We surveyed 4113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet comparatively limited use of molecular data in species diagnosis and description. In 2018, of the 2661 papers published in specialized taxonomic journals, molecular data were widely used in mycology (94%), regularly in vertebrates (53%), but rarely in botany (15%) and entomology (10%). Images play an important role in taxonomic research on all taxa, with photographs used in >80% and drawings in 58% of the surveyed papers. The use of omics (high-throughput) approaches or 3D documentation is still rare. Improved archiving strategies for metabarcoding consensus reads, genome and transcriptome assemblies, and chemical and metabolomic data could help to mobilize the wealth of high-throughput data for alpha-taxonomy. Because long-term (ideally perpetual) data storage is of particular importance for taxonomy, energy footprint reduction via less storage-demanding formats is a priority if their information content suffices for the purpose of taxonomic studies. Whereas taxonomic assignments are quasifacts for most biological disciplines, they remain hypotheses pertaining to evolutionary relatedness of individuals for alpha-taxonomy. 
For this reason, an improved reuse of taxonomic data, including machine-learning-based species identification and delimitation pipelines, requires a cyberspecimen approach: linking data via unique specimen identifiers, and thereby making them findable, accessible, interoperable, and reusable for taxonomic research. This poses both qualitative challenges to adapt the existing infrastructure of data centers to a specimen-centered concept and quantitative challenges to host and connect an estimated ≤2 million images produced per year by alpha-taxonomic studies, plus many millions of images from digitization campaigns. Of the 30,000-40,000 taxonomists globally, many are thought to be nonprofessionals, and capturing the data for online storage and reuse therefore requires low-complexity submission workflows and cost-free repository use. Expert taxonomists are the main stakeholders able to identify and formalize the needs of the discipline; their expertise is needed to implement the envisioned virtual collections of cyberspecimens. [Big data; cyberspecimen; new species; omics; repositories; specimen identifier; taxonomy; taxonomic data.].
Cite
ISO 690
MIRALLES, Aurélien, Teddy BRUY, Katherine WOLCOTT, Mark D. SCHERZ, Dominik BEGEROW, Bank BESZTERI, Michael BONKOWSKI, Janine FELDEN, Birgit GEMEINHOLZER, Frank GLAW, 2020. Repositories for Taxonomic Data : where We Are and What is Missing. In: Systematic biology. Oxford University Press. 2020, 69(6), pp. 1231-1253. ISSN 1063-5157. eISSN 1076-836X. Available under: doi: 10.1093/sysbio/syaa026
BibTeX
@article{Miralles2020Repos-52226,
  year={2020},
  doi={10.1093/sysbio/syaa026},
  title={Repositories for Taxonomic Data : where We Are and What is Missing},
  number={6},
  volume={69},
  issn={1063-5157},
  journal={Systematic biology},
  pages={1231--1253},
  author={Miralles, Aurélien and Bruy, Teddy and Wolcott, Katherine and Scherz, Mark D. and Begerow, Dominik and Beszteri, Bank and Bonkowski, Michael and Felden, Janine and Gemeinholzer, Birgit and Glaw, Frank}
}
RDF
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/52226"> <dc:creator>Bruy, Teddy</dc:creator> <dc:creator>Glaw, Frank</dc:creator> <dcterms:title>Repositories for Taxonomic Data : where We Are and What is Missing</dcterms:title> <dc:creator>Gemeinholzer, Birgit</dc:creator> <dc:contributor>Beszteri, Bank</dc:contributor> <dc:creator>Felden, Janine</dc:creator> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/> <dc:creator>Bonkowski, Michael</dc:creator> <dc:contributor>Gemeinholzer, Birgit</dc:contributor> <dc:contributor>Bonkowski, Michael</dc:contributor> <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/52226"/> <dc:contributor>Miralles, Aurélien</dc:contributor> <dc:language>eng</dc:language> <dcterms:issued>2020</dcterms:issued> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/> <dcterms:abstract xml:lang="eng">Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata, DNA barcodes), thereby transforming taxonomy into a big data science. Yet, little effort has been directed towards safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of naming 15,000-20,000 species every year. 
From the perspective of alpha-taxonomists, we provide a review of the properties and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We surveyed 4113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet comparatively limited use of molecular data in species diagnosis and description. In 2018, of the 2661 papers published in specialized taxonomic journals, molecular data were widely used in mycology (94%), regularly in vertebrates (53%), but rarely in botany (15%) and entomology (10%). Images play an important role in taxonomic research on all taxa, with photographs used in >80% and drawings in 58% of the surveyed papers. The use of omics (high-throughput) approaches or 3D documentation is still rare. Improved archiving strategies for metabarcoding consensus reads, genome and transcriptome assemblies, and chemical and metabolomic data could help to mobilize the wealth of high-throughput data for alpha-taxonomy. Because long-term (ideally perpetual) data storage is of particular importance for taxonomy, energy footprint reduction via less storage-demanding formats is a priority if their information content suffices for the purpose of taxonomic studies. Whereas taxonomic assignments are quasifacts for most biological disciplines, they remain hypotheses pertaining to evolutionary relatedness of individuals for alpha-taxonomy. For this reason, an improved reuse of taxonomic data, including machine-learning-based species identification and delimitation pipelines, requires a cyberspecimen approach: linking data via unique specimen identifiers, and thereby making them findable, accessible, interoperable, and reusable for taxonomic research. 
This poses both qualitative challenges to adapt the existing infrastructure of data centers to a specimen-centered concept and quantitative challenges to host and connect an estimated ≤2 million images produced per year by alpha-taxonomic studies, plus many millions of images from digitization campaigns. Of the 30,000-40,000 taxonomists globally, many are thought to be nonprofessionals, and capturing the data for online storage and reuse therefore requires low-complexity submission workflows and cost-free repository use. Expert taxonomists are the main stakeholders able to identify and formalize the needs of the discipline; their expertise is needed to implement the envisioned virtual collections of cyberspecimens. [Big data; cyberspecimen; new species; omics; repositories; specimen identifier; taxonomy; taxonomic data.].</dcterms:abstract> <dc:contributor>Bruy, Teddy</dc:contributor> <dc:contributor>Wolcott, Katherine</dc:contributor> <dc:creator>Begerow, Dominik</dc:creator> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-12-22T10:18:47Z</dcterms:available> <dc:contributor>Scherz, Mark D.</dc:contributor> <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/52226/1/Miralles_2-1rtpol2lnq5y9.pdf"/> <dc:contributor>Glaw, Frank</dc:contributor> <dc:creator>Miralles, Aurélien</dc:creator> <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by-nc/4.0/"/> <foaf:homepage rdf:resource="http://localhost:8080/"/> <dc:creator>Wolcott, Katherine</dc:creator> <dc:contributor>Felden, Janine</dc:contributor> <dc:rights>Attribution-NonCommercial 4.0 International</dc:rights> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-12-22T10:18:47Z</dc:date> <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/52226/1/Miralles_2-1rtpol2lnq5y9.pdf"/> <dc:contributor>Begerow, Dominik</dc:contributor> <dc:creator>Beszteri, Bank</dc:creator> <dc:creator>Scherz, Mark D.</dc:creator> </rdf:Description> </rdf:RDF>