Coracle : a Machine Learning Framework to Identify Bacteria Associated with Continuous Variables

dc.contributor.author: Staab, Sebastian
dc.contributor.author: Cárdenas, Anny
dc.contributor.author: Peixoto, Raquel S.
dc.contributor.author: Schreiber, Falk
dc.contributor.author: Voolstra, Christian R.
dc.date.accessioned: 2024-01-04T13:17:46Z
dc.date.available: 2024-01-04T13:17:46Z
dc.date.issued: 2024
dc.description.abstract: We present Coracle, an Artificial Intelligence (AI) framework that can identify associations between bacterial communities and continuous variables. Coracle uses an ensemble approach of prominent feature selection methods and machine learning (ML) models to identify features, i.e., bacteria, associated with a continuous variable, e.g. host thermal tolerance. The results are aggregated into a score that incorporates the performances of the different ML models and the respective feature importance, while also considering the robustness of feature selection. Additionally, regression coefficients provide first insights into the direction of the association. We show the utility of Coracle by analyzing associations between bacterial composition data (i.e., 16S rRNA Amplicon Sequence Variants, ASVs) and coral thermal tolerance (i.e., standardized short-term heat stress-derived diagnostics). This analysis identified high-scoring bacterial taxa that were previously found associated with coral thermal tolerance. Coracle scales with feature number and performs well with hundreds to thousands of features, corresponding to the typical size of current datasets. Coracle performs best if run at a higher taxonomic level first (e.g., order or family) to identify groups of interest that can subsequently be run at the ASV level.
dc.description.version: published [deu]
dc.identifier.doi: 10.1093/bioinformatics/btad749
dc.identifier.ppn: 1880420597
dc.identifier.uri: https://kops.uni-konstanz.de/handle/123456789/68931
dc.language.iso: eng
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.ddc: 570
dc.title: Coracle : a Machine Learning Framework to Identify Bacteria Associated with Continuous Variables [eng]
dc.type: JOURNAL_ARTICLE
dspace.entity.type: Publication
kops.citation.bibtex:
@article{Staab2024Corac-68931,
  title={Coracle : a Machine Learning Framework to Identify Bacteria Associated with Continuous Variables},
  year={2024},
  doi={10.1093/bioinformatics/btad749},
  number={1},
  volume={40},
  issn={1367-4803},
  journal={Bioinformatics},
  author={Staab, Sebastian and Cárdenas, Anny and Peixoto, Raquel S. and Schreiber, Falk and Voolstra, Christian R.},
  note={Article Number: btad749}
}
kops.citation.iso690: STAAB, Sebastian, Anny CÁRDENAS, Raquel S. PEIXOTO, Falk SCHREIBER, Christian R. VOOLSTRA, 2024. Coracle : a Machine Learning Framework to Identify Bacteria Associated with Continuous Variables. In: Bioinformatics. Oxford University Press (OUP). 2024, 40(1), btad749. ISSN 1367-4803. eISSN 1367-4811. Verfügbar unter: doi: 10.1093/bioinformatics/btad749 [deu]
kops.citation.iso690: STAAB, Sebastian, Anny CÁRDENAS, Raquel S. PEIXOTO, Falk SCHREIBER, Christian R. VOOLSTRA, 2024. Coracle : a Machine Learning Framework to Identify Bacteria Associated with Continuous Variables. In: Bioinformatics. Oxford University Press (OUP). 2024, 40(1), btad749. ISSN 1367-4803. eISSN 1367-4811. Available under: doi: 10.1093/bioinformatics/btad749 [eng]
kops.citation.rdf:
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/68931">
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:abstract>We present Coracle, an Artificial Intelligence (AI) framework that can identify associations between bacterial communities and continuous variables. Coracle uses an ensemble approach of prominent feature selection methods and machine learning (ML) models to identify features, i.e., bacteria, associated with a continuous variable, e.g. host thermal tolerance. The results are aggregated into a score that incorporates the performances of the different ML models and the respective feature importance, while also considering the robustness of feature selection. Additionally, regression coefficients provide first insights into the direction of the association. We show the utility of Coracle by analyzing associations between bacterial composition data (i.e., 16S rRNA Amplicon Sequence Variants, ASVs) and coral thermal tolerance (i.e., standardized short-term heat stress-derived diagnostics). This analysis identified high-scoring bacterial taxa that were previously found associated with coral thermal tolerance. Coracle scales with feature number and performs well with hundreds to thousands of features, corresponding to the typical size of current datasets. Coracle performs best if run at a higher taxonomic level first (e.g., order or family) to identify groups of interest that can subsequently be run at the ASV level.</dcterms:abstract>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/68931"/>
    <dcterms:issued>2024</dcterms:issued>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Voolstra, Christian R.</dc:contributor>
    <dcterms:title>Coracle : a Machine Learning Framework to Identify Bacteria Associated with Continuous Variables</dcterms:title>
    <dc:creator>Staab, Sebastian</dc:creator>
    <dc:creator>Peixoto, Raquel S.</dc:creator>
    <dc:contributor>Staab, Sebastian</dc:contributor>
    <dc:rights>Attribution 4.0 International</dc:rights>
    <dc:contributor>Peixoto, Raquel S.</dc:contributor>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2024-01-04T13:17:46Z</dcterms:available>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2024-01-04T13:17:46Z</dc:date>
    <dc:contributor>Schreiber, Falk</dc:contributor>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Cárdenas, Anny</dc:creator>
    <dc:language>eng</dc:language>
    <dc:creator>Voolstra, Christian R.</dc:creator>
    <dc:creator>Schreiber, Falk</dc:creator>
    <dc:contributor>Cárdenas, Anny</dc:contributor>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/68931/1/staab_2-1qjmi1jk397fx9.PDF"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/68931/1/staab_2-1qjmi1jk397fx9.PDF"/>
  </rdf:Description>
</rdf:RDF>
kops.description.openAccess: openaccessgold
kops.flag.isPeerReviewed: true
kops.flag.knbibliography: true
kops.identifier.nbn: urn:nbn:de:bsz:352-2-1qjmi1jk397fx9
kops.sourcefield: Bioinformatics. Oxford University Press (OUP). 2024, <b>40</b>(1), btad749. ISSN 1367-4803. eISSN 1367-4811. Verfügbar unter: doi: 10.1093/bioinformatics/btad749 [deu]
kops.sourcefield.plain: Bioinformatics. Oxford University Press (OUP). 2024, 40(1), btad749. ISSN 1367-4803. eISSN 1367-4811. Verfügbar unter: doi: 10.1093/bioinformatics/btad749 [deu]
kops.sourcefield.plain: Bioinformatics. Oxford University Press (OUP). 2024, 40(1), btad749. ISSN 1367-4803. eISSN 1367-4811. Available under: doi: 10.1093/bioinformatics/btad749 [eng]
relation.isAuthorOfPublication: d6d85792-f9ac-4dbd-9415-c357155e6559
relation.isAuthorOfPublication: 7d03372f-8a46-44c6-8856-a2e393b3d686
relation.isAuthorOfPublication: 4a62a6c5-bf37-4efa-a633-4229ff88ed2e
relation.isAuthorOfPublication: c823a9b7-bc09-4520-a440-fba07afeb703
relation.isAuthorOfPublication.latestForDiscovery: 7d03372f-8a46-44c6-8856-a2e393b3d686
source.bibliographicInfo.articleNumber: btad749
source.bibliographicInfo.issue: 1
source.bibliographicInfo.volume: 40
source.identifier.eissn: 1367-4811
source.identifier.issn: 1367-4803
source.periodicalTitle: Bioinformatics
source.publisher: Oxford University Press (OUP)
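
The abstract in this record describes aggregating ensemble results into a per-feature score that reflects model performance, feature importance, and the robustness of feature selection. The following is a minimal Python sketch of that general idea, not Coracle's actual formula, which this record does not give. The names `ModelResult` and `aggregate_scores`, the weighting scheme, and the example values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, NamedTuple


class ModelResult(NamedTuple):
    """Outcome of one model in the ensemble (hypothetical structure)."""
    performance: float             # e.g. a cross-validated R^2 (assumption)
    importances: Dict[str, float]  # raw importance per feature (taxon)


def aggregate_scores(
    results: List[ModelResult],
    selection_frequency: Dict[str, float],
) -> Dict[str, float]:
    """Combine performance, importance, and selection robustness per feature.

    Illustrative weighting only; NOT the scoring formula used by Coracle.
    """
    if not results:
        return {}
    scores: Dict[str, float] = {}
    for result in results:
        # Models that perform no better than a baseline contribute nothing.
        weight = max(result.performance, 0.0)
        total = sum(abs(v) for v in result.importances.values())
        if total == 0.0:
            continue
        for feature, value in result.importances.items():
            # Normalise importances within a model so models are comparable.
            share = abs(value) / total
            scores[feature] = scores.get(feature, 0.0) + weight * share
    # Average over all models, then scale by how often the feature was
    # selected across resampling runs (0.0 = never, 1.0 = always).
    return {
        feature: (score / len(results)) * selection_frequency.get(feature, 0.0)
        for feature, score in scores.items()
    }


# Made-up inputs: two models, two placeholder taxa.
example_results = [
    ModelResult(performance=0.5, importances={"taxon_A": 3.0, "taxon_B": 1.0}),
    ModelResult(performance=-0.2, importances={"taxon_A": 0.0, "taxon_B": 2.0}),
]
example_frequency = {"taxon_A": 1.0, "taxon_B": 0.5}

example_scores = aggregate_scores(example_results, example_frequency)
for taxon in sorted(example_scores, key=example_scores.get, reverse=True):
    print(taxon, example_scores[taxon])
```

In this sketch the second model has a negative performance value, so its importances are ignored, and `taxon_B` is further down-weighted because it was selected in only half of the hypothetical resampling runs. The sign of a regression coefficient, which the abstract mentions as a first indication of the direction of an association, is deliberately left out here.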

Files

Original bundle

Name: staab_2-1qjmi1jk397fx9.PDF
Size: 621.39 KB
Format: Adobe Portable Document Format
Downloads: 30