Publication:

BioBricks.ai : a versioned data registry for life sciences data assets

Files

Gao_2-ur83s3b3ig1u7.pdf (size: 721.57 KB, downloads: 8)

Date

2025

Authors

Gao, Yifan
Mughal, Zakariyya
Jaramillo-Villegas, Jose A.
Corradi, Marie
Borrel, Alexandre
Lieberman, Ben
Sharif, Suliman
Shaffer, John
Fecho, Karamarie
Chatrath, Ajay

Funding information

U.S. National Science Foundation (NSF): 2333728
U.S. National Science Foundation (NSF): SBIR 2012214
European Union (EU): 963845

Open Access publication
Open Access Gold

Publication type
Journal article
Publication status
Published

Published in

Frontiers in Artificial Intelligence. Frontiers. 2025, 8, 1599412. eISSN 2624-8212. Available at: doi: 10.3389/frai.2025.1599412

Abstract

Introduction: Researchers in biomedicine and public health often spend weeks locating, cleansing, and integrating data from disparate sources before analysis can begin. This redundancy slows discovery and leads to inconsistent pipelines.

Methods: We created BioBricks.ai, an open, centralized repository that packages public biological and chemical datasets as modular “bricks.” Each brick is a Data Version Control (DVC) Git repository containing an extract‑transform‑load (ETL) pipeline. A package‑manager–like interface handles installation, dependency resolution, and updates, while data are delivered through a unified backend (https://biobricks.ai).

Results: The current release provides >90 curated datasets spanning genomics, proteomics, cheminformatics, and epidemiology. Bricks can be combined programmatically to build composite resources; benchmark use‑cases show that assembling multi‑dataset analytic cohorts is reduced from days to minutes compared with bespoke scripts.

Discussion: BioBricks.ai accelerates data access, promotes reproducible workflows, and lowers the barrier for integrating heterogeneous public datasets. By treating data as version‑controlled software, the platform encourages community contributions and reduces redundant engineering effort. Continued expansion of brick coverage and automated provenance tracking will further enhance FAIR (Findable, Accessible, Interoperable, Reusable) data practices across the life‑science community.
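
The abstract describes a package-manager-like interface that resolves dependencies between bricks. The following is a minimal sketch of what such dependency resolution could look like, using a topological sort from the Python standard library (Python 3.9+). It is not the BioBricks.ai implementation or API: the brick names, the `BRICK_DEPENDENCIES` mapping, and the `install_order` function are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical brick names and dependencies, for illustration only;
# these are not taken from the BioBricks.ai registry.
BRICK_DEPENDENCIES = {
    "chem-identifiers": set(),
    "assay-results": {"chem-identifiers"},
    "gene-annotations": set(),
    "composite-cohort": {"assay-results", "gene-annotations"},
}


def install_order(requested, dependencies):
    """Return the bricks needed for `requested`, with dependencies first."""
    needed = {}
    stack = [requested]
    # Collect the requested brick and everything it depends on, transitively.
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        if name not in dependencies:
            raise KeyError(f"unknown brick: {name}")
        needed[name] = dependencies[name]
        stack.extend(dependencies[name])
    # static_order() yields each brick only after all of its dependencies
    # and raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(needed).static_order())


order = install_order("composite-cohort", BRICK_DEPENDENCIES)
print(order)
```

In this sketch a composite brick is always installed last, after every brick it builds on, which is the ordering guarantee a package manager needs before running a downstream ETL pipeline.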

Subject (DDC)
570 Life sciences, biology

Keywords

public health data, BioBricks.ai, data integration, machine learning, cheminformatics, bioinformatics

Cite

ISO 690
GAO, Yifan, Zakariyya MUGHAL, Jose A. JARAMILLO-VILLEGAS, Marie CORRADI, Alexandre BORREL, Ben LIEBERMAN, Suliman SHARIF, John SHAFFER, Karamarie FECHO, Ajay CHATRATH, Alexandra MAERTENS, Marc A. T. TEUNIS, Nicole KLEINSTREUER, Thomas HARTUNG, Thomas LUECHTEFELD, 2025. BioBricks.ai : a versioned data registry for life sciences data assets. In: Frontiers in Artificial Intelligence. Frontiers. 2025, 8, 1599412. eISSN 2624-8212. Available at: doi: 10.3389/frai.2025.1599412
BibTeX
@article{Gao2025-08-13BioBr-76427,
  title={BioBricks.ai : a versioned data registry for life sciences data assets},
  year={2025},
  doi={10.3389/frai.2025.1599412},
  volume={8},
  journal={Frontiers in Artificial Intelligence},
  author={Gao, Yifan and Mughal, Zakariyya and Jaramillo-Villegas, Jose A. and Corradi, Marie and Borrel, Alexandre and Lieberman, Ben and Sharif, Suliman and Shaffer, John and Fecho, Karamarie and Chatrath, Ajay and Maertens, Alexandra and Teunis, Marc A. T. and Kleinstreuer, Nicole and Hartung, Thomas and Luechtefeld, Thomas},
  note={Article Number: 1599412}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/76427">
    <dc:creator>Hartung, Thomas</dc:creator>
    <dc:contributor>Corradi, Marie</dc:contributor>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/76427/1/Gao_2-ur83s3b3ig1u7.pdf"/>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/76427/1/Gao_2-ur83s3b3ig1u7.pdf"/>
    <dc:contributor>Kleinstreuer, Nicole</dc:contributor>
    <dcterms:issued>2025-08-13</dcterms:issued>
    <dc:contributor>Teunis, Marc A. T.</dc:contributor>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
    <dc:creator>Teunis, Marc A. T.</dc:creator>
    <dc:contributor>Hartung, Thomas</dc:contributor>
    <dc:creator>Fecho, Karamarie</dc:creator>
    <dc:creator>Gao, Yifan</dc:creator>
    <dc:contributor>Borrel, Alexandre</dc:contributor>
    <dc:rights>Attribution 4.0 International</dc:rights>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
    <dc:contributor>Maertens, Alexandra</dc:contributor>
    <dc:creator>Corradi, Marie</dc:creator>
    <dc:contributor>Mughal, Zakariyya</dc:contributor>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Chatrath, Ajay</dc:creator>
    <dc:creator>Maertens, Alexandra</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2026-03-03T08:58:56Z</dcterms:available>
    <dc:creator>Kleinstreuer, Nicole</dc:creator>
    <dc:creator>Jaramillo-Villegas, Jose A.</dc:creator>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/76427"/>
    <dc:contributor>Shaffer, John</dc:contributor>
    <dc:creator>Lieberman, Ben</dc:creator>
    <dc:contributor>Lieberman, Ben</dc:contributor>
    <dcterms:title>BioBricks.ai : a versioned data registry for life sciences data assets</dcterms:title>
    <dc:creator>Mughal, Zakariyya</dc:creator>
    <dc:contributor>Luechtefeld, Thomas</dc:contributor>
    <dc:contributor>Sharif, Suliman</dc:contributor>
    <dcterms:abstract>Introduction: Researchers in biomedicine and public health often spend weeks locating, cleansing, and integrating data from disparate sources before analysis can begin. This redundancy slows discovery and leads to inconsistent pipelines.  

Methods: We created BioBricks.ai, an open, centralized repository that packages public biological and chemical datasets as modular “bricks.” Each brick is a Data Version Control (DVC) Git repository containing an extract‑transform‑load (ETL) pipeline. A package‑manager–like interface handles installation, dependency resolution, and updates, while data are delivered through a unified backend (https://biobricks.ai).  

Results: The current release provides &gt;90 curated datasets spanning genomics, proteomics, cheminformatics, and epidemiology. Bricks can be combined programmatically to build composite resources; benchmark use‑cases show that assembling multi‑dataset analytic cohorts is reduced from days to minutes compared with bespoke scripts.  

Discussion: BioBricks.ai accelerates data access, promotes reproducible workflows, and lowers the barrier for integrating heterogeneous public datasets. By treating data as version‑controlled software, the platform encourages community contributions and reduces redundant engineering effort. Continued expansion of brick coverage and automated provenance tracking will further enhance FAIR (Findable, Accessible, Interoperable, Reusable) data practices across the life‑science community.</dcterms:abstract>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
    <dc:creator>Sharif, Suliman</dc:creator>
    <dc:contributor>Fecho, Karamarie</dc:contributor>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Gao, Yifan</dc:contributor>
    <dc:contributor>Chatrath, Ajay</dc:contributor>
    <dc:language>eng</dc:language>
    <dc:contributor>Jaramillo-Villegas, Jose A.</dc:contributor>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2026-03-03T08:58:56Z</dc:date>
    <dc:creator>Borrel, Alexandre</dc:creator>
    <dc:creator>Luechtefeld, Thomas</dc:creator>
    <dc:creator>Shaffer, John</dc:creator>
  </rdf:Description>
</rdf:RDF>

University bibliography
Yes
Peer-reviewed
Yes