BioBricks.ai : a versioned data registry for life sciences data assets

dc.contributor.author: Gao, Yifan
dc.contributor.author: Mughal, Zakariyya
dc.contributor.author: Jaramillo-Villegas, Jose A.
dc.contributor.author: Corradi, Marie
dc.contributor.author: Borrel, Alexandre
dc.contributor.author: Lieberman, Ben
dc.contributor.author: Sharif, Suliman
dc.contributor.author: Shaffer, John
dc.contributor.author: Fecho, Karamarie
dc.contributor.author: Chatrath, Ajay
dc.contributor.author: Maertens, Alexandra
dc.contributor.author: Teunis, Marc A. T.
dc.contributor.author: Kleinstreuer, Nicole
dc.contributor.author: Hartung, Thomas
dc.contributor.author: Luechtefeld, Thomas
dc.date.accessioned: 2026-03-03T08:58:56Z
dc.date.available: 2026-03-03T08:58:56Z
dc.date.issued: 2025-08-13
dc.description.abstract: Introduction: Researchers in biomedicine and public health often spend weeks locating, cleansing, and integrating data from disparate sources before analysis can begin. This redundancy slows discovery and leads to inconsistent pipelines. Methods: We created BioBricks.ai, an open, centralized repository that packages public biological and chemical datasets as modular “bricks.” Each brick is a Data Version Control (DVC) Git repository containing an extract‑transform‑load (ETL) pipeline. A package‑manager–like interface handles installation, dependency resolution, and updates, while data are delivered through a unified backend (https://biobricks.ai). Results: The current release provides >90 curated datasets spanning genomics, proteomics, cheminformatics, and epidemiology. Bricks can be combined programmatically to build composite resources; benchmark use‑cases show that assembling multi‑dataset analytic cohorts is reduced from days to minutes compared with bespoke scripts. Discussion: BioBricks.ai accelerates data access, promotes reproducible workflows, and lowers the barrier for integrating heterogeneous public datasets. By treating data as version‑controlled software, the platform encourages community contributions and reduces redundant engineering effort. Continued expansion of brick coverage and automated provenance tracking will further enhance FAIR (Findable, Accessible, Interoperable, Reusable) data practices across the life‑science community.
dc.description.version: published [deu]
dc.identifier.doi: 10.3389/frai.2025.1599412
dc.identifier.ppn: 1965406424
dc.identifier.uri: https://kops.uni-konstanz.de/handle/123456789/76427
dc.language.iso: eng
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: public health data
dc.subject: BioBricks.ai
dc.subject: data integration
dc.subject: machine learning
dc.subject: cheminformatics
dc.subject: bioinformatics
dc.subject.ddc: 570
dc.title: BioBricks.ai : a versioned data registry for life sciences data assets [eng]
dc.type: JOURNAL_ARTICLE
dspace.entity.type: Publication
kops.citation.bibtex
@article{Gao2025-08-13BioBr-76427,
  title={BioBricks.ai : a versioned data registry for life sciences data assets},
  year={2025},
  doi={10.3389/frai.2025.1599412},
  volume={8},
  journal={Frontiers in Artificial Intelligence},
  author={Gao, Yifan and Mughal, Zakariyya and Jaramillo-Villegas, Jose A. and Corradi, Marie and Borrel, Alexandre and Lieberman, Ben and Sharif, Suliman and Shaffer, John and Fecho, Karamarie and Chatrath, Ajay and Maertens, Alexandra and Teunis, Marc A. T. and Kleinstreuer, Nicole and Hartung, Thomas and Luechtefeld, Thomas},
  note={Article Number: 1599412}
}
kops.citation.iso690: GAO, Yifan, Zakariyya MUGHAL, Jose A. JARAMILLO-VILLEGAS, Marie CORRADI, Alexandre BORREL, Ben LIEBERMAN, Suliman SHARIF, John SHAFFER, Karamarie FECHO, Ajay CHATRATH, Alexandra MAERTENS, Marc A. T. TEUNIS, Nicole KLEINSTREUER, Thomas HARTUNG, Thomas LUECHTEFELD, 2025. BioBricks.ai : a versioned data registry for life sciences data assets. In: Frontiers in Artificial Intelligence. Frontiers. 2025, 8, 1599412. eISSN 2624-8212. Available under: doi: 10.3389/frai.2025.1599412 [eng]
kops.citation.rdf
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/76427">
    <dc:creator>Hartung, Thomas</dc:creator>
    <dc:contributor>Corradi, Marie</dc:contributor>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/76427/1/Gao_2-ur83s3b3ig1u7.pdf"/>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/76427/1/Gao_2-ur83s3b3ig1u7.pdf"/>
    <dc:contributor>Kleinstreuer, Nicole</dc:contributor>
    <dcterms:issued>2025-08-13</dcterms:issued>
    <dc:contributor>Teunis, Marc A. T.</dc:contributor>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
    <dc:creator>Teunis, Marc A. T.</dc:creator>
    <dc:contributor>Hartung, Thomas</dc:contributor>
    <dc:creator>Fecho, Karamarie</dc:creator>
    <dc:creator>Gao, Yifan</dc:creator>
    <dc:contributor>Borrel, Alexandre</dc:contributor>
    <dc:rights>Attribution 4.0 International</dc:rights>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
    <dc:contributor>Maertens, Alexandra</dc:contributor>
    <dc:creator>Corradi, Marie</dc:creator>
    <dc:contributor>Mughal, Zakariyya</dc:contributor>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Chatrath, Ajay</dc:creator>
    <dc:creator>Maertens, Alexandra</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2026-03-03T08:58:56Z</dcterms:available>
    <dc:creator>Kleinstreuer, Nicole</dc:creator>
    <dc:creator>Jaramillo-Villegas, Jose A.</dc:creator>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/76427"/>
    <dc:contributor>Shaffer, John</dc:contributor>
    <dc:creator>Lieberman, Ben</dc:creator>
    <dc:contributor>Lieberman, Ben</dc:contributor>
    <dcterms:title>BioBricks.ai : a versioned data registry for life sciences data assets</dcterms:title>
    <dc:creator>Mughal, Zakariyya</dc:creator>
    <dc:contributor>Luechtefeld, Thomas</dc:contributor>
    <dc:contributor>Sharif, Suliman</dc:contributor>
    <dcterms:abstract>Introduction: Researchers in biomedicine and public health often spend weeks locating, cleansing, and integrating data from disparate sources before analysis can begin. This redundancy slows discovery and leads to inconsistent pipelines.  

Methods: We created BioBricks.ai, an open, centralized repository that packages public biological and chemical datasets as modular “bricks.” Each brick is a Data Version Control (DVC) Git repository containing an extract‑transform‑load (ETL) pipeline. A package‑manager–like interface handles installation, dependency resolution, and updates, while data are delivered through a unified backend (https://biobricks.ai).  

Results: The current release provides &gt;90 curated datasets spanning genomics, proteomics, cheminformatics, and epidemiology. Bricks can be combined programmatically to build composite resources; benchmark use‑cases show that assembling multi‑dataset analytic cohorts is reduced from days to minutes compared with bespoke scripts.  

Discussion: BioBricks.ai accelerates data access, promotes reproducible workflows, and lowers the barrier for integrating heterogeneous public datasets. By treating data as version‑controlled software, the platform encourages community contributions and reduces redundant engineering effort. Continued expansion of brick coverage and automated provenance tracking will further enhance FAIR (Findable, Accessible, Interoperable, Reusable) data practices across the life‑science community.</dcterms:abstract>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
    <dc:creator>Sharif, Suliman</dc:creator>
    <dc:contributor>Fecho, Karamarie</dc:contributor>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Gao, Yifan</dc:contributor>
    <dc:contributor>Chatrath, Ajay</dc:contributor>
    <dc:language>eng</dc:language>
    <dc:contributor>Jaramillo-Villegas, Jose A.</dc:contributor>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2026-03-03T08:58:56Z</dc:date>
    <dc:creator>Borrel, Alexandre</dc:creator>
    <dc:creator>Luechtefeld, Thomas</dc:creator>
    <dc:creator>Shaffer, John</dc:creator>
  </rdf:Description>
</rdf:RDF>
kops.description.funding: {"first":"nsf","second":"2333728"}
kops.description.funding: {"first":"nsf","second":"SBIR 2012214"}
kops.description.funding: {"first":"eu","second":"963845"}
kops.description.openAccess: openaccessgold
kops.flag.isPeerReviewed: true
kops.flag.knbibliography: true
kops.identifier.nbn: urn:nbn:de:bsz:352-2-ur83s3b3ig1u7
kops.sourcefield: Frontiers in Artificial Intelligence. Frontiers. 2025, <b>8</b>, 1599412. eISSN 2624-8212. Available under: doi: 10.3389/frai.2025.1599412
kops.sourcefield.plain: Frontiers in Artificial Intelligence. Frontiers. 2025, 8, 1599412. eISSN 2624-8212. Available under: doi: 10.3389/frai.2025.1599412 [eng]
relation.isAuthorOfPublication: 36e501e4-b8aa-46a8-9514-4a52792e3f9a
relation.isAuthorOfPublication.latestForDiscovery: 36e501e4-b8aa-46a8-9514-4a52792e3f9a
source.bibliographicInfo.articleNumber: 1599412
source.bibliographicInfo.volume: 8
source.identifier.eissn: 2624-8212
source.periodicalTitle: Frontiers in Artificial Intelligence
source.publisher: Frontiers

Files

Original bundle

Name: Gao_2-ur83s3b3ig1u7.pdf
Size: 721.57 KB
Format: Adobe Portable Document Format
Downloads: 9