Publication: animal2vec and MeerKAT : A self‐supervised transformer for rare‐event raw audio input and a large‐scale reference dataset for bioacoustics
Funding information
European Union (EU): 742808
Deutsche Forschungsgemeinschaft (DFG): EXC 2117 - 422037984
European Union (EU): 101071532
Abstract
1. Bioacoustic research, vital for promoting conservation and understanding animal behaviour and ecology, faces a monumental challenge: analysing vast datasets where animal vocalizations are rare. While deep learning techniques are becoming standard, adapting them to bioacoustics remains difficult.
2. We address this challenge with animal2vec, an interpretable large transformer model and a self-supervised training scheme tailored for sparse and unbalanced bioacoustic data. It learns from unlabelled audio and then refines its understanding with labelled data. Furthermore, we introduce and publicly release MeerKAT: Meerkat Kalahari Audio Transcripts, a dataset of meerkat (Suricata suricatta) vocalizations with millisecond-resolution annotations, the largest labelled dataset on a non-human terrestrial mammal currently available.
3. Our model sets a baseline on the MeerKAT corpus, outperforming other transformer models, and improves on existing methods on the publicly available NIPS4Bplus birdsong dataset. Moreover, animal2vec performs well even with limited labelled data (few-shot learning).
4. animal2vec and MeerKAT provide a new reference point for bioacoustic research, enabling scientists to analyse large amounts of data even with scarce ground truth information.
Cite
ISO 690
SCHÄFER-ZIMMERMANN, Julian, Vlad DEMARTSEV, Baptiste AVERLY, Kiran L. DHANJAL-ADAMS, Mathieu DUTEIL, Gabriella GALL, Marius FAISS, Lily JOHNSON‐ULRICH, Dan STOWELL, Marta B. MANSER, Marie A. ROCH, Ariana STRANDBURG-PESHKIN, 2026. animal2vec and MeerKAT : A self‐supervised transformer for rare‐event raw audio input and a large‐scale reference dataset for bioacoustics. In: Methods in Ecology and Evolution. Wiley. 2026, 17(3), pp. 875-888. ISSN 2041-2096. eISSN 2041-210X. Available at: doi: 10.1111/2041-210x.70218
BibTeX
@article{SchaferZimmermann2026anima-76714,
title={animal2vec and MeerKAT : A self‐supervised transformer for rare‐event raw audio input and a large‐scale reference dataset for bioacoustics},
year={2026},
doi={10.1111/2041-210x.70218},
number={3},
volume={17},
issn={2041-2096},
journal={Methods in Ecology and Evolution},
pages={875--888},
author={Schäfer-Zimmermann, Julian and Demartsev, Vlad and Averly, Baptiste and Dhanjal-Adams, Kiran L. and Duteil, Mathieu and Gall, Gabriella and Faiß, Marius and Johnson‐Ulrich, Lily and Stowell, Dan and Manser, Marta B. and Roch, Marie A. and Strandburg-Peshkin, Ariana}
}
RDF
<rdf:RDF
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:void="http://rdfs.org/ns/void#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/76714">
<dc:creator>Johnson‐Ulrich, Lily</dc:creator>
<dc:creator>Demartsev, Vlad</dc:creator>
<foaf:homepage rdf:resource="http://localhost:8080/"/>
<dc:contributor>Manser, Marta B.</dc:contributor>
<dc:contributor>Duteil, Mathieu</dc:contributor>
<dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
<dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/76714/1/Schaefer-Zimmermann_2-6raebs9cpyx46.pdf"/>
<dc:contributor>Roch, Marie A.</dc:contributor>
<dcterms:issued>2026</dcterms:issued>
<dc:creator>Duteil, Mathieu</dc:creator>
<dc:contributor>Averly, Baptiste</dc:contributor>
<void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
<dc:contributor>Demartsev, Vlad</dc:contributor>
<dc:creator>Strandburg-Peshkin, Ariana</dc:creator>
<bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/76714"/>
<dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
<dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/43615"/>
<dc:creator>Gall, Gabriella</dc:creator>
<dc:contributor>Faiß, Marius</dc:contributor>
<dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/43615"/>
<dc:creator>Stowell, Dan</dc:creator>
<dc:contributor>Strandburg-Peshkin, Ariana</dc:contributor>
<dc:contributor>Schäfer-Zimmermann, Julian</dc:contributor>
<dcterms:title>animal2vec and MeerKAT : A self‐supervised transformer for rare‐event raw audio input and a large‐scale reference dataset for bioacoustics</dcterms:title>
<dc:creator>Faiß, Marius</dc:creator>
<dc:creator>Averly, Baptiste</dc:creator>
<dc:creator>Roch, Marie A.</dc:creator>
<dc:creator>Dhanjal-Adams, Kiran L.</dc:creator>
<dc:contributor>Gall, Gabriella</dc:contributor>
<dc:creator>Schäfer-Zimmermann, Julian</dc:creator>
<dc:creator>Manser, Marta B.</dc:creator>
<dc:rights>Attribution 4.0 International</dc:rights>
<dc:contributor>Johnson‐Ulrich, Lily</dc:contributor>
<dcterms:abstract>1. Bioacoustic research, vital for promoting conservation and understanding animal behaviour and ecology, faces a monumental challenge: analysing vast datasets where animal vocalizations are rare. While deep learning techniques are becoming standard, adapting them to bioacoustics remains difficult.
2. We address this challenge with animal2vec, an interpretable large transformer model and a self-supervised training scheme tailored for sparse and unbalanced bioacoustic data. It learns from unlabelled audio and then refines its understanding with labelled data. Furthermore, we introduce and publicly release MeerKAT: Meerkat Kalahari Audio Transcripts, a dataset of meerkat (Suricata suricatta) vocalizations with millisecond-resolution annotations, the largest labelled dataset on a non-human terrestrial mammal currently available.
3. Our model sets a baseline on the MeerKAT corpus, outperforming other transformer models, and improves on existing methods on the publicly available NIPS4Bplus birdsong dataset. Moreover, animal2vec performs well even with limited labelled data (few-shot learning).
4. animal2vec and MeerKAT provide a new reference point for bioacoustic research, enabling scientists to analyse large amounts of data even with scarce ground truth information.</dcterms:abstract>
<dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/76714/1/Schaefer-Zimmermann_2-6raebs9cpyx46.pdf"/>
<dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2026-03-25T09:12:41Z</dc:date>
<dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2026-03-25T09:12:41Z</dcterms:available>
<dc:contributor>Dhanjal-Adams, Kiran L.</dc:contributor>
<dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
<dc:language>eng</dc:language>
<dc:contributor>Stowell, Dan</dc:contributor>
</rdf:Description>
</rdf:RDF>