Bioacoustic fundamental frequency estimation: a cross-species dataset and deep learning baseline

dc.contributor.author: Best, Paul
dc.contributor.author: Araya-Salas, Marcelo
dc.contributor.author: Ekström, Axel G.
dc.contributor.author: Freitas, Bárbara
dc.contributor.author: Jensen, Frants H.
dc.contributor.author: Kershenbaum, Arik
dc.contributor.author: Lameira, Adriano R.
dc.contributor.author: Strandburg-Peshkin, Ariana
dc.contributor.author: Marxer, Ricard
dc.date.accessioned: 2025-06-10T07:25:09Z
dc.date.available: 2025-06-10T07:25:09Z
dc.date.issued: 2025-07-04
dc.description.abstract: The fundamental frequency (F0) is a key parameter for characterising structures in vertebrate vocalisations, for instance defining vocal repertoires and their variations at different biological scales (e.g. population dialects, individual signatures). However, the task is too laborious to perform manually, and its automation is complex. Despite significant advancements in the fields of speech and music for automatic F0 estimation, similar progress in bioacoustics has been limited. To address this gap, we compile and publish a benchmark dataset of over 250,000 calls from 14 taxa, each paired with ground truth F0 values. These vocalisations range from infra-sounds to ultra-sounds, from high to low harmonicity, and some include non-linear phenomena. Testing different algorithms on these signals, we demonstrate the potential of neural networks for F0 estimation, even for taxa not seen in training, or when trained without labels. Also, to inform on the applicability of algorithms to analyse signals, we propose spectral measurements of F0 quality which correlate well with performance. While current performance results are not satisfying for all studied taxa, they suggest that deep learning could bring a more generic and reliable bioacoustic F0 tracker, helping the community to analyse vocalisations via their F0 contours.
dc.description.version: published [deu]
dc.identifier.doi: 10.1080/09524622.2025.2500380
dc.identifier.ppn: 1930424647
dc.identifier.uri: https://kops.uni-konstanz.de/handle/123456789/73543
dc.language.iso: eng
dc.rights: terms-of-use
dc.rights.uri: https://rightsstatements.org/page/InC/1.0/
dc.subject.ddc: 570
dc.title: Bioacoustic fundamental frequency estimation: a cross-species dataset and deep learning baseline [eng]
dc.type: JOURNAL_ARTICLE
dspace.entity.type: Publication
kops.citation.bibtex
@article{Best2025-07-04Bioac-73543,
  title={Bioacoustic fundamental frequency estimation: a cross-species dataset and deep learning baseline},
  year={2025},
  doi={10.1080/09524622.2025.2500380},
  number={4},
  volume={34},
  issn={0952-4622},
  journal={Bioacoustics},
  pages={419--446},
  author={Best, Paul and Araya-Salas, Marcelo and Ekström, Axel G. and Freitas, Bárbara and Jensen, Frants H. and Kershenbaum, Arik and Lameira, Adriano R. and Strandburg-Peshkin, Ariana and Marxer, Ricard}
}
kops.citation.iso690: BEST, Paul, Marcelo ARAYA-SALAS, Axel G. EKSTRÖM, Bárbara FREITAS, Frants H. JENSEN, Arik KERSHENBAUM, Adriano R. LAMEIRA, Ariana STRANDBURG-PESHKIN, Ricard MARXER, 2025. Bioacoustic fundamental frequency estimation: a cross-species dataset and deep learning baseline. In: Bioacoustics. Taylor & Francis. 2025, 34(4), pp. 419-446. ISSN 0952-4622. eISSN 2165-0586. Available under: doi: 10.1080/09524622.2025.2500380 [eng]
kops.citation.rdf
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/73543">
    <dc:contributor>Ekström, Axel G.</dc:contributor>
    <dc:creator>Lameira, Adriano R.</dc:creator>
    <dc:creator>Jensen, Frants H.</dc:creator>
    <dc:contributor>Jensen, Frants H.</dc:contributor>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/73543"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:contributor>Best, Paul</dc:contributor>
    <dc:contributor>Kershenbaum, Arik</dc:contributor>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:abstract>The fundamental frequency (F0) is a key parameter for characterising structures in vertebrate vocalisations, for instance defining vocal repertoires and their variations at different biological scales (e.g. population dialects, individual signatures). However, the task is too laborious to perform manually, and its automation is complex. Despite significant advancements in the fields of speech and music for automatic F0 estimation, similar progress in bioacoustics has been limited. To address this gap, we compile and publish a benchmark dataset of over 250,000 calls from 14 taxa, each paired with ground truth F0 values. These vocalisations range from infra-sounds to ultra-sounds, from high to low harmonicity, and some include non-linear phenomena. Testing different algorithms on these signals, we demonstrate the potential of neural networks for F0 estimation, even for taxa not seen in training, or when trained without labels. Also, to inform on the applicability of algorithms to analyse signals, we propose spectral measurements of F0 quality which correlate well with performance. While current performance results are not satisfying for all studied taxa, they suggest that deep learning could bring a more generic and reliable bioacoustic F0 tracker, helping the community to analyse vocalisations via their F0 contours.</dcterms:abstract>
    <dc:creator>Strandburg-Peshkin, Ariana</dc:creator>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-06-10T07:25:09Z</dcterms:available>
    <dcterms:issued>2025-07-04</dcterms:issued>
    <dc:creator>Araya-Salas, Marcelo</dc:creator>
    <dc:contributor>Araya-Salas, Marcelo</dc:contributor>
    <dc:contributor>Lameira, Adriano R.</dc:contributor>
    <dc:creator>Best, Paul</dc:creator>
    <dc:contributor>Marxer, Ricard</dc:contributor>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:creator>Ekström, Axel G.</dc:creator>
    <dc:creator>Marxer, Ricard</dc:creator>
    <dc:language>eng</dc:language>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/73543/1/Best_2-9o5wnofv2qwa8.pdf"/>
    <dcterms:title>Bioacoustic fundamental frequency estimation: a cross-species dataset and deep learning baseline</dcterms:title>
    <dc:contributor>Freitas, Bárbara</dc:contributor>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/73543/1/Best_2-9o5wnofv2qwa8.pdf"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-06-10T07:25:09Z</dc:date>
    <dc:creator>Freitas, Bárbara</dc:creator>
    <dc:contributor>Strandburg-Peshkin, Ariana</dc:contributor>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/28"/>
    <dc:creator>Kershenbaum, Arik</dc:creator>
  </rdf:Description>
</rdf:RDF>
kops.description.funding: {"second":"OISE1853934","first":"nsf"}
kops.description.openAccess: openaccessgreen
kops.flag.etalAuthor: true
kops.flag.isPeerReviewed: true
kops.flag.knbibliography: true
kops.identifier.nbn: urn:nbn:de:bsz:352-2-9o5wnofv2qwa8
kops.sourcefield: Bioacoustics. Taylor & Francis. 2025, 34(4), pp. 419-446. ISSN 0952-4622. eISSN 2165-0586. Available under: doi: 10.1080/09524622.2025.2500380 [eng]
kops.sourcefield.plain: Bioacoustics. Taylor & Francis. 2025, 34(4), pp. 419-446. ISSN 0952-4622. eISSN 2165-0586. Available under: doi: 10.1080/09524622.2025.2500380 [eng]
relation.isAuthorOfPublication: cdaf3e23-9cf7-44e0-829b-7012dfae32e4
relation.isAuthorOfPublication.latestForDiscovery: cdaf3e23-9cf7-44e0-829b-7012dfae32e4
relation.isDatasetOfPublication: 5a954d50-3133-4b0a-9a3d-e1ac2564b619
relation.isDatasetOfPublication.latestForDiscovery: 5a954d50-3133-4b0a-9a3d-e1ac2564b619
source.bibliographicInfo.fromPage: 419
source.bibliographicInfo.issue: 4
source.bibliographicInfo.toPage: 446
source.bibliographicInfo.volume: 34
source.identifier.eissn: 2165-0586
source.identifier.issn: 0952-4622
source.periodicalTitle: Bioacoustics
source.publisher: Taylor & Francis
temp.date.embargoEnd: 2026-06-15
temp.description.funding: {"first":"Foundation for Science and Technology","second":"2020.04569.BD"}

Files

Original bundle

Name: Best_2-9o5wnofv2qwa8.pdf
Size: 2.73 MB
Format: Adobe Portable Document Format