Publication:

Testing Acoustic Voice Quality Classification Across Languages and Speech Styles


Files

Braun_2-1bz2rt3fbpory9.pdf (size: 341.21 KB, downloads: 171)

Date

2021


Open Access publication
Open Access Green
Core Facility of the University of Konstanz


Publication type
Contribution to conference proceedings
Publication status
Published

Published in

HEŘMANSKÝ, Hynek, ed. and others. Proceedings of Interspeech 2021. Baixas, France: ISCA, 2021, pp. 3920-3924. Available under: doi: 10.21437/Interspeech.2021-315

Abstract

Many studies relate acoustic voice quality measures to perceptual classification. We extend this line of research by training a classifier on a balanced set of perceptually annotated voice quality categories with high inter-rater agreement, and testing it on speech samples from a different language and on a different speech style. Annotations were done on continuous speech from different laboratory settings. In Experiment 1, we trained a random forest with Standard Chinese and German recordings labelled as modal, breathy, or glottalized. The model had an accuracy of 78.7% on unseen data from the same sample (the most important variables were harmonics-to-noise ratio, cepstral-peak prominence, and H1-A2). This model was then used to classify data from a different language (Icelandic, Experiment 2) and to classify a different speech style (German infant-directed speech (IDS), Experiment 3). Cross-linguistic generalizability was high for Icelandic (78.6% accuracy), but lower for German IDS (71.7% accuracy). Accuracy on recordings of adult-directed speech from the same speakers as in Experiment 3 (77%, Experiment 4) suggests that it is the special speech style of IDS, rather than the recording setting, that led to lower performance. Results are discussed in terms of efficiency of coding and generalizability across languages and speech styles.
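The classification pipeline described in the abstract can be sketched as follows. This is a minimal illustration only, using synthetic stand-in data and scikit-learn's `RandomForestClassifier`; the feature names, data, and model settings are assumptions for illustration, not the authors' actual dataset or implementation:

```python
# Sketch of the abstract's setup: a random forest trained on acoustic
# voice-quality measures, with feature importances inspected afterwards.
# Data here are random placeholders, not real acoustic measurements.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical feature set mirroring the variables named in the abstract:
# harmonics-to-noise ratio, cepstral-peak prominence, and H1-A2.
features = ["hnr", "cpp", "h1_a2"]
labels = ["modal", "breathy", "glottalized"]

# Synthetic, balanced stand-in data: 100 tokens per phonation type.
X = rng.normal(size=(300, len(features)))
y = np.repeat(labels, 100)

# Hold out a test split to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
importances = dict(zip(features, clf.feature_importances_))
```

With real acoustic measurements, `importances` would indicate which measures drive the classification (in the paper, HNR, CPP, and H1-A2 ranked highest), and `clf.predict` could then be applied to data from another language or speech style to probe generalizability.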

Subject (DDC)
400 Language, Linguistics

Keywords

voice quality, phonation type, acoustic measures, random forest, cross-linguistic generalization, infant-directed speech, German, Chinese, Icelandic

Conference

Interspeech 2021, 30 Aug 2021 - 3 Sept 2021, Brno, Czechia


Cite

ISO 690
BRAUN, Bettina, Nicole DEHÉ, Marieke EINFELDT, Daniela WOCHNER, Katharina ZAHNER-RITTER, 2021. Testing Acoustic Voice Quality Classification Across Languages and Speech Styles. Interspeech 2021. Brno, Czechia, 30 Aug 2021 - 3 Sept 2021. In: HEŘMANSKÝ, Hynek, ed. and others. Proceedings of Interspeech 2021. Baixas, France: ISCA, 2021, pp. 3920-3924. Available under: doi: 10.21437/Interspeech.2021-315
BibTeX
@inproceedings{Braun2021Testi-59052,
  year={2021},
  doi={10.21437/Interspeech.2021-315},
  title={Testing Acoustic Voice Quality Classification Across Languages and Speech Styles},
  publisher={ISCA},
  address={Baixas, France},
  booktitle={Proceedings of Interspeech 2021},
  pages={3920--3924},
  editor={Heřmanský, Hynek},
  author={Braun, Bettina and Dehé, Nicole and Einfeldt, Marieke and Wochner, Daniela and Zahner-Ritter, Katharina}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/59052">
    <dc:contributor>Zahner-Ritter, Katharina</dc:contributor>
    <dc:contributor>Wochner, Daniela</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:creator>Braun, Bettina</dc:creator>
    <dc:creator>Wochner, Daniela</dc:creator>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/59052"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2022-11-07T14:20:45Z</dcterms:available>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/59052/3/Braun_2-1bz2rt3fbpory9.pdf"/>
    <dcterms:title>Testing Acoustic Voice Quality Classification Across Languages and Speech Styles</dcterms:title>
    <dc:creator>Dehé, Nicole</dc:creator>
    <dcterms:issued>2021</dcterms:issued>
    <dc:contributor>Braun, Bettina</dc:contributor>
    <dc:contributor>Dehé, Nicole</dc:contributor>
    <dcterms:abstract xml:lang="eng">Many studies relate acoustic voice quality measures to perceptual classification. We extend this line of research by training a classifier on a balanced set of perceptually annotated voice quality categories with high inter-rater agreement, and test it on speech samples from a different language and on a different speech style. Annotations were done on continuous speech from different laboratory settings. In Experiment 1, we trained a random forest with Standard Chinese and German recordings labelled as modal, breathy, or glottalized. The model had an accuracy of 78.7% on unseen data from the same sample (most important variables were harmonics-to-noise ratio, cepstral-peak prominence, and H1-A2). This model was then used to classify data from a different language (Icelandic, Experiment 2) and to classify a different speech style (German infant-directed speech (IDS), Experiment 3). Cross-linguistic generalizability was high for Icelandic (78.6% accuracy), but lower for German IDS (71.7% accuracy). Accuracy of recordings of adult-directed speech from the same speakers as in Experiment 3 (77%, Experiment 4) suggests that it is the special speech style of IDS, rather than the recording setting that led to lower performance. Results are discussed in terms of efficiency of coding and generalizability across languages and speech styles.</dcterms:abstract>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:language>eng</dc:language>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2022-11-07T14:20:45Z</dc:date>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Einfeldt, Marieke</dc:creator>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/59052/3/Braun_2-1bz2rt3fbpory9.pdf"/>
    <dc:contributor>Einfeldt, Marieke</dc:contributor>
    <dc:creator>Zahner-Ritter, Katharina</dc:creator>
  </rdf:Description>
</rdf:RDF>


Corresponding authors from the University of Konstanz present
International co-authors
University bibliography
Yes
Peer-reviewed