Publication:

GHisBERT – Training BERT from scratch for lexical semantic investigations across historical German language stages

Files

No files are associated with this document.

Date

2023

Authors

Beck, Christin
Köllner, Marisa

URI (citable link)
https://kops.uni-konstanz.de/handle/123456789/69901

Research funding

Deutsche Forschungsgemeinschaft (DFG): 251654672

Open Access publication
Core Facility of the University of Konstanz

Publication type
Contribution to a conference proceedings
Publication status
Published

Published in

TAHMASEBI, Nina, ed., Syrielle MONTARIOL, ed., Haim DUBOSSARSKY, ed. and others. Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change. Stroudsburg, PA: Association for Computational Linguistics, 2023, pp. 33-45. Available under: doi: 10.18653/v1/2023.lchange-1.4

Abstract

While static embeddings have dominated computational approaches to lexical semantic change for quite some time, recent approaches try to leverage the contextualized embeddings generated by the language model BERT for identifying semantic shifts in historical texts. However, despite their usability for detecting changes in the more recent past, it remains unclear how well language models scale to investigations going back further in time, where the language differs substantially from the training data underlying the models. In this paper, we present GHisBERT, a BERT-based language model trained from scratch on historical data covering all attested stages of German (going back to Old High German, c. 750 CE). Given a lack of ground truth data for investigating lexical semantic change across historical German language stages, we evaluate our model via a lexical similarity analysis of ten stable concepts. We show that, in comparison with an unmodified and a fine-tuned German BERT-base model, our model performs best in terms of assessing inter-concept similarity as well as intra-concept similarity over time. This in turn argues for the necessity of pre-training historical language models from scratch when working with historical linguistic data.
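
To make the evaluation setup concrete, the following is a minimal sketch of the kind of lexical similarity analysis the abstract describes, written against the Hugging Face transformers API. The baseline checkpoint bert-base-german-cased is a real model; the GHisBERT checkpoint name, the example word forms, and the mean-pooling strategy are illustrative assumptions, not the authors' exact procedure.

import torch
from transformers import AutoModel, AutoTokenizer

# bert-base-german-cased is the real German BERT-base baseline the paper
# compares against; swap in the GHisBERT checkpoint here (its published
# model name is an assumption, so none is given).
MODEL_NAME = "bert-base-german-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(text: str) -> torch.Tensor:
    # One contextualized vector per input: mean-pool the last hidden layer.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

# Hypothetical attestations of the stable concept 'water' at two language
# stages; real contexts would come from a historical corpus.
old = embed("uuazzar")  # an Old High German form
new = embed("Wasser")   # the modern German form

# Intra-concept similarity over time: a model suited to historical data
# should keep stable concepts close across stages.
sim = torch.nn.functional.cosine_similarity(old, new, dim=0)
print(f"cosine similarity: {sim.item():.3f}")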

Subject (DDC)
400 Linguistics

Conference

4th International Workshop on Computational Approaches to Historical Language Change 2023, December 6-10, 2023, Singapore

Cite

ISO 690: BECK, Christin, Marisa KÖLLNER, 2023. GHisBERT – Training BERT from scratch for lexical semantic investigations across historical German language stages. 4th International Workshop on Computational Approaches to Historical Language Change 2023. Singapore, December 6-10, 2023. In: TAHMASEBI, Nina, ed., Syrielle MONTARIOL, ed., Haim DUBOSSARSKY, ed. and others. Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change. Stroudsburg, PA: Association for Computational Linguistics, 2023, pp. 33-45. Available under: doi: 10.18653/v1/2023.lchange-1.4
BibTeX
@inproceedings{Beck2023GHisB-69901,
  year={2023},
  doi={10.18653/v1/2023.lchange-1.4},
  title={GHisBERT – Training BERT from scratch for lexical semantic investigations across historical German language stages},
  publisher={Association for Computational Linguistics},
  address={Stroudsburg, PA},
  booktitle={Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change},
  pages={33--45},
  editor={Tahmasebi, Nina and Montariol, Syrielle and Dubossarsky, Haim},
  author={Beck, Christin and Köllner, Marisa}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/69901">
    <dc:language>eng</dc:language>
    <dcterms:title>GHisBERT – Training BERT from scratch for lexical semantic investigations across historical German language stages</dcterms:title>
    <dc:contributor>Köllner, Marisa</dc:contributor>
    <dcterms:abstract>While static embeddings have dominated computational approaches to lexical semantic change for quite some time, recent approaches try to leverage the contextualized embeddings generated by the language model BERT for identifying semantic shifts in historical texts. However, despite their usability for detecting changes in the more recent past, it remains unclear how well language models scale to investigations going back further in time, where the language differs substantially from the training data underlying the models. In this paper, we present GHisBERT, a BERT-based language model trained from scratch on historical data covering all attested stages of German (going back to Old High German, c. 750 CE). Given a lack of ground truth data for investigating lexical semantic change across historical German language stages, we evaluate our model via a lexical similarity analysis of ten stable concepts. We show that, in comparison with an unmodified and a fine-tuned German BERT-base model, our model performs best in terms of assessing inter-concept similarity as well as intra-concept similarity over time. This in turn argues for the necessity of pre-training historical language models from scratch when working with historical linguistic data.</dcterms:abstract>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2024-05-02T10:33:43Z</dc:date>
    <dc:creator>Köllner, Marisa</dc:creator>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:creator>Beck, Christin</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2024-05-02T10:33:43Z</dcterms:available>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/69901"/>
    <dcterms:issued>2023</dcterms:issued>
    <dc:contributor>Beck, Christin</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
  </rdf:Description>
</rdf:RDF>
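
For downstream reuse, the RDF record above can be read with standard tooling. Below is a short sketch using Python's rdflib, assuming the <rdf:RDF> block has been saved to a file named record.rdf (a hypothetical name):

from rdflib import Graph
from rdflib.namespace import DC, DCTERMS

g = Graph()
g.parse("record.rdf", format="xml")  # the RDF/XML block above, saved to disk

# Pull the Dublin Core creators and title out of the record.
for _, _, creator in g.triples((None, DC.creator, None)):
    print("creator:", creator)
for _, _, title in g.triples((None, DCTERMS.title, None)):
    print("title:", title)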

University bibliography
Yes