Publication: On the implementation of Latin part-of-speech taggers in intertextuality analysis : TreeTagger, CLTK, Cracovia system, LatinCy, and ChatGPT compared
Abstract
Digitally assisted intertextuality analysis often yields large numbers of results, many of which are irrelevant to researchers from a hermeneutic point of view. One strategy for minimizing these hermeneutically non-meaningful findings is to apply a filter that sorts the results by specified parts of speech. Building on Marie Revellio’s historical text-reuse grammar (HTRG), we demonstrate that implementing such a filter in a way that accommodates the hermeneutical context is advantageous. We assessed the performance of various Latin part-of-speech (POS) taggers to refine our filtering process, using evaluation data on text congruencies from the Latin authors Virgil and Jerome. Among the Classical Language Toolkit (CLTK), TreeTagger, the Cracovia system, LatinCy, and ChatGPT, the Cracovia system surpassed the other taggers by approximately 2 percentage points in accuracy. While this tagger leverages transformer-based machine learning algorithms, the older, probabilistic TreeTagger demonstrated competitive performance. Although GPT-4 showed remarkable results, it still lags behind the state-of-the-art taggers for Latin. For building a powerful digital citation detection tool for intertextual relationships in ancient texts, the most accurate possible POS analysis is crucial for filtering and evaluating valid citations.
Cite
ISO 690
WITTWEILER, Michael, Franziska SCHROPP, Thomas E. KONRAD, Marie REVELLIO, Barbara FEICHTINGER, 2024. On the implementation of Latin part-of-speech taggers in intertextuality analysis : TreeTagger, CLTK, Cracovia system, LatinCy, and ChatGPT compared. In: Digital Scholarship in the Humanities. Oxford University Press (OUP). ISSN 2055-7671. eISSN 2055-768X. Available at: doi: 10.1093/llc/fqae078
BibTeX
@article{Wittweiler2024-12-04imple-71740, year={2024}, doi={10.1093/llc/fqae078}, title={On the implementation of Latin part-of-speech taggers in intertextuality analysis : TreeTagger, CLTK, Cracovia system, LatinCy, and ChatGPT compared}, issn={2055-7671}, journal={Digital Scholarship in the Humanities}, author={Wittweiler, Michael and Schropp, Franziska and Konrad, Thomas E. and Revellio, Marie and Feichtinger, Barbara} }
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/71740">
    <dc:contributor>Wittweiler, Michael</dc:contributor>
    <dc:language>eng</dc:language>
    <dc:contributor>Revellio, Marie</dc:contributor>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2024-12-19T07:57:33Z</dcterms:available>
    <dc:creator>Revellio, Marie</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2024-12-19T07:57:33Z</dc:date>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/38"/>
    <dcterms:issued>2024-12-04</dcterms:issued>
    <dc:contributor>Konrad, Thomas E.</dc:contributor>
    <dc:contributor>Schropp, Franziska</dc:contributor>
    <dc:creator>Wittweiler, Michael</dc:creator>
    <dc:contributor>Feichtinger, Barbara</dc:contributor>
    <dc:creator>Schropp, Franziska</dc:creator>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/38"/>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/71740"/>
    <dcterms:abstract>Digitally assisted intertextuality analysis often yields large numbers of results, many of which are irrelevant to researchers from a hermeneutic point of view. One strategy for minimizing these hermeneutically non-meaningful findings is to apply a filter that sorts the results by specified parts of speech. Building on Marie Revellio’s historical text-reuse grammar (HTRG), we demonstrate that implementing such a filter in a way that accommodates the hermeneutical context is advantageous. We assessed the performance of various Latin part-of-speech (POS) taggers to refine our filtering process, using evaluation data on text congruencies from the Latin authors Virgil and Jerome. Among the Classical Language Toolkit (CLTK), TreeTagger, the Cracovia system, LatinCy, and ChatGPT, the Cracovia system surpassed the other taggers by approximately 2 percentage points in accuracy. While this tagger leverages transformer-based machine learning algorithms, the older, probabilistic TreeTagger demonstrated competitive performance. Although GPT-4 showed remarkable results, it still lags behind the state-of-the-art taggers for Latin. For building a powerful digital citation detection tool for intertextual relationships in ancient texts, the most accurate possible POS analysis is crucial for filtering and evaluating valid citations.</dcterms:abstract>
    <dc:creator>Feichtinger, Barbara</dc:creator>
    <dcterms:title>On the implementation of Latin part-of-speech taggers in intertextuality analysis : TreeTagger, CLTK, Cracovia system, LatinCy, and ChatGPT compared</dcterms:title>
    <dc:creator>Konrad, Thomas E.</dc:creator>
  </rdf:Description>
</rdf:RDF>