Publication:

On the implementation of Latin part-of-speech taggers in intertextuality analysis : TreeTagger, CLTK, Cracovia system, LatinCy, and ChatGPT compared

Files

There are no files associated with this document.

Date

2024

Funding information

Deutsche Forschungsgemeinschaft (DFG): 382880410

Publication type
Journal article
Publication status
Published

Published in

Digital Scholarship in the Humanities. Oxford University Press (OUP). ISSN 2055-7671. eISSN 2055-768X. Available at: doi: 10.1093/llc/fqae078

Abstract

Digital-assisted intertextuality analysis often yields large amounts of results, many of which are irrelevant to researchers from a hermeneutic point of view. One strategy for minimizing these hermeneutically non-meaningful findings is to apply a filter that sorts the results based on specified parts-of-speech. Building on Marie Revellio’s historical text-reuse grammar (HTRG), we demonstrate that an implementation of such a filter accommodating the hermeneutical context proves to be advantageous. We assessed the performance of various Latin part-of-speech (POS) taggers to refine our filtering process, using evaluation data on text congruencies from the Latin authors Virgil and Jerome. Among the Classical Language Toolkit, the TreeTagger, the Cracovia system, LatinCy, and ChatGPT, the Cracovia system surpassed the other taggers by approximately 2 percentage points in accuracy. While this tagger leverages transformer-based machine learning algorithms, the older, probabilistic TreeTagger demonstrated competitive performance. Although GPT-4 showed remarkable results, it still lags behind the state-of-the-art taggers for Latin. In order to build a powerful digital citation detection tool for intertextual relationships in ancient texts, the most accurate analysis of POS is crucial in filtering and evaluating valid citations.
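
The two ideas in the abstract, filtering text-reuse candidates by part of speech and scoring a tagger by accuracy, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the article: the tag set `CONTENT_POS`, the threshold `min_content`, the helper names `keep_match` and `tagging_accuracy`, and the toy tokens.

```python
from typing import List, Tuple

# Assumption: Universal Dependencies-style tags treated as "content" words.
# The article's actual filter criteria (based on the HTRG) are not given here.
CONTENT_POS = {"NOUN", "VERB", "ADJ", "PROPN"}

Token = Tuple[str, str]  # (word form, POS tag)


def keep_match(shared_tokens: List[Token], min_content: int = 2) -> bool:
    """Keep a candidate parallel only if enough shared words are content words."""
    content = sum(1 for _, pos in shared_tokens if pos in CONTENT_POS)
    return content >= min_content


def tagging_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Share of tokens whose predicted tag equals the gold-standard tag."""
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy candidates (hypothetical): words shared between two passages, already tagged.
candidates = [
    [("et", "CCONJ"), ("in", "ADP")],                         # function words only
    [("arma", "NOUN"), ("virum", "NOUN"), ("que", "CCONJ")],  # two content words
]
kept = [c for c in candidates if keep_match(c)]
```

In this sketch a candidate consisting only of function words is dropped, which is the kind of hermeneutically non-meaningful hit the filter is meant to suppress. The filter can only be as reliable as the tags it receives, which is why tagger accuracy matters for the filtering step.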

Subject area (DDC)
800 Literature, Rhetoric, Literary Studies

Keywords

Jerome, taggers, part-of-speech tagging, intertextuality, Latin literature, evaluation, automated citation detection, POS tagging

Cite

ISO 690
WITTWEILER, Michael, Franziska SCHROPP, Thomas E. KONRAD, Marie REVELLIO, Barbara FEICHTINGER, 2024. On the implementation of Latin part-of-speech taggers in intertextuality analysis : TreeTagger, CLTK, Cracovia system, LatinCy, and ChatGPT compared. In: Digital Scholarship in the Humanities. Oxford University Press (OUP). ISSN 2055-7671. eISSN 2055-768X. Available at: doi: 10.1093/llc/fqae078
BibTeX
@article{Wittweiler2024-12-04imple-71740,
  year={2024},
  doi={10.1093/llc/fqae078},
  title={On the implementation of Latin part-of-speech taggers in intertextuality analysis : TreeTagger, CLTK, Cracovia system, LatinCy, and ChatGPT compared},
  issn={2055-7671},
  journal={Digital Scholarship in the Humanities},
  author={Wittweiler, Michael and Schropp, Franziska and Konrad, Thomas E. and Revellio, Marie and Feichtinger, Barbara}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/71740">
    <dc:contributor>Wittweiler, Michael</dc:contributor>
    <dc:language>eng</dc:language>
    <dc:contributor>Revellio, Marie</dc:contributor>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2024-12-19T07:57:33Z</dcterms:available>
    <dc:creator>Revellio, Marie</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2024-12-19T07:57:33Z</dc:date>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/38"/>
    <dcterms:issued>2024-12-04</dcterms:issued>
    <dc:contributor>Konrad, Thomas E.</dc:contributor>
    <dc:contributor>Schropp, Franziska</dc:contributor>
    <dc:creator>Wittweiler, Michael</dc:creator>
    <dc:contributor>Feichtinger, Barbara</dc:contributor>
    <dc:creator>Schropp, Franziska</dc:creator>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/38"/>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/71740"/>
    <dcterms:abstract>Digital-assisted intertextuality analysis often yields large amounts of results, many of which are irrelevant to researchers from a hermeneutic point of view. One strategy for minimizing these hermeneutically non-meaningful findings is to apply a filter that sorts the results based on specified parts-of-speech. Building on Marie Revellio’s historical text-reuse grammar (HTRG), we demonstrate that an implementation of such a filter accommodating the hermeneutical context proves to be advantageous. We assessed the performance of various Latin part-of-speech (POS) taggers to refine our filtering process, using evaluation data on text congruencies from the Latin authors Virgil and Jerome. Among the Classical Language Toolkit, the TreeTagger, the Cracovia system, LatinCy, and ChatGPT, the Cracovia system surpassed the other taggers by approximately 2 percentage points in accuracy. While this tagger leverages transformer-based machine learning algorithms, the older, probabilistic TreeTagger demonstrated competitive performance. Although GPT-4 showed remarkable results, it still lags behind the state-of-the-art taggers for Latin. In order to build a powerful digital citation detection tool for intertextual relationships in ancient texts, the most accurate analysis of POS is crucial in filtering and evaluating valid citations.</dcterms:abstract>
    <dc:creator>Feichtinger, Barbara</dc:creator>
    <dcterms:title>On the implementation of Latin part-of-speech taggers in intertextuality analysis : TreeTagger, CLTK, Cracovia system, LatinCy, and ChatGPT compared</dcterms:title>
    <dc:creator>Konrad, Thomas E.</dc:creator>
  </rdf:Description>
</rdf:RDF>

University bibliography
Yes
Peer-reviewed
Yes