Publication:

An Open Multilingual System for Scoring Readability of Wikipedia

Files

Trokhymovych_2-1idnsatajnxg82.pdf (Size: 2.78 MB, Downloads: 64)

Date

2024

Authors

Trokhymovych, Mykola
Sen, Indira
Gerlach, Martin

Open Access publication
Open Access Bookpart

Publication type
Contribution to conference proceedings
Publication status
Published

Published in

KU, Lun-Wei, ed., Andre MARTINS, ed., Vivek SRIKUMAR, ed. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Volume 1: Long Papers. Stroudsburg, PA: ACL, 2024, pp. 6296-6311. ISBN 979-8-89176-094-3. Available at: doi: 10.18653/v1/2024.acl-long.342

Abstract

With over 60M articles, Wikipedia has become the largest platform for open and freely accessible knowledge. While it has more than 15B monthly visits, its content is believed to be inaccessible to many readers due to the lack of readability of its text. However, previous investigations of the readability of Wikipedia have been restricted to English only, and there are currently no systems supporting the automatic readability assessment of the 300+ languages in Wikipedia. To bridge this gap, we develop a multilingual model to score the readability of Wikipedia articles. To train and evaluate this model, we create a novel multilingual dataset spanning 14 languages, by matching articles from Wikipedia to simplified Wikipedia and online children encyclopedias. We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages and improving upon previous benchmarks. These results demonstrate the applicability of the model at scale for languages in which there is no ground-truth data available for model fine-tuning. Furthermore, we provide the first overview on the state of readability in Wikipedia beyond English.
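
The ranking accuracy mentioned in the abstract can be illustrated with a minimal sketch. It assumes the metric is the fraction of matched article pairs in which the original Wikipedia version receives a higher (harder-to-read) score than its simplified counterpart. That definition, the function name `ranking_accuracy`, and the scores below are illustrative assumptions, not taken from the paper or its code.

```python
from typing import Sequence, Tuple


def ranking_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of pairs ranked correctly.

    Each pair is (score_original, score_simplified), where a higher
    score is assumed to mean "harder to read". A pair counts as correct
    only if the original scores strictly higher than the simplified text.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for hard, easy in pairs if hard > easy)
    return correct / len(pairs)


# Hypothetical model scores for five matched article pairs.
example_pairs = [(0.9, 0.2), (0.4, 0.6), (1.3, -0.5), (0.1, 0.0), (0.7, 0.7)]
accuracy = ranking_accuracy(example_pairs)
print(f"ranking accuracy: {accuracy:.2f}")
```

Ties are counted as incorrect in this sketch, which is one possible convention. A pairwise metric of this kind depends only on the relative order of the two scores, not on their absolute scale.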

Subject (DDC)
004 Computer science

Conference

62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), 11 Aug. 2024 - 16 Aug. 2024, Bangkok, Thailand

Cite

ISO 690
TROKHYMOVYCH, Mykola, Indira SEN, Martin GERLACH, 2024. An Open Multilingual System for Scoring Readability of Wikipedia. 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, 11 Aug. 2024 - 16 Aug. 2024. In: KU, Lun-Wei, ed., Andre MARTINS, ed., Vivek SRIKUMAR, ed. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Volume 1: Long Papers. Stroudsburg, PA: ACL, 2024, pp. 6296-6311. ISBN 979-8-89176-094-3. Available at: doi: 10.18653/v1/2024.acl-long.342
BibTeX
@inproceedings{Trokhymovych2024Multi-73944,
  title={An Open Multilingual System for Scoring Readability of Wikipedia},
  year={2024},
  doi={10.18653/v1/2024.acl-long.342},
  isbn={979-8-89176-094-3},
  address={Stroudsburg, PA},
  publisher={ACL},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Volume 1: Long Papers},
  pages={6296--6311},
  editor={Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek},
  author={Trokhymovych, Mykola and Sen, Indira and Gerlach, Martin}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/73944">
    <dcterms:issued>2024</dcterms:issued>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/42"/>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/73944"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/42"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-07-14T12:01:01Z</dcterms:available>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/73944/1/Trokhymovych_2-1idnsatajnxg82.pdf"/>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/73944/1/Trokhymovych_2-1idnsatajnxg82.pdf"/>
    <dc:contributor>Gerlach, Martin</dc:contributor>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:language>eng</dc:language>
    <dc:contributor>Trokhymovych, Mykola</dc:contributor>
    <dc:creator>Gerlach, Martin</dc:creator>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:abstract>With over 60M articles, Wikipedia has become the largest platform for open and freely accessible knowledge. While it has more than 15B monthly visits, its content is believed to be inaccessible to many readers due to the lack of readability of its text. However, previous investigations of the readability of Wikipedia have been restricted to English only, and there are currently no systems supporting the automatic readability assessment of the 300+ languages in Wikipedia. To bridge this gap, we develop a multilingual model to score the readability of Wikipedia articles. To train and evaluate this model, we create a novel multilingual dataset spanning 14 languages, by matching articles from Wikipedia to simplified Wikipedia and online children encyclopedias. We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages and improving upon previous benchmarks. These results demonstrate the applicability of the model at scale for languages in which there is no ground-truth data available for model fine-tuning. Furthermore, we provide the first overview on the state of readability in Wikipedia beyond English.</dcterms:abstract>
    <dc:contributor>Sen, Indira</dc:contributor>
    <dcterms:title>An Open Multilingual System for Scoring Readability of Wikipedia</dcterms:title>
    <dc:creator>Trokhymovych, Mykola</dc:creator>
    <dc:creator>Sen, Indira</dc:creator>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-07-14T12:01:01Z</dc:date>
  </rdf:Description>
</rdf:RDF>

University bibliography
Yes