Publication: An Open Multilingual System for Scoring Readability of Wikipedia
Abstract
With over 60M articles, Wikipedia has become the largest platform for open and freely accessible knowledge. While it has more than 15B monthly visits, its content is believed to be inaccessible to many readers due to the lack of readability of its text. However, previous investigations of the readability of Wikipedia have been restricted to English only, and there are currently no systems supporting the automatic readability assessment of the 300+ languages in Wikipedia. To bridge this gap, we develop a multilingual model to score the readability of Wikipedia articles. To train and evaluate this model, we create a novel multilingual dataset spanning 14 languages, by matching articles from Wikipedia to simplified Wikipedia and online children encyclopedias. We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages and improving upon previous benchmarks. These results demonstrate the applicability of the model at scale for languages in which there is no ground-truth data available for model fine-tuning. Furthermore, we provide the first overview on the state of readability in Wikipedia beyond English.
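The ranking accuracy mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each evaluation item is a pair of scores (original article, simplified counterpart), that a higher score means harder text, and that a pair counts as correct when the original scores strictly higher. The function name `ranking_accuracy` and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def ranking_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (original, simplified) score pairs ranked correctly.

    Assumption: a higher score means the text is harder to read, so a pair
    is correct when the original scores strictly higher than the simplified
    version. Ties count as incorrect.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for original, simplified in pairs if original > simplified)
    return correct / len(pairs)


# Hypothetical scores for illustration only (not results from the paper).
example_pairs = [(0.9, 0.2), (0.7, 0.4), (0.3, 0.5), (0.8, 0.1)]
print(ranking_accuracy(example_pairs))  # 3 of 4 pairs are ordered correctly: 0.75
```

Under this reading, a score above 0.8 would mean that in more than four out of five pairs the original article is rated as harder than its simplified version.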
Cite
ISO 690
TROKHYMOVYCH, Mykola, Indira SEN, Martin GERLACH, 2024. An Open Multilingual System for Scoring Readability of Wikipedia. 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, 11 Aug. 2024 - 16 Aug. 2024. In: KU, Lun-Wei, ed., Andre MARTINS, ed., Vivek SRIKUMAR, ed. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Volume 1: Long Papers. Stroudsburg, PA: ACL, 2024, pp. 6296-6311. ISBN 979-8-89176-094-3. Available at: doi: 10.18653/v1/2024.acl-long.342
BibTeX
@inproceedings{Trokhymovych2024Multi-73944,
title={An Open Multilingual System for Scoring Readability of Wikipedia},
year={2024},
doi={10.18653/v1/2024.acl-long.342},
isbn={979-8-89176-094-3},
address={Stroudsburg, PA},
publisher={ACL},
booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Volume 1: Long Papers},
pages={6296--6311},
editor={Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek},
author={Trokhymovych, Mykola and Sen, Indira and Gerlach, Martin}
}
RDF
<rdf:RDF
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:void="http://rdfs.org/ns/void#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
<rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/73944">
<dcterms:issued>2024</dcterms:issued>
<dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/42"/>
<bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/73944"/>
<dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/42"/>
<foaf:homepage rdf:resource="http://localhost:8080/"/>
<dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-07-14T12:01:01Z</dcterms:available>
<dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/73944/1/Trokhymovych_2-1idnsatajnxg82.pdf"/>
<dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
<dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/73944/1/Trokhymovych_2-1idnsatajnxg82.pdf"/>
<dc:contributor>Gerlach, Martin</dc:contributor>
<void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
<dc:language>eng</dc:language>
<dc:contributor>Trokhymovych, Mykola</dc:contributor>
<dc:creator>Gerlach, Martin</dc:creator>
<dc:rights>terms-of-use</dc:rights>
<dcterms:abstract>With over 60M articles, Wikipedia has become the largest platform for open and freely accessible knowledge. While it has more than 15B monthly visits, its content is believed to be inaccessible to many readers due to the lack of readability of its text. However, previous investigations of the readability of Wikipedia have been restricted to English only, and there are currently no systems supporting the automatic readability assessment of the 300+ languages in Wikipedia. To bridge this gap, we develop a multilingual model to score the readability of Wikipedia articles. To train and evaluate this model, we create a novel multilingual dataset spanning 14 languages, by matching articles from Wikipedia to simplified Wikipedia and online children encyclopedias. We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages and improving upon previous benchmarks. These results demonstrate the applicability of the model at scale for languages in which there is no ground-truth data available for model fine-tuning. Furthermore, we provide the first overview on the state of readability in Wikipedia beyond English.</dcterms:abstract>
<dc:contributor>Sen, Indira</dc:contributor>
<dcterms:title>An Open Multilingual System for Scoring Readability of Wikipedia</dcterms:title>
<dc:creator>Trokhymovych, Mykola</dc:creator>
<dc:creator>Sen, Indira</dc:creator>
<dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-07-14T12:01:01Z</dc:date>
</rdf:Description>
</rdf:RDF>