Publication: Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language
Abstract
We address the challenge of syntactic parsing for Urdu, a morphologically rich language, and present state-of-the-art results for both constituency and dependency parsing. This paper makes four major contributions: 1) the conversion of the CLE-UTB phrase structure treebank into a dependency treebank, using language-specific head-word rules and phrase-to-dependency label mapping rules; 2) a novel sequence labeling scheme that casts both parsing tasks into a unified representation; 3) the training of contextualized word representations on a large Urdu corpus of 220 million tokens collected from the web; and 4) the development of a parsing framework under two learning paradigms, single-task and multi-task learning. Several post-processing rules are applied to improve the quality of the automatically converted dependency treebank. The proposed sequence labeling scheme enables a shared architecture that learns constituency and dependency structures simultaneously and hence improves generalization. Experiments show that the multi-task learning setup significantly enhances parsing performance, achieving an F1 score of 91.39 for constituency parsing (an improvement of 3.29 points) and a labeled attachment score (LAS) of 85.69 for dependency parsing (an improvement of 1.49 points). These results demonstrate that learning cross-task representations yields measurable benefits and advances the state of the art in syntactic parsing for Urdu.
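As an illustration of how a parse tree can be turned into a sequence labeling problem, the sketch below shows one common, generic encoding for dependency trees: each token receives a single tag combining the relative offset to its head and its relation label, so an ordinary tagger can predict the tree token by token. This is a minimal sketch under stated assumptions, not the authors' scheme (which also covers constituency trees and is not specified in the abstract). The function names `encode`, `decode` and `las` and the relation labels in the example are hypothetical. The `las` helper computes the labeled attachment score as the percentage of tokens whose predicted head and relation label both match the gold tree.

```python
from typing import List, Tuple

# Hypothetical illustration: a generic relative-head-offset encoding of
# dependency trees as per-token tags. It is NOT the paper's own scheme.
# heads[i] is the 1-based index of the head of token i+1 (0 = root);
# rels[i] is the dependency relation label of token i+1.


def encode(heads: List[int], rels: List[str]) -> List[str]:
    """Map a dependency tree to one '<offset>_<relation>' tag per token."""
    tags = []
    for i, (head, rel) in enumerate(zip(heads, rels), start=1):
        # Offset = head position minus own position; the root gets 'ROOT'.
        offset = "ROOT" if head == 0 else f"{head - i:+d}"
        tags.append(f"{offset}_{rel}")
    return tags


def decode(tags: List[str]) -> Tuple[List[int], List[str]]:
    """Invert encode(): recover head indices and relation labels."""
    heads: List[int] = []
    rels: List[str] = []
    for i, tag in enumerate(tags, start=1):
        offset, rel = tag.split("_", 1)  # labels may themselves contain '_'
        heads.append(0 if offset == "ROOT" else i + int(offset))
        rels.append(rel)
    return heads, rels


def las(gold_heads: List[int], gold_rels: List[str],
        pred_heads: List[int], pred_rels: List[str]) -> float:
    """Labeled attachment score: % of tokens with correct head AND label."""
    correct = sum(
        gh == ph and gr == pr
        for gh, gr, ph, pr in zip(gold_heads, gold_rels, pred_heads, pred_rels)
    )
    return 100.0 * correct / len(gold_heads)


if __name__ == "__main__":
    gold_heads, gold_rels = [2, 4, 4, 0], ["nmod", "subj", "obj", "root"]
    tags = encode(gold_heads, gold_rels)
    print(tags)
    assert decode(tags) == (gold_heads, gold_rels)
    print(las(gold_heads, gold_rels, [2, 4, 4, 0], ["nmod", "obj", "obj", "root"]))
```

For the toy four-token tree above, `encode` yields `['+1_nmod', '+2_subj', '+1_obj', 'ROOT_root']`, `decode` restores the original heads and labels, and a prediction that mislabels one token scores an LAS of 75.0. A real tagger may predict a tag sequence that does not form a well-formed tree (for example, an offset pointing outside the sentence or more than one root), so practical decoders add repair heuristics that this sketch omits.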
Cite
ISO 690
EHSAN, Toqeer, Miriam BUTT, Sarmad HUSSAIN, Hassan ALHUZALI, Ali AL-LAITH, 2025. Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language. In: PLoS One. Public Library of Science (PLoS). 2025, 20(9), e0332580. eISSN 1932-6203. Available at: doi: 10.1371/journal.pone.0332580
BibTeX
@article{Ehsan2025-09-25Multi-74821,
title={Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language},
year={2025},
doi={10.1371/journal.pone.0332580},
number={9},
volume={20},
journal={PLoS One},
author={Ehsan, Toqeer and Butt, Miriam and Hussain, Sarmad and Alhuzali, Hassan and Al-Laith, Ali},
note={Article Number: e0332580}
}
RDF
<rdf:RDF
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:void="http://rdfs.org/ns/void#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
<rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/74821">
<dcterms:title>Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language</dcterms:title>
<dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
<dc:creator>Al-Laith, Ali</dc:creator>
<dc:creator>Alhuzali, Hassan</dc:creator>
<dc:language>eng</dc:language>
<dc:contributor>Hussain, Sarmad</dc:contributor>
<bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/74821"/>
<dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-10-14T08:49:53Z</dc:date>
<dcterms:abstract>We address the challenge of syntactic parsing for Urdu, a morphologically rich language, and present state-of-the-art results for both constituency and dependency parsing. This paper makes four major contributions: 1) the conversion of the CLE-UTB phrase structure treebank into a dependency treebank, using language-specific head-word rules and phrase-to-dependency label mapping rules; 2) a novel sequence labeling scheme that casts both parsing tasks into a unified representation; 3) the training of contextualized word representations on a large Urdu corpus of 220 million tokens collected from the web; and 4) the development of a parsing framework under two learning paradigms, single-task and multi-task learning. Several post-processing rules are applied to improve the quality of the automatically converted dependency treebank. The proposed sequence labeling scheme enables a shared architecture that learns constituency and dependency structures simultaneously and hence improves generalization. Experiments show that the multi-task learning setup significantly enhances parsing performance, achieving an F1 score of 91.39 for constituency parsing (an improvement of 3.29 points) and a labeled attachment score (LAS) of 85.69 for dependency parsing (an improvement of 1.49 points). These results demonstrate that learning cross-task representations yields measurable benefits and advances the state of the art in syntactic parsing for Urdu.</dcterms:abstract>
<dcterms:issued>2025-09-25</dcterms:issued>
<dc:creator>Butt, Miriam</dc:creator>
<dc:contributor>Ehsan, Toqeer</dc:contributor>
<dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
<void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
<dc:contributor>Al-Laith, Ali</dc:contributor>
<dc:rights>Attribution 4.0 International</dc:rights>
<dc:creator>Ehsan, Toqeer</dc:creator>
<dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
<dc:creator>Hussain, Sarmad</dc:creator>
<dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-10-14T08:49:53Z</dcterms:available>
<dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/74821/1/Ehsan_2-bziqya98f9l9.pdf"/>
<foaf:homepage rdf:resource="http://localhost:8080/"/>
<dc:contributor>Alhuzali, Hassan</dc:contributor>
<dc:contributor>Butt, Miriam</dc:contributor>
<dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/74821/1/Ehsan_2-bziqya98f9l9.pdf"/>
</rdf:Description>
</rdf:RDF>