Publication:

Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language

Files

Ehsan_2-bziqya98f9l9.pdf (size: 6.76 MB, downloads: 7)

Date

2025

Authors

Ehsan, Toqeer
Butt, Miriam
Hussain, Sarmad
Alhuzali, Hassan
Al-Laith, Ali

Open access publication
Open Access Gold

Publication type
Journal article
Publication status
Published

Published in

PLoS One. Public Library of Science (PLoS). 2025, 20(9), e0332580. eISSN 1932-6203. Available at: doi: 10.1371/journal.pone.0332580

Abstract

We address the challenge of syntactic parsing for Urdu, a morphologically rich language, and present state-of-the-art results for both constituency and dependency parsing. This paper offers four major contributions: 1) the conversion of the CLE-UTB phrase structure treebank into a dependency treebank by developing language-specific head-word and phrase-to-dependency label mapping rules; 2) a novel sequence labeling scheme that transforms the parsing task into a unified representation; 3) the training of contextualized word representations on a large, 220-million-token Urdu corpus collected from the web; and 4) the development of a parsing framework using two learning paradigms, single-task and multi-task learning. Several post-processing rules are applied to improve the quality of the automatically converted dependency treebank. The proposed sequence labeling scheme enables the use of a shared architecture that learns syntactic structure from both representations (constituency and dependency) simultaneously and hence improves generalization. Experiments show that the multi-task learning setup significantly enhances parsing performance, achieving an F1 score of 91.39 for constituency parsing (an improvement of 3.29 points) and a labeled attachment score of 85.69 for dependency parsing (an improvement of 1.49 points). These results demonstrate that learning cross-task representations provides measurable benefits and advances the state of syntactic parsing for Urdu.

Subject (DDC)
400 Language, Linguistics

Cite

ISO 690
EHSAN, Toqeer, Miriam BUTT, Sarmad HUSSAIN, Hassan ALHUZALI, Ali AL-LAITH, 2025. Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language. In: PLoS One. Public Library of Science (PLoS). 2025, 20(9), e0332580. eISSN 1932-6203. Available at: doi: 10.1371/journal.pone.0332580
BibTeX
@article{Ehsan2025-09-25Multi-74821,
  title={Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language},
  year={2025},
  doi={10.1371/journal.pone.0332580},
  number={9},
  volume={20},
  journal={PLoS One},
  author={Ehsan, Toqeer and Butt, Miriam and Hussain, Sarmad and Alhuzali, Hassan and Al-Laith, Ali},
  note={Article Number: e0332580}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/74821">
    <dcterms:title>Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language</dcterms:title>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
    <dc:creator>Al-Laith, Ali</dc:creator>
    <dc:creator>Alhuzali, Hassan</dc:creator>
    <dc:language>eng</dc:language>
    <dc:contributor>Hussain, Sarmad</dc:contributor>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/74821"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-10-14T08:49:53Z</dc:date>
    <dcterms:abstract>We address the challenge of syntactic parsing for Urdu, a morphologically rich language, and present state-of-the-art results for both constituency and dependency parsing. This paper offers four major contributions: 1) the conversion of the CLE-UTB phrase structure treebank into a dependency treebank by developing language-specific head-word and phrase-to-dependency label mapping rules; 2) a novel sequence labeling scheme that transforms the parsing task into a unified representation; 3) the training of contextualized word representations on a large 220 million tokens Urdu corpus collected from the web; and 4) development of parsing framework using two learning paradigms, single-task and multi-task learning. Several post-processing rules are applied to improve the quality of the automatically converted dependency structure treebank. The proposed sequence labeling scheme enables the use of a shared architecture that learns the syntactic structures from both grammatical structures simultaneously and hence improves generalization. Experiments show that the multi-task learning setup significantly enhances parsing performance, achieving an F1 score of 91.39 for constituency parsing (an improvement of 3.29 points) and a labeled attachment score of 85.69 for dependency parsing (an improvement of 1.49 points). These results demonstrate that learning cross-task representations provides measurable benefits and advances the state of syntactic parsing for Urdu.</dcterms:abstract>
    <dcterms:issued>2025-09-25</dcterms:issued>
    <dc:creator>Butt, Miriam</dc:creator>
    <dc:contributor>Ehsan, Toqeer</dc:contributor>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Al-Laith, Ali</dc:contributor>
    <dc:rights>Attribution 4.0 International</dc:rights>
    <dc:creator>Ehsan, Toqeer</dc:creator>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:creator>Hussain, Sarmad</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-10-14T08:49:53Z</dcterms:available>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/74821/1/Ehsan_2-bziqya98f9l9.pdf"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:contributor>Alhuzali, Hassan</dc:contributor>
    <dc:contributor>Butt, Miriam</dc:contributor>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/74821/1/Ehsan_2-bziqya98f9l9.pdf"/>
  </rdf:Description>
</rdf:RDF>

University bibliography
Yes
Peer-reviewed
Yes