Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language

dc.contributor.author: Ehsan, Toqeer
dc.contributor.author: Butt, Miriam
dc.contributor.author: Hussain, Sarmad
dc.contributor.author: Alhuzali, Hassan
dc.contributor.author: Al-Laith, Ali
dc.date.accessioned: 2025-10-14T08:49:53Z
dc.date.available: 2025-10-14T08:49:53Z
dc.date.issued: 2025-09-25
dc.description.abstract: We address the challenge of syntactic parsing for Urdu, a morphologically rich language, and present state-of-the-art results for both constituency and dependency parsing. This paper offers four major contributions: 1) the conversion of the CLE-UTB phrase structure treebank into a dependency treebank by developing language-specific head-word and phrase-to-dependency label mapping rules; 2) a novel sequence labeling scheme that transforms the parsing task into a unified representation; 3) the training of contextualized word representations on a large Urdu corpus of 220 million tokens collected from the web; and 4) the development of a parsing framework using two learning paradigms, single-task and multi-task learning. Several post-processing rules are applied to improve the quality of the automatically converted dependency treebank. The proposed sequence labeling scheme enables the use of a shared architecture that learns from both constituency and dependency structures simultaneously and hence improves generalization. Experiments show that the multi-task learning setup significantly enhances parsing performance, achieving an F1 score of 91.39 for constituency parsing (an improvement of 3.29 points) and a labeled attachment score of 85.69 for dependency parsing (an improvement of 1.49 points). These results demonstrate that learning cross-task representations provides measurable benefits and advances the state of syntactic parsing for Urdu.
dc.description.version: published [deu]
dc.identifier.doi: 10.1371/journal.pone.0332580
dc.identifier.ppn: 1938347919
dc.identifier.uri: https://kops.uni-konstanz.de/handle/123456789/74821
dc.language.iso: eng
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.ddc: 400
dc.title: Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language [eng]
dc.type: JOURNAL_ARTICLE
dspace.entity.type: Publication
kops.citation.bibtex:
@article{Ehsan2025-09-25Multi-74821,
  title={Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language},
  year={2025},
  doi={10.1371/journal.pone.0332580},
  number={9},
  volume={20},
  journal={PLoS One},
  author={Ehsan, Toqeer and Butt, Miriam and Hussain, Sarmad and Alhuzali, Hassan and Al-Laith, Ali},
  note={Article Number: e0332580}
}
kops.citation.iso690: EHSAN, Toqeer, Miriam BUTT, Sarmad HUSSAIN, Hassan ALHUZALI, Ali AL-LAITH, 2025. Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language. In: PLoS One. Public Library of Science (PLoS). 2025, 20(9), e0332580. eISSN 1932-6203. Available under: doi: 10.1371/journal.pone.0332580 [eng]
kops.citation.rdf:
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/74821">
    <dcterms:title>Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language</dcterms:title>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
    <dc:creator>Al-Laith, Ali</dc:creator>
    <dc:creator>Alhuzali, Hassan</dc:creator>
    <dc:language>eng</dc:language>
    <dc:contributor>Hussain, Sarmad</dc:contributor>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/74821"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-10-14T08:49:53Z</dc:date>
    <dcterms:abstract>We address the challenge of syntactic parsing for Urdu, a morphologically rich language, and present state-of-the-art results for both constituency and dependency parsing. This paper offers four major contributions: 1) the conversion of the CLE-UTB phrase structure treebank into a dependency treebank by developing language-specific head-word and phrase-to-dependency label mapping rules; 2) a novel sequence labeling scheme that transforms the parsing task into a unified representation; 3) the training of contextualized word representations on a large Urdu corpus of 220 million tokens collected from the web; and 4) the development of a parsing framework using two learning paradigms, single-task and multi-task learning. Several post-processing rules are applied to improve the quality of the automatically converted dependency treebank. The proposed sequence labeling scheme enables the use of a shared architecture that learns from both constituency and dependency structures simultaneously and hence improves generalization. Experiments show that the multi-task learning setup significantly enhances parsing performance, achieving an F1 score of 91.39 for constituency parsing (an improvement of 3.29 points) and a labeled attachment score of 85.69 for dependency parsing (an improvement of 1.49 points). These results demonstrate that learning cross-task representations provides measurable benefits and advances the state of syntactic parsing for Urdu.</dcterms:abstract>
    <dcterms:issued>2025-09-25</dcterms:issued>
    <dc:creator>Butt, Miriam</dc:creator>
    <dc:contributor>Ehsan, Toqeer</dc:contributor>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Al-Laith, Ali</dc:contributor>
    <dc:rights>Attribution 4.0 International</dc:rights>
    <dc:creator>Ehsan, Toqeer</dc:creator>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:creator>Hussain, Sarmad</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-10-14T08:49:53Z</dcterms:available>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/74821/1/Ehsan_2-bziqya98f9l9.pdf"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:contributor>Alhuzali, Hassan</dc:contributor>
    <dc:contributor>Butt, Miriam</dc:contributor>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/74821/1/Ehsan_2-bziqya98f9l9.pdf"/>
  </rdf:Description>
</rdf:RDF>
kops.description.openAccess: openaccessgold
kops.flag.isPeerReviewed: true
kops.flag.knbibliography: true
kops.identifier.nbn: urn:nbn:de:bsz:352-2-bziqya98f9l9
kops.sourcefield: PLoS One. Public Library of Science (PLoS). 2025, 20(9), e0332580. eISSN 1932-6203. Available under: doi: 10.1371/journal.pone.0332580
kops.sourcefield.plain: PLoS One. Public Library of Science (PLoS). 2025, 20(9), e0332580. eISSN 1932-6203. Available under: doi: 10.1371/journal.pone.0332580 [eng]
relation.isAuthorOfPublication: 8bb66e1d-4b9c-4c7a-8ce1-b4007086d236
relation.isAuthorOfPublication.latestForDiscovery: 8bb66e1d-4b9c-4c7a-8ce1-b4007086d236
source.bibliographicInfo.articleNumber: e0332580
source.bibliographicInfo.issue: 9
source.bibliographicInfo.volume: 20
source.identifier.eissn: 1932-6203
source.periodicalTitle: PLoS One
source.publisher: Public Library of Science (PLoS)

Files

Original bundle

Name: Ehsan_2-bziqya98f9l9.pdf
Size: 6.76 MB
Format: Adobe Portable Document Format
Downloads: 86

License bundle

Name: license.txt
Size: 3.96 KB
Format: Item-specific license agreed to upon submission
Description: license.txt
Downloads: 0