Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language
| dc.contributor.author | Ehsan, Toqeer | |
| dc.contributor.author | Butt, Miriam | |
| dc.contributor.author | Hussain, Sarmad | |
| dc.contributor.author | Alhuzali, Hassan | |
| dc.contributor.author | Al-Laith, Ali | |
| dc.date.accessioned | 2025-10-14T08:49:53Z | |
| dc.date.available | 2025-10-14T08:49:53Z | |
| dc.date.issued | 2025-09-25 | |
| dc.description.abstract | We address the challenge of syntactic parsing for Urdu, a morphologically rich language, and present state-of-the-art results for both constituency and dependency parsing. This paper offers four major contributions: 1) the conversion of the CLE-UTB phrase structure treebank into a dependency treebank by developing language-specific head-word and phrase-to-dependency label mapping rules; 2) a novel sequence labeling scheme that transforms the parsing task into a unified representation; 3) the training of contextualized word representations on a large Urdu corpus of 220 million tokens collected from the web; and 4) the development of a parsing framework using two learning paradigms, single-task and multi-task learning. Several post-processing rules are applied to improve the quality of the automatically converted dependency structure treebank. The proposed sequence labeling scheme enables the use of a shared architecture that learns constituency and dependency structures simultaneously and hence improves generalization. Experiments show that the multi-task learning setup significantly enhances parsing performance, achieving an F1 score of 91.39 for constituency parsing (an improvement of 3.29 points) and a labeled attachment score of 85.69 for dependency parsing (an improvement of 1.49 points). These results demonstrate that learning cross-task representations provides measurable benefits and advances the state of syntactic parsing for Urdu. | |
| dc.description.version | published | deu |
| dc.identifier.doi | 10.1371/journal.pone.0332580 | |
| dc.identifier.ppn | 1938347919 | |
| dc.identifier.uri | https://kops.uni-konstanz.de/handle/123456789/74821 | |
| dc.language.iso | eng | |
| dc.rights | Attribution 4.0 International | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject.ddc | 400 | |
| dc.title | Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language | eng |
| dc.type | JOURNAL_ARTICLE | |
| dspace.entity.type | Publication | |
| kops.citation.bibtex | @article{Ehsan2025-09-25Multi-74821,
title={Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language},
year={2025},
doi={10.1371/journal.pone.0332580},
number={9},
volume={20},
journal={PLoS One},
author={Ehsan, Toqeer and Butt, Miriam and Hussain, Sarmad and Alhuzali, Hassan and Al-Laith, Ali},
note={Article Number: e0332580}
} | |
| kops.citation.iso690 | EHSAN, Toqeer, Miriam BUTT, Sarmad HUSSAIN, Hassan ALHUZALI, Ali AL-LAITH, 2025. Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language. In: PLoS One. Public Library of Science (PLoS). 2025, 20(9), e0332580. eISSN 1932-6203. Verfügbar unter: doi: 10.1371/journal.pone.0332580 | deu |
| kops.citation.iso690 | EHSAN, Toqeer, Miriam BUTT, Sarmad HUSSAIN, Hassan ALHUZALI, Ali AL-LAITH, 2025. Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language. In: PLoS One. Public Library of Science (PLoS). 2025, 20(9), e0332580. eISSN 1932-6203. Available under: doi: 10.1371/journal.pone.0332580 | eng |
| kops.citation.rdf | <rdf:RDF
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:void="http://rdfs.org/ns/void#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
<rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/74821">
<dcterms:title>Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language</dcterms:title>
<dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
<dc:creator>Al-Laith, Ali</dc:creator>
<dc:creator>Alhuzali, Hassan</dc:creator>
<dc:language>eng</dc:language>
<dc:contributor>Hussain, Sarmad</dc:contributor>
<bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/74821"/>
<dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-10-14T08:49:53Z</dc:date>
<dcterms:abstract>We address the challenge of syntactic parsing for Urdu, a morphologically rich language, and present state-of-the-art results for both constituency and dependency parsing. This paper offers four major contributions: 1) the conversion of the CLE-UTB phrase structure treebank into a dependency treebank by developing language-specific head-word and phrase-to-dependency label mapping rules; 2) a novel sequence labeling scheme that transforms the parsing task into a unified representation; 3) the training of contextualized word representations on a large Urdu corpus of 220 million tokens collected from the web; and 4) the development of a parsing framework using two learning paradigms, single-task and multi-task learning. Several post-processing rules are applied to improve the quality of the automatically converted dependency structure treebank. The proposed sequence labeling scheme enables the use of a shared architecture that learns constituency and dependency structures simultaneously and hence improves generalization. Experiments show that the multi-task learning setup significantly enhances parsing performance, achieving an F1 score of 91.39 for constituency parsing (an improvement of 3.29 points) and a labeled attachment score of 85.69 for dependency parsing (an improvement of 1.49 points). These results demonstrate that learning cross-task representations provides measurable benefits and advances the state of syntactic parsing for Urdu.</dcterms:abstract>
<dcterms:issued>2025-09-25</dcterms:issued>
<dc:creator>Butt, Miriam</dc:creator>
<dc:contributor>Ehsan, Toqeer</dc:contributor>
<dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
<void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
<dc:contributor>Al-Laith, Ali</dc:contributor>
<dc:rights>Attribution 4.0 International</dc:rights>
<dc:creator>Ehsan, Toqeer</dc:creator>
<dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
<dc:creator>Hussain, Sarmad</dc:creator>
<dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-10-14T08:49:53Z</dcterms:available>
<dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/74821/1/Ehsan_2-bziqya98f9l9.pdf"/>
<foaf:homepage rdf:resource="http://localhost:8080/"/>
<dc:contributor>Alhuzali, Hassan</dc:contributor>
<dc:contributor>Butt, Miriam</dc:contributor>
<dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/74821/1/Ehsan_2-bziqya98f9l9.pdf"/>
</rdf:Description>
</rdf:RDF> | |
| kops.description.openAccess | openaccessgold | |
| kops.flag.isPeerReviewed | true | |
| kops.flag.knbibliography | true | |
| kops.identifier.nbn | urn:nbn:de:bsz:352-2-bziqya98f9l9 | |
| kops.sourcefield | PLoS One. Public Library of Science (PLoS). 2025, <b>20</b>(9), e0332580. eISSN 1932-6203. Verfügbar unter: doi: 10.1371/journal.pone.0332580 | deu |
| kops.sourcefield.plain | PLoS One. Public Library of Science (PLoS). 2025, 20(9), e0332580. eISSN 1932-6203. Verfügbar unter: doi: 10.1371/journal.pone.0332580 | deu |
| kops.sourcefield.plain | PLoS One. Public Library of Science (PLoS). 2025, 20(9), e0332580. eISSN 1932-6203. Available under: doi: 10.1371/journal.pone.0332580 | eng |
| relation.isAuthorOfPublication | 8bb66e1d-4b9c-4c7a-8ce1-b4007086d236 | |
| relation.isAuthorOfPublication.latestForDiscovery | 8bb66e1d-4b9c-4c7a-8ce1-b4007086d236 | |
| source.bibliographicInfo.articleNumber | e0332580 | |
| source.bibliographicInfo.issue | 9 | |
| source.bibliographicInfo.volume | 20 | |
| source.identifier.eissn | 1932-6203 | |
| source.periodicalTitle | PLoS One | |
| source.publisher | Public Library of Science (PLoS) | |
Files
Original bundle
- Name: Ehsan_2-bziqya98f9l9.pdf
- Size: 6.76 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 3.96 KB
- Format: Item-specific license agreed to upon submission

