ThamizhiMorph : A morphological parser for the Tamil language

SARVESWARAN, Kengatharaiyer, Gihan DIAS, Miriam BUTT, 2021. ThamizhiMorph : A morphological parser for the Tamil language. In: Machine Translation. Springer. 35(1), pp. 37-70. ISSN 0922-6567. eISSN 1573-0573. Available under: doi: 10.1007/s10590-021-09261-5

@article{Sarveswaran2021-04-23Thami-53771,
  title={ThamizhiMorph : A morphological parser for the Tamil language},
  year={2021},
  doi={10.1007/s10590-021-09261-5},
  number={1},
  volume={35},
  issn={0922-6567},
  journal={Machine Translation},
  pages={37--70},
  author={Sarveswaran, Kengatharaiyer and Dias, Gihan and Butt, Miriam}
}

<rdf:RDF xmlns:dcterms="" xmlns:dc="" xmlns:rdf="" xmlns:bibo="" xmlns:dspace="" xmlns:foaf="" xmlns:void="" xmlns:xsd="" >
  <rdf:Description rdf:about="">
    <dcterms:available rdf:datatype="">2021-05-27T13:24:36Z</dcterms:available>
    <dc:contributor>Dias, Gihan</dc:contributor>
    <dcterms:title>ThamizhiMorph : A morphological parser for the Tamil language</dcterms:title>
    <dc:rights>Attribution 4.0 International</dc:rights>
    <dc:language>eng</dc:language>
    <dc:contributor>Butt, Miriam</dc:contributor>
    <dcterms:isPartOf rdf:resource=""/>
    <dspace:isPartOfCollection rdf:resource=""/>
    <dspace:hasBitstream rdf:resource=""/>
    <dc:date rdf:datatype="">2021-05-27T13:24:36Z</dc:date>
    <bibo:uri rdf:resource=""/>
    <dc:contributor>Sarveswaran, Kengatharaiyer</dc:contributor>
    <dc:creator>Butt, Miriam</dc:creator>
    <dcterms:rights rdf:resource=""/>
    <dcterms:abstract xml:lang="eng">This paper presents an open source and extendable Morphological Analyser cum Generator (MAG) for Tamil named ThamizhiMorph. Tamil is a low-resource language in terms of NLP processing tools and applications. In addition, most of the available tools are neither open nor extendable. A morphological analyser is a key resource for the storage and retrieval of morphophonological and morphosyntactic information, especially for morphologically rich languages, and is also useful for developing applications within Machine Translation. This paper describes how ThamizhiMorph is designed using a Finite-State Transducer (FST) and implemented using Foma. We discuss our design decisions based on the peculiarities of Tamil and its nominal and verbal paradigms. We specify a high-level meta-language to efficiently characterise the language’s inflectional morphology. We evaluate ThamizhiMorph using text from a Tamil textbook and the Tamil Universal Dependency treebank version 2.5. The evaluation and error analysis attest a very high performance level, with the identified errors being mostly due to out-of-vocabulary items, which are easily fixable. In order to foster further development, we have made our scripts, the FST models, lexicons, Meta-Morphological rules, lists of generated verbs and nouns, and test data sets freely available for others to use and extend upon.</dcterms:abstract>
    <foaf:homepage rdf:resource="http://localhost:8080/jspui"/>
    <dc:creator>Dias, Gihan</dc:creator>
    <dc:creator>Sarveswaran, Kengatharaiyer</dc:creator>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:issued>2021-04-23</dcterms:issued>
    <dcterms:hasPart rdf:resource=""/>
  </rdf:Description>
</rdf:RDF>

Except where otherwise noted, this item's license is described as Attribution 4.0 International.
