Publikation:

ThamizhiMorph : A morphological parser for the Tamil language

Lade...
Vorschaubild

Dateien

Sarveswaran_2-qjakvmdmgv940.pdf
Sarveswaran_2-qjakvmdmgv940.pdfGröße: 3.72 MBDownloads: 419

Datum

2021

Autor:innen

Sarveswaran, Kengatharaiyer
Dias, Gihan

Herausgeber:innen

Kontakt

ISSN der Zeitschrift

Electronic ISSN

ISBN

Bibliografische Daten

Verlag

Schriftenreihe

Auflagebezeichnung

ArXiv-ID

Internationale Patentnummer

Link zur Lizenz

Angaben zur Forschungsförderung

Projekt

Open Access-Veröffentlichung
Open Access Hybrid
Core Facility der Universität Konstanz

Gesperrt bis

Titel in einer weiteren Sprache

Publikationstyp
Zeitschriftenartikel
Publikationsstatus
Published

Erschienen in

Machine Translation. Springer. 2021, 35(1), pp. 37-70. ISSN 0922-6567. eISSN 1573-0573. Available under: doi: 10.1007/s10590-021-09261-5

Zusammenfassung

This paper presents an open source and extendable Morphological Analyser cum Generator (MAG) for Tamil named ThamizhiMorph. Tamil is a low-resource language in terms of NLP processing tools and applications. In addition, most of the available tools are neither open nor extendable. A morphological analyser is a key resource for the storage and retrieval of morphophonological and morphosyntactic information, especially for morphologically rich languages, and is also useful for developing applications within Machine Translation. This paper describes how ThamizhiMorph is designed using a Finite-State Transducer (FST) and implemented using Foma. We discuss our design decisions based on the peculiarities of Tamil and its nominal and verbal paradigms. We specify a high-level meta-language to efficiently characterise the language’s inflectional morphology. We evaluate ThamizhiMorph using text from a Tamil textbook and the Tamil Universal Dependency treebank version 2.5. The evaluation and error analysis attest a very high performance level, with the identified errors being mostly due to out-of-vocabulary items, which are easily fixable. In order to foster further development, we have made our scripts, the FST models, lexicons, Meta-Morphological rules, lists of generated verbs and nouns, and test data sets freely available for others to use and extend upon.

Zusammenfassung in einer weiteren Sprache

Fachgebiet (DDC)
400 Sprachwissenschaft, Linguistik

Schlagwörter

Morphological analyser, Morphological generator, Finite-State transducer, Tamil language, Low-resource language, Morphologically rich language

Konferenz

Rezension
undefined / . - undefined, undefined

Forschungsvorhaben

Organisationseinheiten

Zeitschriftenheft

Zugehörige Datensätze in KOPS

Zitieren

ISO 690SARVESWARAN, Kengatharaiyer, Gihan DIAS, Miriam BUTT, 2021. ThamizhiMorph : A morphological parser for the Tamil language. In: Machine Translation. Springer. 2021, 35(1), pp. 37-70. ISSN 0922-6567. eISSN 1573-0573. Available under: doi: 10.1007/s10590-021-09261-5
BibTex
@article{Sarveswaran2021-04-23Thami-53771,
  year={2021},
  doi={10.1007/s10590-021-09261-5},
  title={ThamizhiMorph : A morphological parser for the Tamil language},
  number={1},
  volume={35},
  issn={0922-6567},
  journal={Machine Translation},
  pages={37--70},
  author={Sarveswaran, Kengatharaiyer and Dias, Gihan and Butt, Miriam}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/53771">
    <dc:contributor>Butt, Miriam</dc:contributor>
    <dc:creator>Sarveswaran, Kengatharaiyer</dc:creator>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-05-27T13:24:36Z</dcterms:available>
    <dcterms:issued>2021-04-23</dcterms:issued>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/53771/1/Sarveswaran_2-qjakvmdmgv940.pdf"/>
    <dc:rights>Attribution 4.0 International</dc:rights>
    <dcterms:title>ThamizhiMorph : A morphological parser for the Tamil language</dcterms:title>
    <dc:contributor>Dias, Gihan</dc:contributor>
    <dc:creator>Butt, Miriam</dc:creator>
    <dc:creator>Dias, Gihan</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:contributor>Sarveswaran, Kengatharaiyer</dc:contributor>
    <dcterms:abstract xml:lang="eng">This paper presents an open source and extendable Morphological Analyser cum Generator (MAG) for Tamil named ThamizhiMorph. Tamil is a low-resource language in terms of NLP processing tools and applications. In addition, most of the available tools are neither open nor extendable. A morphological analyser is a key resource for the storage and retrieval of morphophonological and morphosyntactic information, especially for morphologically rich languages, and is also useful for developing applications within Machine Translation. This paper describes how ThamizhiMorph is designed using a Finite-State Transducer (FST) and implemented using Foma. We discuss our design decisions based on the peculiarities of Tamil and its nominal and verbal paradigms. We specify a high-level meta-language to efficiently characterise the language’s inflectional morphology. We evaluate ThamizhiMorph using text from a Tamil textbook and the Tamil Universal Dependency treebank version 2.5. The evaluation and error analysis attest a very high performance level, with the identified errors being mostly due to out-of-vocabulary items, which are easily fixable. In order to foster further development, we have made our scripts, the FST models, lexicons, Meta-Morphological rules, lists of generated verbs and nouns, and test data sets freely available for others to use and extend upon.</dcterms:abstract>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/53771"/>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/53771/1/Sarveswaran_2-qjakvmdmgv940.pdf"/>
    <dc:language>eng</dc:language>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-05-27T13:24:36Z</dc:date>
  </rdf:Description>
</rdf:RDF>

Interner Vermerk

xmlui.Submission.submit.DescribeStep.inputForms.label.kops_note_fromSubmitter

Kontakt
URL der Originalveröffentl.

Prüfdatum der URL

Prüfungsdatum der Dissertation

Finanzierungsart

Kommentar zur Publikation

Allianzlizenz
Corresponding Authors der Uni Konstanz vorhanden
Internationale Co-Autor:innen
Universitätsbibliographie
Ja
Begutachtet
Ja
Diese Publikation teilen