ThamizhiFST : A Morphological Analyser and Generator for Tamil Verbs

dc.contributor.authorSarveswaran, Kengatharaiyer
dc.contributor.authorDias, Gihan
dc.contributor.authorButt, Miriam
dc.date.accessioned2020-08-19T11:16:46Z
dc.date.available2020-08-19T11:16:46Z
dc.date.issued2018-12eng
dc.description.abstractThamizhiFST is a Morphological Analyser and Generator (MAG) for Tamil. It was developed to extend the coverage of the computational Tamil grammar being developed using Lexical Functional Grammar (LFG). ThamizhiFST covers the simple verbs in Tamil as an initial step. A Finite State Transducer (FST) approach was used to develop the MAG and it was implemented using the FOMA Open Source Software. Since morphological rules are of a finite nature and represent a known quantity, a rule-based approach like FST is more appropriate than possible machine learning alternatives, especially with respect to achieving reliably good accuracy that is required for computational grammar development. A set of 3250 Tamil verb lemmas from 13 paradigms together with their 260 conjugation forms were used in the construction of ThamizhiFST. Further, a set of 27 labels were used to mark the morphosyntactic information of the verbs. The whole system was developed as a three-layer web-based system to tackle the issues arising when processing an agglutinative language like Tamil and to ensure its extendability. Unlike other existing MAGs, ThamizhiFST also provides the morpheme corresponding to each morphosyntactic label and marks morpheme boundaries. An evaluation shows that ThamizhiFST has an f-measure of 0.97 for simple verbs. Future and current work include work on extending the system to cover more verbs and nouns and make it generally available.eng
dc.description.versionpublishedde
dc.identifier.doi10.1109/ICITR.2018.8736139eng
dc.identifier.ppn1736472216
dc.identifier.urihttps://kops.uni-konstanz.de/handle/123456789/50531
dc.language.isoengeng
dc.rightsterms-of-use
dc.rights.urihttps://rightsstatements.org/page/InC/1.0/
dc.subject.ddc400eng
dc.titleThamizhiFST : A Morphological Analyser and Generator for Tamil Verbseng
dc.typeINPROCEEDINGSde
dspace.entity.typePublication
kops.citation.bibtex
@inproceedings{Sarveswaran2018-12Thami-50531,
  year={2018},
  doi={10.1109/ICITR.2018.8736139},
  title={ThamizhiFST : A Morphological Analyser and Generator for Tamil Verbs},
  isbn={978-1-72811-470-5},
  publisher={IEEE},
  address={Piscataway, NJ},
  booktitle={2018 3rd International Conference on Information Technology Research (ICITR)},
  author={Sarveswaran, Kengatharaiyer and Dias, Gihan and Butt, Miriam}
}
kops.citation.iso690SARVESWARAN, Kengatharaiyer, Gihan DIAS, Miriam BUTT, 2018. ThamizhiFST : A Morphological Analyser and Generator for Tamil Verbs. 2018 3rd International Conference on Information Technology Research (ICITR). Moratuwa, Sri Lanka, Dec 5, 2018 - Dec 7, 2018. In: 2018 3rd International Conference on Information Technology Research (ICITR). Piscataway, NJ: IEEE, 2018. ISBN 978-1-72811-470-5. Available under: doi: 10.1109/ICITR.2018.8736139eng
kops.citation.rdf
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/50531">
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-08-19T11:16:46Z</dcterms:available>
    <dc:creator>Sarveswaran, Kengatharaiyer</dc:creator>
    <dc:creator>Dias, Gihan</dc:creator>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dcterms:issued>2018-12</dcterms:issued>
    <dcterms:abstract xml:lang="eng">ThamizhiFST is a Morphological Analyser and Generator (MAG) for Tamil. It was developed to extend the coverage of the computational Tamil grammar being developed using Lexical Functional Grammar (LFG). ThamizhiFST covers the simple verbs in Tamil as an initial step. A Finite State Transducer (FST) approach was used to develop the MAG and it was implemented using the FOMA Open Source Software. Since morphological rules are of a finite nature and represent a known quantity, a rule-based approach like FST is more appropriate than possible machine learning alternatives, especially with respect to achieving reliably good accuracy that is required for computational grammar development. A set of 3250 Tamil verb lemmas from 13 paradigms together with their 260 conjugation forms were used in the construction of ThamizhiFST. Further, a set of 27 labels were used to mark the morphosyntactic information of the verbs. The whole system was developed as a three-layer web-based system to tackle the issues arising when processing an agglutinative language like Tamil and to ensure its extendability. Unlike other existing MAGs, ThamizhiFST also provides the morpheme corresponding to each morphosyntactic label and marks morpheme boundaries. An evaluation shows that ThamizhiFST has an f-measure of 0.97 for simple verbs. Future and current work include work on extending the system to cover more verbs and nouns and make it generally available.</dcterms:abstract>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:contributor>Sarveswaran, Kengatharaiyer</dc:contributor>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-08-19T11:16:46Z</dc:date>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/50531/1/Sarveswaran_2-3mko66sqho4f6.pdf"/>
    <dcterms:title>ThamizhiFST : A Morphological Analyser and Generator for Tamil Verbs</dcterms:title>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/50531/1/Sarveswaran_2-3mko66sqho4f6.pdf"/>
    <dc:contributor>Butt, Miriam</dc:contributor>
    <dc:rights>terms-of-use</dc:rights>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/50531"/>
    <dc:contributor>Dias, Gihan</dc:contributor>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Butt, Miriam</dc:creator>
    <dc:language>eng</dc:language>
  </rdf:Description>
</rdf:RDF>
kops.conferencefield2018 3rd International Conference on Information Technology Research (ICITR), Dec 5, 2018 - Dec 7, 2018, Moratuwa, Sri Lankadeu
kops.date.conferenceEnd2018-12-07eng
kops.date.conferenceStart2018-12-05eng
kops.description.openAccessopenaccessgreen
kops.flag.knbibliographytrue
kops.identifier.nbnurn:nbn:de:bsz:352-2-3mko66sqho4f6
kops.location.conferenceMoratuwa, Sri Lankaeng
kops.sourcefield<i>2018 3rd International Conference on Information Technology Research (ICITR)</i>. Piscataway, NJ: IEEE, 2018. ISBN 978-1-72811-470-5. Available under: doi: 10.1109/ICITR.2018.8736139deu
kops.sourcefield.plain2018 3rd International Conference on Information Technology Research (ICITR). Piscataway, NJ: IEEE, 2018. ISBN 978-1-72811-470-5. Available under: doi: 10.1109/ICITR.2018.8736139eng
kops.title.conference2018 3rd International Conference on Information Technology Research (ICITR)eng
relation.isAuthorOfPublication8bb66e1d-4b9c-4c7a-8ce1-b4007086d236
relation.isAuthorOfPublication.latestForDiscovery8bb66e1d-4b9c-4c7a-8ce1-b4007086d236
source.identifier.isbn978-1-72811-470-5eng
source.publisherIEEEeng
source.publisher.locationPiscataway, NJ
source.title2018 3rd International Conference on Information Technology Research (ICITR)eng

Files

Original bundle

Name:
Sarveswaran_2-3mko66sqho4f6.pdf
Size:
149.73 KB
Format:
Adobe Portable Document Format
Downloads:
894