Publication: Representation Problems in Linguistic Annotations : Ambiguity, Variation, Uncertainty, Error and Bias
Abstract
The development of linguistic corpora is fraught with various problems of annotation and representation. These constitute a very real challenge for the development and use of annotated corpora, but as yet not much literature exists on how to address the underlying problems. In this paper, we identify and discuss five sources of representation problems, which are independent though interrelated: ambiguity, variation, uncertainty, error and bias. We outline and characterize these sources, discussing how their improper treatment can have stark consequences for research outcomes. Finally, we discuss how an adequate treatment can inform corpus-related linguistic research, both computational and theoretical, improving the reliability of research results and NLP models, as well as informing the more general reproducibility issue.
Cite
ISO 690
SCHÄTZLE, Christin, Hannah BOOTH, Mennatallah EL-ASSADY, Miriam BUTT, 2020. Representation Problems in Linguistic Annotations : Ambiguity, Variation, Uncertainty, Error and Bias. 14th Linguistic Annotation Workshop. Barcelona, 12 Dec. 2020. In: DIPPER, Stefanie, ed., Amir ZELDERS, ed. Proceedings of the 14th Linguistic Annotation Workshop. Stroudsburg, PA: ACL, 2020, pp. 60-73. ISBN 978-1-952148-33-0
BibTeX
@inproceedings{Schatzle2020Repre-52954,
  year={2020},
  title={Representation Problems in Linguistic Annotations : Ambiguity, Variation, Uncertainty, Error and Bias},
  url={https://www.aclweb.org/anthology/2020.law-1.6/},
  isbn={978-1-952148-33-0},
  publisher={ACL},
  address={Stroudsburg, PA},
  booktitle={Proceedings of the 14th Linguistic Annotation Workshop},
  pages={60--73},
  editor={Dipper, Stefanie and Zelders, Amir},
  author={Schätzle, Christin and Booth, Hannah and El-Assady, Mennatallah and Butt, Miriam}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/52954">
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:creator>Schätzle, Christin</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-02-22T14:19:30Z</dcterms:available>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/52954"/>
    <dc:contributor>Booth, Hannah</dc:contributor>
    <dcterms:issued>2020</dcterms:issued>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:title>Representation Problems in Linguistic Annotations : Ambiguity, Variation, Uncertainty, Error and Bias</dcterms:title>
    <dc:creator>Butt, Miriam</dc:creator>
    <dc:contributor>Schätzle, Christin</dc:contributor>
    <dc:creator>Booth, Hannah</dc:creator>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>El-Assady, Mennatallah</dc:creator>
    <dcterms:abstract xml:lang="eng">The development of linguistic corpora is fraught with various problems of annotation and representation. These constitute a very real challenge for the development and use of annotated corpora, but as yet not much literature exists on how to address the underlying problems. In this paper, we identify and discuss five sources of representation problems, which are independent though interrelated: ambiguity, variation, uncertainty, error and bias. We outline and characterize these sources, discussing how their improper treatment can have stark consequences for research outcomes. Finally, we discuss how an adequate treatment can inform corpus-related linguistic research, both computational and theoretical, improving the reliability of research results and NLP models, as well as informing the more general reproducibility issue.</dcterms:abstract>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/52954/1/Beck_2-y9m6blefbkx03.pdf"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:contributor>Butt, Miriam</dc:contributor>
    <dc:language>eng</dc:language>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/52954/1/Beck_2-y9m6blefbkx03.pdf"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:rights>Attribution 4.0 International</dc:rights>
    <dc:contributor>El-Assady, Mennatallah</dc:contributor>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-02-22T14:19:30Z</dc:date>
  </rdf:Description>
</rdf:RDF>