Publication: Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations
Abstract
Dataset exploration is a set of techniques crucial in many research and data science projects. For textual datasets, commonly used techniques include topic modeling, document summarization, and methods related to dimension reduction. Despite their robustness, these techniques suffer from at least one of the following drawbacks: document summarization does not explicitly relate documents to one another, while the other techniques yield summaries or topics that are often difficult to interpret and perform poorly on topics that consist of context-dependent terms. We propose a method for dataset exploration that employs cross-document near-identity resolution of mentions of semantic concepts, such as persons, other named entity types, events, and actions. The method not only relates documents to one another, and thus allows for comparative dataset exploration, but also yields readily interpretable document representations. Additionally, due to the underlying approach for cross-document resolution of concept mentions, the method can relate documents through their near-identity terms, e.g., synonyms that are valid only in the given dataset rather than universally.
Cite
ISO 690
ZHUKOVA, Anastasia, Felix HAMBORG, Bela GIPP, 2020. Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations. JCDL '20. China (Virtual Event), 1. Aug. 2020 - 5. Aug. 2020. In: HUANG, Ruhua, ed. and others. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20). New York: ACM, 2020, pp. 457-458. ISBN 978-1-4503-7585-6. Available under: doi: 10.1145/3383583.3398562
BibTeX
@inproceedings{Zhukova2020Inter-51923,
  year={2020},
  doi={10.1145/3383583.3398562},
  title={Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations},
  isbn={978-1-4503-7585-6},
  publisher={ACM},
  address={New York},
  booktitle={Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20)},
  pages={457--458},
  editor={Huang, Ruhua},
  author={Zhukova, Anastasia and Hamborg, Felix and Gipp, Bela}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/51923">
    <dc:creator>Hamborg, Felix</dc:creator>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:rights>terms-of-use</dc:rights>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:contributor>Zhukova, Anastasia</dc:contributor>
    <dc:contributor>Hamborg, Felix</dc:contributor>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-11-25T14:24:20Z</dcterms:available>
    <dc:creator>Gipp, Bela</dc:creator>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Gipp, Bela</dc:contributor>
    <dc:creator>Zhukova, Anastasia</dc:creator>
    <dcterms:issued>2020</dcterms:issued>
    <dcterms:title>Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations</dcterms:title>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/51923"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:abstract xml:lang="eng">Dataset exploration is a set of techniques crucial in many research and data science projects. For textual datasets, commonly used techniques include topic modeling, document summarization, and methods related to dimension reduction. Despite their robustness, these techniques suffer from at least one of the following drawbacks: document summarization does not explicitly set documents in relation, the others yield summaries or topics that often are difficult to interpret and yield poor results for topics that consist of context-dependent terms. We propose a method for dataset exploration that employs cross-document near-identity resolution of mentions of semantic concepts, such as persons, other named entity types, events, actions. The method not only sets documents in relation and thus allows for comparative dataset exploration, but also yields well interpretable document representations. Additionally, due to the underlying approach for cross-document resolution of concept mentions, the method is able to set documents in relation as to their near-identity terms, e.g., synonyms that are not universally valid but only in the given dataset.</dcterms:abstract>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-11-25T14:24:20Z</dc:date>
    <dc:language>eng</dc:language>
  </rdf:Description>
</rdf:RDF>