Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations

dc.contributor.author: Zhukova, Anastasia
dc.contributor.author: Hamborg, Felix
dc.contributor.author: Gipp, Bela
dc.date.accessioned: 2020-11-25T14:24:20Z
dc.date.available: 2020-11-25T14:24:20Z
dc.date.issued: 2020 [eng]
dc.description.abstract: Dataset exploration is a set of techniques crucial in many research and data science projects. For textual datasets, commonly used techniques include topic modeling, document summarization, and methods related to dimension reduction. Despite their robustness, these techniques suffer from at least one of the following drawbacks: document summarization does not explicitly set documents in relation, the others yield summaries or topics that often are difficult to interpret and yield poor results for topics that consist of context-dependent terms. We propose a method for dataset exploration that employs cross-document near-identity resolution of mentions of semantic concepts, such as persons, other named entity types, events, actions. The method not only sets documents in relation and thus allows for comparative dataset exploration, but also yields well interpretable document representations. Additionally, due to the underlying approach for cross-document resolution of concept mentions, the method is able to set documents in relation as to their near-identity terms, e.g., synonyms that are not universally valid but only in the given dataset. [eng]
dc.description.version: published [eng]
dc.identifier.doi: 10.1145/3383583.3398562 [eng]
dc.identifier.uri: https://kops.uni-konstanz.de/handle/123456789/51923
dc.language.iso: eng [eng]
dc.rights: terms-of-use
dc.rights.uri: https://rightsstatements.org/page/InC/1.0/
dc.subject.ddc: 004 [eng]
dc.title: Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations [eng]
dc.type: INPROCEEDINGS [eng]
dspace.entity.type: Publication
kops.citation.bibtex
@inproceedings{Zhukova2020Inter-51923,
  year={2020},
  doi={10.1145/3383583.3398562},
  title={Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations},
  isbn={978-1-4503-7585-6},
  publisher={ACM},
  address={New York},
  booktitle={Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20)},
  pages={457--458},
  editor={Huang, Ruhua},
  author={Zhukova, Anastasia and Hamborg, Felix and Gipp, Bela}
}
kops.citation.iso690: ZHUKOVA, Anastasia, Felix HAMBORG, Bela GIPP, 2020. Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations. JCDL '20. China (Virtual Event), 1. Aug. 2020 - 5. Aug. 2020. In: HUANG, Ruhua, ed. and others. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20). New York: ACM, 2020, pp. 457-458. ISBN 978-1-4503-7585-6. Available under: doi: 10.1145/3383583.3398562 [deu]
kops.citation.iso690: ZHUKOVA, Anastasia, Felix HAMBORG, Bela GIPP, 2020. Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations. JCDL '20. China (Virtual Event), Aug 1, 2020 - Aug 5, 2020. In: HUANG, Ruhua, ed. and others. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20). New York: ACM, 2020, pp. 457-458. ISBN 978-1-4503-7585-6. Available under: doi: 10.1145/3383583.3398562 [eng]
kops.citation.rdf
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/51923">
    <dc:creator>Hamborg, Felix</dc:creator>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:rights>terms-of-use</dc:rights>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:contributor>Zhukova, Anastasia</dc:contributor>
    <dc:contributor>Hamborg, Felix</dc:contributor>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-11-25T14:24:20Z</dcterms:available>
    <dc:creator>Gipp, Bela</dc:creator>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Gipp, Bela</dc:contributor>
    <dc:creator>Zhukova, Anastasia</dc:creator>
    <dcterms:issued>2020</dcterms:issued>
    <dcterms:title>Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations</dcterms:title>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/51923"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:abstract xml:lang="eng">Dataset exploration is a set of techniques crucial in many research and data science projects. For textual datasets, commonly used techniques include topic modeling, document summarization, and methods related to dimension reduction. Despite their robustness, these techniques suffer from at least one of the following drawbacks: document summarization does not explicitly set documents in relation, the others yield summaries or topics that often are difficult to interpret and yield poor results for topics that consist of context-dependent terms. We propose a method for dataset exploration that employs cross-document near-identity resolution of mentions of semantic concepts, such as persons, other named entity types, events, actions. The method not only sets documents in relation and thus allows for comparative dataset exploration, but also yields well interpretable document representations. Additionally, due to the underlying approach for cross-document resolution of concept mentions, the method is able to set documents in relation as to their near-identity terms, e.g., synonyms that are not universally valid but only in the given dataset.</dcterms:abstract>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-11-25T14:24:20Z</dc:date>
    <dc:language>eng</dc:language>
  </rdf:Description>
</rdf:RDF>
kops.conferencefield: JCDL '20, 1. Aug. 2020 - 5. Aug. 2020, China (Virtual Event) [deu]
kops.date.conferenceEnd: 2020-08-05 [eng]
kops.date.conferenceStart: 2020-08-01 [eng]
kops.flag.knbibliography: false
kops.location.conference: China (Virtual Event) [eng]
kops.sourcefield: HUANG, Ruhua, ed. and others. <i>Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20)</i>. New York: ACM, 2020, pp. 457-458. ISBN 978-1-4503-7585-6. Available under: doi: 10.1145/3383583.3398562 [deu]
kops.sourcefield.plain: HUANG, Ruhua, ed. and others. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20). New York: ACM, 2020, pp. 457-458. ISBN 978-1-4503-7585-6. Available under: doi: 10.1145/3383583.3398562 [deu]
kops.sourcefield.plain: HUANG, Ruhua, ed. and others. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20). New York: ACM, 2020, pp. 457-458. ISBN 978-1-4503-7585-6. Available under: doi: 10.1145/3383583.3398562 [eng]
kops.title.conference: JCDL '20 [eng]
relation.isAuthorOfPublication: 3ce0130b-52fe-4aea-ab76-5c8e70ca0e39
relation.isAuthorOfPublication: b3446266-8107-4eb9-9221-75e81cca13df
relation.isAuthorOfPublication: 358ad52f-dab7-4582-bf8e-8adcf477a2d4
relation.isAuthorOfPublication.latestForDiscovery: 3ce0130b-52fe-4aea-ab76-5c8e70ca0e39
source.bibliographicInfo.fromPage: 457 [eng]
source.bibliographicInfo.toPage: 458 [eng]
source.contributor.editor: Huang, Ruhua
source.flag.etalEditor: true [eng]
source.identifier.isbn: 978-1-4503-7585-6 [eng]
source.publisher: ACM [eng]
source.publisher.location: New York [eng]
source.title: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20) [eng]
