Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution
Date
2020
Publication type
Contribution to a conference collection
Publication status
Published
Published in
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20) / Huang, Ruhua et al. (ed.). - New York : ACM, 2020. - pp. 461-462. - ISBN 978-1-4503-7585-6
Abstract
Topic modeling is used in a broad spectrum of applications, such as data exploration, summarization, and classification. Although it is a crucial component of many of these applications, established topic models, such as LDA, often produce statistically valid yet non-meaningful topics, i.e., topics that humans cannot easily interpret. As a result, the usability of topic modeling approaches, e.g., in document summarization, is suboptimal. We propose a topic modeling approach that uses TCA, a method for cross-document coreference resolution that also resolves near-identity relations. TCA showed promising results when resolving mentions of not only persons and other named entities but also broad, vague, or abstract concepts. In a preliminary evaluation on news articles, we compare the approach with state-of-the-art topic modeling. We find that (1) the four baselines produce statistically valid yet hollow topics, or topics that refer only to events in the dataset but not to the events' topical composition, and (2) TCA is the only approach that extracts topics that distinctively describe meaningful parts of the dataset.
Subject (DDC)
004 Computer Science
Conference
JCDL '20, Aug 1, 2020 - Aug 5, 2020, China (Virtual Event)
Cite This
ISO 690
ZHUKOVA, Anastasia, Felix HAMBORG, Bela GIPP, 2020. Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution. JCDL '20. China (Virtual Event), Aug 1, 2020 - Aug 5, 2020. In: HUANG, Ruhua, ed. and others. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20). New York: ACM, pp. 461-462. ISBN 978-1-4503-7585-6. Available under: doi: 10.1145/3383583.3398564
BibTex
@inproceedings{Zhukova2020Inter-51922,
  year={2020},
  doi={10.1145/3383583.3398564},
  title={Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution},
  isbn={978-1-4503-7585-6},
  publisher={ACM},
  address={New York},
  booktitle={Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20)},
  pages={461--462},
  editor={Huang, Ruhua},
  author={Zhukova, Anastasia and Hamborg, Felix and Gipp, Bela}
}
RDF
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/51922"> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-11-25T14:22:35Z</dc:date> <dc:rights>terms-of-use</dc:rights> <dcterms:abstract xml:lang="eng">Topic modeling is a technique used in a broad spectrum of use cases, such as data exploration, summarization, and classification. Despite being a crucial constituent of many use cases, established topic models, such as LDA, often produce statistically valid yet non-meaningful topics, i.e., that cannot easily be interpreted by humans. In turn, the usability of topic modeling approaches, e.g., in document summarization, is non-optimal. We propose a topic modeling approach that uses TCA, a method for also near-identity cross-document coreference resolution. TCA showed promising results when resolving mentions of not only persons and other named entities, but also broad, vague, or abstract concepts. In a preliminary evaluation on news articles, we compare the approach with state-of-the-art topic modeling. We find that (1) the four baselines produce statistically valid yet hollow topics or topics that only refer to events in the dataset but not the events' topical composition. 
(2) TCA is the only approach that extracts topics that distinctively describe meaningful parts of the dataset.</dcterms:abstract> <dc:creator>Zhukova, Anastasia</dc:creator> <dc:contributor>Zhukova, Anastasia</dc:contributor> <dc:contributor>Hamborg, Felix</dc:contributor> <dcterms:issued>2020</dcterms:issued> <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <foaf:homepage rdf:resource="http://localhost:8080/"/> <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/51922"/> <dcterms:title>Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution</dcterms:title> <dc:creator>Gipp, Bela</dc:creator> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-11-25T14:22:35Z</dcterms:available> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <dc:contributor>Gipp, Bela</dc:contributor> <dc:creator>Hamborg, Felix</dc:creator> <dc:language>eng</dc:language> </rdf:Description> </rdf:RDF>
Bibliography of Konstanz
No