Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution
Abstract
Topic modeling is a technique used in a broad spectrum of use cases, such as data exploration, summarization, and classification. Despite being a crucial constituent of many use cases, established topic models, such as LDA, often produce topics that are statistically valid yet not meaningful, i.e., topics that humans cannot easily interpret. In turn, the usability of topic modeling approaches, e.g., in document summarization, is suboptimal. We propose a topic modeling approach that uses TCA, a method for cross-document coreference resolution that also resolves near-identity relations. TCA showed promising results when resolving mentions not only of persons and other named entities but also of broad, vague, or abstract concepts. In a preliminary evaluation on news articles, we compare the approach with state-of-the-art topic modeling. We find that (1) the four baselines produce either statistically valid yet hollow topics or topics that only refer to events in the dataset but not to the events' topical composition, and (2) TCA is the only approach that extracts topics that distinctively describe meaningful parts of the dataset.
Cite
ISO 690
ZHUKOVA, Anastasia, Felix HAMBORG, Bela GIPP, 2020. Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution. JCDL '20. China (Virtual Event), 1 Aug 2020 - 5 Aug 2020. In: HUANG, Ruhua, ed. and others. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20). New York: ACM, 2020, pp. 461-462. ISBN 978-1-4503-7585-6. Available under: doi: 10.1145/3383583.3398564
BibTeX
@inproceedings{Zhukova2020Inter-51922,
  year={2020},
  doi={10.1145/3383583.3398564},
  title={Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution},
  isbn={978-1-4503-7585-6},
  publisher={ACM},
  address={New York},
  booktitle={Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20)},
  pages={461--462},
  editor={Huang, Ruhua},
  author={Zhukova, Anastasia and Hamborg, Felix and Gipp, Bela}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/51922">
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-11-25T14:22:35Z</dc:date>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:abstract xml:lang="eng">Topic modeling is a technique used in a broad spectrum of use cases, such as data exploration, summarization, and classification. Despite being a crucial constituent of many use cases, established topic models, such as LDA, often produce statistically valid yet non-meaningful topics, i.e., that cannot easily be interpreted by humans. In turn, the usability of topic modeling approaches, e.g., in document summarization, is non-optimal. We propose a topic modeling approach that uses TCA, a method for also near-identity cross-document coreference resolution. TCA showed promising results when resolving mentions of not only persons and other named entities, but also broad, vague, or abstract concepts. In a preliminary evaluation on news articles, we compare the approach with state-of-the-art topic modeling. We find that (1) the four baselines produce statistically valid yet hollow topics or topics that only refer to events in the dataset but not the events' topical composition. (2) TCA is the only approach that extracts topics that distinctively describe meaningful parts of the dataset.</dcterms:abstract>
    <dc:creator>Zhukova, Anastasia</dc:creator>
    <dc:contributor>Zhukova, Anastasia</dc:contributor>
    <dc:contributor>Hamborg, Felix</dc:contributor>
    <dcterms:issued>2020</dcterms:issued>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/51922"/>
    <dcterms:title>Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution</dcterms:title>
    <dc:creator>Gipp, Bela</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-11-25T14:22:35Z</dcterms:available>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Gipp, Bela</dc:contributor>
    <dc:creator>Hamborg, Felix</dc:creator>
    <dc:language>eng</dc:language>
  </rdf:Description>
</rdf:RDF>