Publication:

Analyzing Document Collections via Context-Aware Term Extraction

Files

12460.13pdf.pdf (size: 5.84 MB, downloads: 408)

Date

2010

Open access publication
Open Access Green

Publication type
Contribution to conference proceedings
Publication status
Published

Published in

HORACEK, Helmut, ed., Elisabeth MÉTAIS, ed., Rafael MUÑOZ, ed., Magdalena WOLSKA, ed. Natural Language Processing and Information Systems. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 154-168. Lecture Notes in Computer Science. 5723. ISBN 978-3-642-12549-2. Available under: doi: 10.1007/978-3-642-12550-8_13

Abstract

In large collections of documents that are divided into predefined classes, the differences and similarities between those classes are of special interest. This paper presents an approach that automatically extracts terms from such document collections, describing which topics discriminate a single class from the others (discriminating terms) and which topics discriminate a subset of the classes from the remaining ones (overlap terms). The importance for real-world applications and the effectiveness of our approach are demonstrated by two practical examples. In the first application, our predefined classes correspond to different scientific conferences. By extracting terms from collections of papers published at these conferences, we automatically determine the topical differences and similarities of the conferences. In our second application, we extract terms from a collection of product reviews that show which features reviewers commented on. We obtain these terms by discriminating the product review class against a suitable counter-balance class. Finally, our method is evaluated by comparing it to alternative approaches.
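
This record does not describe how the paper scores terms, so the following is only a toy illustration of the two term types named in the abstract, not the authors' method. It assumes a simple relative-frequency heuristic: a term is assigned to the group of classes where its share is clearly higher than in the remaining classes. The function names, the `min_ratio` and `eps` parameters, and the sample data are all hypothetical.

```python
from collections import Counter


def term_shares(docs):
    """Relative frequency of each term within one class (a list of token lists)."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values()) or 1  # avoid division by zero for empty classes
    return {term: n / total for term, n in counts.items()}


def characteristic_terms(classes, min_ratio=3.0, eps=1e-6):
    """Split terms into discriminating terms (one class) and overlap terms (several).

    Assumed heuristic: rank the classes by the term's share, find the split with
    the largest drop between neighbouring classes, and accept it only if the
    share just above the split is at least `min_ratio` times the share below it.
    """
    shares = {name: term_shares(docs) for name, docs in classes.items()}
    vocab = set().union(*shares.values())
    discriminating, overlap = {}, {}
    for term in vocab:
        ranked = sorted(classes, key=lambda n: shares[n].get(term, 0.0), reverse=True)
        values = [shares[n].get(term, 0.0) for n in ranked]
        best_k, best_ratio = 0, 0.0
        for k in range(1, len(ranked)):
            ratio = values[k - 1] / (values[k] + eps)
            if ratio > best_ratio:
                best_k, best_ratio = k, ratio
        if best_ratio < min_ratio:
            continue  # term is spread too evenly to characterise any group
        if best_k == 1:
            discriminating[term] = ranked[0]
        else:
            overlap[term] = frozenset(ranked[:best_k])
    return discriminating, overlap


# Hypothetical toy data: three classes, each a list of tokenised documents.
classes = {
    "A": [["graph", "layout", "data"], ["graph", "query", "data"]],
    "B": [["index", "query", "data"], ["index", "query", "data"]],
    "C": [["parser", "grammar", "data"], ["parser", "grammar", "data"]],
}
discriminating, overlap = characteristic_terms(classes)
```

In this toy data, "graph" occurs only in class A and is reported as a discriminating term, "query" is shared by A and B but absent from C and is reported as an overlap term, and "data" is equally frequent everywhere and is dropped.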

Subject (DDC)
004 Computer science

Cite

ISO 690
KEIM, Daniel A., Daniela OELKE, Christian ROHRDANTZ, 2010. Analyzing Document Collections via Context-Aware Term Extraction. In: HORACEK, Helmut, ed., Elisabeth MÉTAIS, ed., Rafael MUÑOZ, ed., Magdalena WOLSKA, ed. Natural Language Processing and Information Systems. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 154-168. Lecture Notes in Computer Science. 5723. ISBN 978-3-642-12549-2. Available under: doi: 10.1007/978-3-642-12550-8_13
BibTeX
@inproceedings{Keim2010Analy-6445,
  year={2010},
  doi={10.1007/978-3-642-12550-8_13},
  title={Analyzing Document Collections via Context-Aware Term Extraction},
  number={5723},
  isbn={978-3-642-12549-2},
  publisher={Springer Berlin Heidelberg},
  address={Berlin, Heidelberg},
  series={Lecture Notes in Computer Science},
  booktitle={Natural Language Processing and Information Systems},
  pages={154--168},
  editor={Horacek, Helmut and Métais, Elisabeth and Muñoz, Rafael and Wolska, Magdalena},
  author={Keim, Daniel A. and Oelke, Daniela and Rohrdantz, Christian}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/6445">
    <dc:creator>Rohrdantz, Christian</dc:creator>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:abstract xml:lang="eng">In large collections of documents that are divided into predefined classes, the differences and similarities of those classes are of special interest. This paper presents an approach that is able to automatically extract terms from such document collections which describe what topics discriminate a single class from the others (discriminating terms) and which topics discriminate a subset of the classes against the remaining ones (overlap terms). The importance for real world applications and the effectiveness of our approach are demonstrated by two out of practice examples. In a first application our predefined classes correspond to different scientific conferences. By extracting terms from collections of papers published on these conferences, we determine automatically the topical differences and similarities of the conferences. In our second application task we extract terms out of a collection of product reviews which show what features reviewers commented on. We get these terms by discriminating the product review class against a suitable counter-balance class. Finally, our method is evaluated comparing it to alternative approaches.</dcterms:abstract>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/6445/1/12460.13pdf.pdf"/>
    <dc:contributor>Oelke, Daniela</dc:contributor>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:bibliographicCitation>Also publ. in: Natural Language Processing and Information Systems : 14th International Conference on Applications of Natural Language to Information Systems, NLDB 2009, Saarbrücken, Germany, June 24-26. / ed. by Helmut Horacek ...(Eds.). - Berlin : Springer, 2010. - pp. 154-168. - ISBN 978-3-642-12550-8</dcterms:bibliographicCitation>
    <dc:creator>Keim, Daniel A.</dc:creator>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/6445"/>
    <dc:contributor>Rohrdantz, Christian</dc:contributor>
    <dc:format>application/pdf</dc:format>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-05-31T22:25:04Z</dcterms:available>
    <dc:language>eng</dc:language>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/6445/1/12460.13pdf.pdf"/>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:12:45Z</dc:date>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Oelke, Daniela</dc:creator>
    <dc:contributor>Keim, Daniel A.</dc:contributor>
    <dcterms:title>Analyzing Document Collections via Context-Aware Term Extraction</dcterms:title>
    <dcterms:issued>2010</dcterms:issued>
  </rdf:Description>
</rdf:RDF>

University bibliography
Yes