Publication:

Geodesic distances for web document clustering


Files

Tekir_190796.pdf (size: 147.79 KB)

Date

2011

Authors

Tekir, Selma; Mansmann, Florian; Keim, Daniel A.

Open Access publication
Open Access Green

Publication type
Contribution to conference proceedings
Publication status
Published

Published in

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). IEEE, 2011, pp. 15-21. ISBN 978-1-4244-9926-7. Available under: doi: 10.1109/CIDM.2011.5949449

Abstract

While traditional distance measures often describe the similarity between objects adequately, in some application areas there is still potential to fine-tune them with additional information contained in the data sets. In this work, we combine such traditional distance measures for document analysis with the link information between documents to improve clustering results. In particular, we test the effectiveness of geodesic distances as similarity measures under the assumption of a spherical geometry in a 0-sphere. Our proposed distance measure thus combines the cosine distance computed from the term-document matrix with curvature values in the geodesic distance formula. To estimate these curvature values, we compute a clustering coefficient for every document from the link graph of the data set; because these coefficients are only rough estimates of the curvatures, we increase their distinctiveness with a heuristic. To evaluate our approach, we cluster the English Wikipedia hyperlinked data set with the k-means algorithm, using both the traditional cosine distance and our proposed geodesic distance. We measure effectiveness by computing the micro-precision of the resulting clusters with respect to the category information provided for each article.
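The abstract names three standard building blocks: the cosine distance between document term vectors, the clustering coefficient of a document in the link graph, and micro-precision as the evaluation measure. The Python sketch below illustrates only these standard pieces under stated assumptions. It does not reproduce the paper's geodesic distance formula or its curvature heuristic, which the abstract does not spell out. The function names are hypothetical, the link graph is assumed to be undirected and stored as a dict mapping each node to its set of neighbors, and micro-precision follows one common definition (each cluster is labelled with its majority category), which may differ in detail from the definition used in the paper.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: treat an empty document as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def clustering_coefficient(adj, node):
    """Local clustering coefficient of `node`.

    Assumes `adj` is a symmetric dict[node, set[node]] (undirected link graph)
    with orderable node ids; self-loops are ignored.
    """
    neighbors = adj.get(node, set()) - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count each linked pair of neighbors once (u < w).
    links = sum(
        1
        for u in neighbors
        for w in neighbors
        if u < w and w in adj.get(u, set())
    )
    return 2.0 * links / (k * (k - 1))


def micro_precision(cluster_labels, true_labels):
    """Share of documents whose category matches their cluster's majority category."""
    if not true_labels:
        return 0.0
    members = {}
    for cluster, category in zip(cluster_labels, true_labels):
        members.setdefault(cluster, []).append(category)
    correct = sum(Counter(cats).most_common(1)[0][1] for cats in members.values())
    return correct / len(true_labels)
```

For example, in the graph `{1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}`, node 3 has three neighbors of which only the pair (1, 2) is linked, so its clustering coefficient is 1/3.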

Subject (DDC)
004 Computer Science

Conference

2011 IEEE Symposium on Computational Intelligence and Data Mining - Part of 17273 - 2011 SSCI, 11 Apr. 2011 - 15 Apr. 2011, Paris, France

Cite

ISO 690
TEKIR, Selma, Florian MANSMANN, Daniel A. KEIM, 2011. Geodesic distances for web document clustering. 2011 IEEE Symposium on Computational Intelligence and Data Mining - Part of 17273 - 2011 SSCI. Paris, France, 11 Apr. 2011 - 15 Apr. 2011. In: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). IEEE, 2011, pp. 15-21. ISBN 978-1-4244-9926-7. Available under: doi: 10.1109/CIDM.2011.5949449
BibTeX
@inproceedings{Tekir2011-04Geode-19079,
  year={2011},
  doi={10.1109/CIDM.2011.5949449},
  title={Geodesic distances for web document clustering},
  isbn={978-1-4244-9926-7},
  publisher={IEEE},
  booktitle={2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)},
  pages={15--21},
  author={Tekir, Selma and Mansmann, Florian and Keim, Daniel A.}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/19079">
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/19079/2/Tekir_190796.pdf"/>
    <dcterms:abstract xml:lang="eng">While traditional distance measures are often capable of properly describing similarity between objects, in some application areas there is still potential to fine-tune these measures with additional information provided in the data sets. In this work we combine such traditional distance measures for document analysis with link information between documents to improve clustering results. In particular, we test the effectiveness of geodesic distances as similarity measures under the space assumption of spherical geometry in a 0-sphere. Our proposed distance measure is thus a combination of the cosine distance of the term-document matrix and some curvature values in the geodesic distance formula. To estimate these curvature values, we calculate clustering coefficient values for every document from the link graph of the data set and increase their distinctiveness by means of a heuristic as these clustering coefficient values are rough estimates of the curvatures. To evaluate our work, we perform clustering tests with the k-means algorithm on the English Wikipedia hyperlinked data set with both traditional cosine distance and our proposed geodesic distance. The effectiveness of our approach is measured by computing micro-precision values of the clusters based on the provided categorical information of each article.</dcterms:abstract>
    <dcterms:title>Geodesic distances for web document clustering</dcterms:title>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:issued>2011-04</dcterms:issued>
    <dc:contributor>Tekir, Selma</dc:contributor>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/19079/2/Tekir_190796.pdf"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-04-19T09:37:47Z</dcterms:available>
    <dc:language>eng</dc:language>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-04-19T09:37:47Z</dc:date>
    <dc:creator>Tekir, Selma</dc:creator>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:creator>Keim, Daniel A.</dc:creator>
    <dc:rights>terms-of-use</dc:rights>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/19079"/>
    <dc:contributor>Keim, Daniel A.</dc:contributor>
    <dcterms:bibliographicCitation>2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011) ; 11-15 April 2011, Paris. - Piscataway : IEEE, 2011. - pp. 15-21. - ISBN 978-1-4244-9926-7</dcterms:bibliographicCitation>
    <dc:creator>Mansmann, Florian</dc:creator>
    <dc:contributor>Mansmann, Florian</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
  </rdf:Description>
</rdf:RDF>

University bibliography
Yes