Geodesic distances for web document clustering

dc.contributor.author: Tekir, Selma [deu]
dc.contributor.author: Mansmann, Florian
dc.contributor.author: Keim, Daniel A.
dc.date.accessioned: 2012-04-19T09:37:47Z [deu]
dc.date.available: 2012-04-19T09:37:47Z [deu]
dc.date.issued: 2011-04
dc.description.abstract: While traditional distance measures are often capable of properly describing similarity between objects, in some application areas there is still potential to fine-tune these measures with additional information provided in the data sets. In this work we combine such traditional distance measures for document analysis with link information between documents to improve clustering results. In particular, we test the effectiveness of geodesic distances as similarity measures under the space assumption of spherical geometry in a 0-sphere. Our proposed distance measure is thus a combination of the cosine distance of the term-document matrix and some curvature values in the geodesic distance formula. To estimate these curvature values, we calculate clustering coefficient values for every document from the link graph of the data set and increase their distinctiveness by means of a heuristic as these clustering coefficient values are rough estimates of the curvatures. To evaluate our work, we perform clustering tests with the k-means algorithm on the English Wikipedia hyperlinked data set with both traditional cosine distance and our proposed geodesic distance. The effectiveness of our approach is measured by computing micro-precision values of the clusters based on the provided categorical information of each article. [eng]
dc.description.version: published
dc.identifier.citation: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011) ; 11-15 April 2011, Paris. - Piscataway : IEEE, 2011. - pp. 15-21. - ISBN 978-1-4244-9926-7 [deu]
dc.identifier.doi: 10.1109/CIDM.2011.5949449 [deu]
dc.identifier.ppn: 375523235 [deu]
dc.identifier.uri: http://kops.uni-konstanz.de/handle/123456789/19079
dc.language.iso: eng [deu]
dc.legacy.dateIssued: 2012-04-19 [deu]
dc.rights: terms-of-use [deu]
dc.rights.uri: https://rightsstatements.org/page/InC/1.0/ [deu]
dc.subject.ddc: 004 [deu]
dc.title: Geodesic distances for web document clustering [eng]
dc.type: INPROCEEDINGS [deu]
dspace.entity.type: Publication
kops.citation.bibtex:
@inproceedings{Tekir2011-04Geode-19079,
  year={2011},
  doi={10.1109/CIDM.2011.5949449},
  title={Geodesic distances for web document clustering},
  isbn={978-1-4244-9926-7},
  publisher={IEEE},
  booktitle={2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)},
  pages={15--21},
  author={Tekir, Selma and Mansmann, Florian and Keim, Daniel A.}
}
kops.citation.iso690: TEKIR, Selma, Florian MANSMANN, Daniel A. KEIM, 2011. Geodesic distances for web document clustering. 2011 Ieee Symposium On Computational Intelligence And Data Mining - Part Of 17273 - 2011 Ssci. Paris, France, 11. Apr. 2011 - 15. Apr. 2011. In: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). IEEE, 2011, pp. 15-21. ISBN 978-1-4244-9926-7. Available under: doi: 10.1109/CIDM.2011.5949449 [deu]
kops.citation.iso690: TEKIR, Selma, Florian MANSMANN, Daniel A. KEIM, 2011. Geodesic distances for web document clustering. 2011 Ieee Symposium On Computational Intelligence And Data Mining - Part Of 17273 - 2011 Ssci. Paris, France, Apr 11, 2011 - Apr 15, 2011. In: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). IEEE, 2011, pp. 15-21. ISBN 978-1-4244-9926-7. Available under: doi: 10.1109/CIDM.2011.5949449 [eng]
kops.citation.rdf:
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/19079">
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/19079/2/Tekir_190796.pdf"/>
    <dcterms:abstract xml:lang="eng">While traditional distance measures are often capable of properly describing similarity between objects, in some application areas there is still potential to fine-tune these measures with additional information provided in the data sets. In this work we combine such traditional distance measures for document analysis with link information between documents to improve clustering results. In particular, we test the effectiveness of geodesic distances as similarity measures under the space assumption of spherical geometry in a 0-sphere. Our proposed distance measure is thus a combination of the cosine distance of the term-document matrix and some curvature values in the geodesic distance formula. To estimate these curvature values, we calculate clustering coefficient values for every document from the link graph of the data set and increase their distinctiveness by means of a heuristic as these clustering coefficient values are rough estimates of the curvatures. To evaluate our work, we perform clustering tests with the k-means algorithm on the English Wikipedia hyperlinked data set with both traditional cosine distance and our proposed geodesic distance. The effectiveness of our approach is measured by computing micro-precision values of the clusters based on the provided categorical information of each article.</dcterms:abstract>
    <dcterms:title>Geodesic distances for web document clustering</dcterms:title>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:issued>2011-04</dcterms:issued>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Tekir, Selma</dc:contributor>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/19079/2/Tekir_190796.pdf"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-04-19T09:37:47Z</dcterms:available>
    <dc:language>eng</dc:language>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-04-19T09:37:47Z</dc:date>
    <dc:creator>Tekir, Selma</dc:creator>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:creator>Keim, Daniel A.</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:rights>terms-of-use</dc:rights>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/19079"/>
    <dc:contributor>Keim, Daniel A.</dc:contributor>
    <dcterms:bibliographicCitation>2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011) ; 11-15 April 2011, Paris. - Piscataway : IEEE, 2011. - pp. 15-21. - ISBN 978-1-4244-9926-7</dcterms:bibliographicCitation>
    <dc:creator>Mansmann, Florian</dc:creator>
    <dc:contributor>Mansmann, Florian</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
  </rdf:Description>
</rdf:RDF>
kops.conferencefield: 2011 Ieee Symposium On Computational Intelligence And Data Mining - Part Of 17273 - 2011 Ssci, 11. Apr. 2011 - 15. Apr. 2011, Paris, France [deu]
kops.date.conferenceEnd: 2011-04-15
kops.date.conferenceStart: 2011-04-11
kops.description.openAccess: openaccessgreen
kops.flag.knbibliography: true
kops.identifier.nbn: urn:nbn:de:bsz:352-190796 [deu]
kops.location.conference: Paris, France
kops.sourcefield: <i>2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)</i>. IEEE, 2011, pp. 15-21. ISBN 978-1-4244-9926-7. Available under: doi: 10.1109/CIDM.2011.5949449 [deu]
kops.sourcefield.plain: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). IEEE, 2011, pp. 15-21. ISBN 978-1-4244-9926-7. Available under: doi: 10.1109/CIDM.2011.5949449 [deu]
kops.sourcefield.plain: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). IEEE, 2011, pp. 15-21. ISBN 978-1-4244-9926-7. Available under: doi: 10.1109/CIDM.2011.5949449 [eng]
kops.submitter.email: regina.fleischmann@uni-konstanz.de [deu]
kops.title.conference: 2011 Ieee Symposium On Computational Intelligence And Data Mining - Part Of 17273 - 2011 Ssci
relation.isAuthorOfPublication: 90244953-4003-4a15-ae6e-0b9d164ea2a3
relation.isAuthorOfPublication: da7dafb0-6003-4fd4-803c-11e1e72d621a
relation.isAuthorOfPublication.latestForDiscovery: 90244953-4003-4a15-ae6e-0b9d164ea2a3
source.bibliographicInfo.fromPage: 15
source.bibliographicInfo.toPage: 21
source.identifier.isbn: 978-1-4244-9926-7
source.publisher: IEEE
source.title: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Files

Original bundle

Name: Tekir_190796.pdf
Size: 147.79 KB
Format: Adobe Portable Document Format
Downloads: 372

License bundle

Name: license.txt
Size: 1.92 KB
Format: Plain Text
Description: license.txt
Downloads: 0