Geodesic distances for web document clustering


Files for this resource

Checksum: MD5:41a791952b3d6fa60c9c1de1cc17a90b

TEKIR, Selma, Florian MANSMANN, Daniel KEIM, 2011. Geodesic distances for web document clustering. 2011 IEEE Symposium on Computational Intelligence and Data Mining - Part of 17273 - 2011 SSCI. Paris, France, 11-15 April 2011. In: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). IEEE, pp. 15-21. ISBN 978-1-4244-9926-7. Available under: doi: 10.1109/CIDM.2011.5949449

@inproceedings{Tekir2011-04Geode-19079,
  title={Geodesic distances for web document clustering},
  year={2011},
  doi={10.1109/CIDM.2011.5949449},
  isbn={978-1-4244-9926-7},
  publisher={IEEE},
  booktitle={2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)},
  pages={15--21},
  author={Tekir, Selma and Mansmann, Florian and Keim, Daniel}
}

<rdf:RDF xmlns:dcterms="" xmlns:dc="" xmlns:rdf="" xmlns:bibo="" xmlns:dspace="" xmlns:foaf="" xmlns:void="" xmlns:xsd="" >
  <rdf:Description rdf:about="">
    <dc:language>eng</dc:language>
    <dcterms:hasPart rdf:resource=""/>
    <dc:creator>Keim, Daniel</dc:creator>
    <dcterms:rights rdf:resource=""/>
    <dcterms:issued>2011-04</dcterms:issued>
    <dc:creator>Tekir, Selma</dc:creator>
    <dcterms:isPartOf rdf:resource=""/>
    <dcterms:bibliographicCitation>2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011) ; 11-15 April 2011, Paris. - Piscataway : IEEE, 2011. - pp. 15-21. - ISBN 978-1-4244-9926-7</dcterms:bibliographicCitation>
    <dcterms:available rdf:datatype="">2012-04-19T09:37:47Z</dcterms:available>
    <dc:contributor>Mansmann, Florian</dc:contributor>
    <dspace:hasBitstream rdf:resource=""/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Keim, Daniel</dc:contributor>
    <dc:contributor>Tekir, Selma</dc:contributor>
    <dcterms:abstract xml:lang="eng">While traditional distance measures are often capable of properly describing similarity between objects, in some application areas there is still potential to fine-tune these measures with additional information provided in the data sets. In this work we combine such traditional distance measures for document analysis with link information between documents to improve clustering results. In particular, we test the effectiveness of geodesic distances as similarity measures under the space assumption of spherical geometry in a 0-sphere. Our proposed distance measure is thus a combination of the cosine distance of the term-document matrix and some curvature values in the geodesic distance formula. To estimate these curvature values, we calculate clustering coefficient values for every document from the link graph of the data set and increase their distinctiveness by means of a heuristic as these clustering coefficient values are rough estimates of the curvatures.
To evaluate our work, we perform clustering tests with the k-means algorithm on the English Wikipedia hyperlinked data set with both traditional cosine distance and our proposed geodesic distance. The effectiveness of our approach is measured by computing micro-precision values of the clusters based on the provided categorical information of each article.</dcterms:abstract>
    <bibo:uri rdf:resource=""/>
    <dcterms:title>Geodesic distances for web document clustering</dcterms:title>
    <dc:creator>Mansmann, Florian</dc:creator>
    <dc:rights>terms-of-use</dc:rights>
    <dc:date rdf:datatype="">2012-04-19T09:37:47Z</dc:date>
    <foaf:homepage rdf:resource="http://localhost:8080/jspui"/>
    <dspace:isPartOfCollection rdf:resource=""/>
  </rdf:Description>
</rdf:RDF>
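
The abstract names three standard ingredients: cosine distance on term vectors, a clustering coefficient per document from the link graph, and micro-precision of the resulting clusters. The following is a minimal Python sketch of those building blocks only. It does not reproduce the paper's geodesic distance formula or its curvature heuristic, neither of which is given in the abstract. The function names, the undirected treatment of the link graph, the handling of empty documents, and the reading of micro-precision as the share of documents that carry their cluster's majority label are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: an empty document is maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def clustering_coefficient(graph, node):
    """Local clustering coefficient of `node` in an undirected link graph.

    `graph` maps every node to the set of its neighbours (assumed symmetric).
    """
    neighbours = graph[node] - {node}
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no triangle is possible
    # Count the links that exist among the node's neighbours.
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def micro_precision(cluster_ids, labels):
    """Share of documents whose label equals their cluster's majority label."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    majority = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority / len(labels)
```

A hypothetical call such as `clustering_coefficient({"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}, "a")` describes a fully connected triangle. How such coefficients are turned into curvature values is specific to the paper and is left out here.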

File downloads since 01.10.2014

Tekir_190796.pdf 195
