Type of Publication:  Contribution to a conference 
URI (citable link):  http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-70410 
Author:  Hinneburg, Alexander; Keim, Daniel A. 
Year of publication:  1999 
Published in:  Proceedings of the 25th International Conference on Very Large Databases, 1999, pp. 506-517 
Summary: 
Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach.

Subject (DDC):  004 Computer Science 
Link to License:  Terms of use 
HINNEBURG, Alexander, Daniel A. KEIM, 1999. Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. In: Proceedings of the 25th International Conference on Very Large Databases, 1999, pp. 506-517
@inproceedings{Hinneburg1999Optim5790, title={Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering}, year={1999}, booktitle={Proceedings of the 25th International Conference on Very Large Databases, 1999}, pages={506--517}, author={Hinneburg, Alexander and Keim, Daniel A.} }
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/rdf/resource/123456789/5790"> <foaf:homepage rdf:resource="http://localhost:8080/jspui"/> <dc:rights>terms-of-use</dc:rights> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/rdf/resource/123456789/36"/> <dc:contributor>Keim, Daniel A.</dc:contributor> <dc:format>application/pdf</dc:format> <dcterms:bibliographicCitation>First publ. in: Proceedings of the 25th International Conference on Very Large Databases, 1999, pp. 506-517</dcterms:bibliographicCitation> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/rdf/resource/123456789/36"/> <dcterms:title>Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering</dcterms:title> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:00:07Z</dc:date> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <dcterms:rights rdf:resource="https://creativecommons.org/licenses/by-nc-nd/2.0/legalcode"/> <dc:contributor>Hinneburg, Alexander</dc:contributor> <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/5790"/> <dcterms:abstract xml:lang="eng">Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. 
In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach.</dcterms:abstract> <dc:language>eng</dc:language> <dc:creator>Hinneburg, Alexander</dc:creator> <dc:creator>Keim, Daniel A.</dc:creator> <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5790/1/vldb99.pdf"/> <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5790/1/vldb99.pdf"/> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:00:07Z</dcterms:available> <dcterms:issued>1999</dcterms:issued> </rdf:Description> </rdf:RDF>
vldb99.pdf  1609 