## Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering

1999
##### Authors
Hinneburg, Alexander
##### Publication type
Contribution to a conference collection
##### Published in
Proceedings of the 25 th International Conference on Very Large Databases, 1999. - pp. 506-517
##### Abstract
Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach.
##### Subject (DDC)
004 Computer Science
##### Cite This
ISO 690HINNEBURG, Alexander, Daniel A. KEIM, 1999. Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. In: Proceedings of the 25 th International Conference on Very Large Databases, 1999, pp. 506-517
BibTex
@inproceedings{Hinneburg1999Optim-5790,
year={1999},
title={Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering},
booktitle={Proceedings of the 25 th International Conference on Very Large Databases, 1999},
pages={506--517},
author={Hinneburg, Alexander and Keim, Daniel A.}
}

RDF
<rdf:RDF
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:void="http://rdfs.org/ns/void#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
<foaf:homepage rdf:resource="http://localhost:8080/"/>
<dc:creator>Keim, Daniel A.</dc:creator>
<dcterms:abstract xml:lang="eng">Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The  comparison  reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows  that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach.</dcterms:abstract>
<dc:contributor>Keim, Daniel A.</dc:contributor>
<dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:00:07Z</dcterms:available>
<bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/5790"/>
<dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5790/1/vldb99.pdf"/>
<void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
<dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5790/1/vldb99.pdf"/>
<dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
<dcterms:issued>1999</dcterms:issued>
<dcterms:bibliographicCitation>First publ. in: Proceedings of the 25th International Conference on Very Large Databases, 1999, pp. 506-517</dcterms:bibliographicCitation>
<dcterms:title>Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering</dcterms:title>
<dc:creator>Hinneburg, Alexander</dc:creator>
<dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:00:07Z</dc:date>
<dc:format>application/pdf</dc:format>