Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering

Thumbnail Image
Date
1999
Authors
Hinneburg, Alexander
Editors
Contact
Journal ISSN
Electronic ISSN
ISBN
Bibliographical data
Publisher
Series
DOI (citable link)
ArXiv-ID
International patent number
EU project number
Project
Open Access publication
Restricted until
Title in another language
Research Projects
Organizational Units
Journal Issue
Publication type
Contribution to a conference collection
Publication status
Published in
Proceedings of the 25 th International Conference on Very Large Databases, 1999. - pp. 506-517
Abstract
Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach.
Summary in another language
Subject (DDC)
004 Computer Science
Keywords
Conference
Review
undefined / . - undefined, undefined. - (undefined; undefined)
Cite This
ISO 690HINNEBURG, Alexander, Daniel A. KEIM, 1999. Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. In: Proceedings of the 25 th International Conference on Very Large Databases, 1999, pp. 506-517
BibTex
@inproceedings{Hinneburg1999Optim-5790,
  year={1999},
  title={Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering},
  booktitle={Proceedings of the 25 th International Conference on Very Large Databases, 1999},
  pages={506--517},
  author={Hinneburg, Alexander and Keim, Daniel A.}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/5790">
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Keim, Daniel A.</dc:creator>
    <dcterms:abstract xml:lang="eng">Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The  comparison  reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows  that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach.</dcterms:abstract>
    <dc:contributor>Keim, Daniel A.</dc:contributor>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:00:07Z</dcterms:available>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/5790"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5790/1/vldb99.pdf"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5790/1/vldb99.pdf"/>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by-nc-nd/2.0/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:issued>1999</dcterms:issued>
    <dcterms:bibliographicCitation>First publ. in: Proceedings of the 25th International Conference on Very Large Databases, 1999, pp. 506-517</dcterms:bibliographicCitation>
    <dcterms:title>Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering</dcterms:title>
    <dc:creator>Hinneburg, Alexander</dc:creator>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:00:07Z</dc:date>
    <dc:format>application/pdf</dc:format>
    <dc:rights>Attribution-NonCommercial-NoDerivs 2.0 Generic</dc:rights>
    <dc:language>eng</dc:language>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Hinneburg, Alexander</dc:contributor>
  </rdf:Description>
</rdf:RDF>
Internal note
xmlui.Submission.submit.DescribeStep.inputForms.label.kops_note_fromSubmitter
Contact
URL of original publication
Test date of URL
Examination date of dissertation
Method of financing
Comment on publication
Alliance license
Corresponding Authors der Uni Konstanz vorhanden
International Co-Authors
Bibliography of Konstanz
No
Refereed