Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering

dc.contributor.authorHinneburg, Alexanderdeu
dc.contributor.authorKeim, Daniel A.
dc.date.accessioned2011-03-24T16:00:07Zdeu
dc.date.available2011-03-24T16:00:07Zdeu
dc.date.issued1999deu
dc.description.abstractMany applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach.eng
dc.description.versionpublished
dc.format.mimetypeapplication/pdfdeu
dc.identifier.citationFirst publ. in: Proceedings of the 25th International Conference on Very Large Databases, 1999, pp. 506-517deu
dc.identifier.ppn288451511deu
dc.identifier.urihttp://kops.uni-konstanz.de/handle/123456789/5790
dc.language.isoengdeu
dc.legacy.dateIssued2008deu
dc.rightsAttribution-NonCommercial-NoDerivs 2.0 Generic
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/2.0/
dc.subject.ddc004deu
dc.titleOptimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clusteringeng
dc.typeINPROCEEDINGSdeu
dspace.entity.typePublication
kops.citation.bibtex
@inproceedings{Hinneburg1999Optim-5790,
  year={1999},
  title={Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering},
  booktitle={Proceedings of the 25 th International Conference on Very Large Databases, 1999},
  pages={506--517},
  author={Hinneburg, Alexander and Keim, Daniel A.}
}
kops.citation.iso690HINNEBURG, Alexander, Daniel A. KEIM, 1999. Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. In: Proceedings of the 25 th International Conference on Very Large Databases, 1999. 1999, pp. 506-517deu
kops.citation.iso690HINNEBURG, Alexander, Daniel A. KEIM, 1999. Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. In: Proceedings of the 25 th International Conference on Very Large Databases, 1999. 1999, pp. 506-517eng
kops.citation.rdf
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/5790">
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Keim, Daniel A.</dc:creator>
    <dcterms:abstract xml:lang="eng">Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The  comparison  reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows  that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach.</dcterms:abstract>
    <dc:contributor>Keim, Daniel A.</dc:contributor>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:00:07Z</dcterms:available>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/5790"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5790/1/vldb99.pdf"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5790/1/vldb99.pdf"/>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by-nc-nd/2.0/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:issued>1999</dcterms:issued>
    <dcterms:bibliographicCitation>First publ. in: Proceedings of the 25th International Conference on Very Large Databases, 1999, pp. 506-517</dcterms:bibliographicCitation>
    <dcterms:title>Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering</dcterms:title>
    <dc:creator>Hinneburg, Alexander</dc:creator>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:00:07Z</dc:date>
    <dc:format>application/pdf</dc:format>
    <dc:rights>Attribution-NonCommercial-NoDerivs 2.0 Generic</dc:rights>
    <dc:language>eng</dc:language>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Hinneburg, Alexander</dc:contributor>
  </rdf:Description>
</rdf:RDF>
kops.description.openAccessopenaccessgreen
kops.flag.knbibliographyfalse
kops.identifier.nbnurn:nbn:de:bsz:352-opus-70410deu
kops.opus.id7041deu
kops.sourcefield<i>Proceedings of the 25 th International Conference on Very Large Databases, 1999</i>. 1999, pp. 506-517deu
kops.sourcefield.plainProceedings of the 25 th International Conference on Very Large Databases, 1999. 1999, pp. 506-517deu
kops.sourcefield.plainProceedings of the 25 th International Conference on Very Large Databases, 1999. 1999, pp. 506-517eng
relation.isAuthorOfPublicationda7dafb0-6003-4fd4-803c-11e1e72d621a
relation.isAuthorOfPublication.latestForDiscoveryda7dafb0-6003-4fd4-803c-11e1e72d621a
source.bibliographicInfo.fromPage506
source.bibliographicInfo.toPage517
source.titleProceedings of the 25 th International Conference on Very Large Databases, 1999

Dateien

Originalbündel

Gerade angezeigt 1 - 1 von 1
Vorschaubild nicht verfügbar
Name:
vldb99.pdf
Größe:
1.42 MB
Format:
Adobe Portable Document Format
vldb99.pdf
vldb99.pdfGröße: 1.42 MBDownloads: 2396