Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering
| dc.contributor.author | Hinneburg, Alexander | deu |
| dc.contributor.author | Keim, Daniel A. | |
| dc.date.accessioned | 2011-03-24T16:00:07Z | deu |
| dc.date.available | 2011-03-24T16:00:07Z | deu |
| dc.date.issued | 1999 | deu |
| dc.description.abstract | Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach. | eng |
| dc.description.version | published | |
| dc.format.mimetype | application/pdf | deu |
| dc.identifier.citation | First publ. in: Proceedings of the 25th International Conference on Very Large Databases, 1999, pp. 506-517 | deu |
| dc.identifier.ppn | 288451511 | deu |
| dc.identifier.uri | http://kops.uni-konstanz.de/handle/123456789/5790 | |
| dc.language.iso | eng | deu |
| dc.legacy.dateIssued | 2008 | deu |
| dc.rights | Attribution-NonCommercial-NoDerivs 2.0 Generic | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/2.0/ | |
| dc.subject.ddc | 004 | deu |
| dc.title | Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering | eng |
| dc.type | INPROCEEDINGS | deu |
| dspace.entity.type | Publication | |
| kops.citation.bibtex | @inproceedings{Hinneburg1999Optim-5790,
year={1999},
title={Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering},
booktitle={Proceedings of the 25 th International Conference on Very Large Databases, 1999},
pages={506--517},
author={Hinneburg, Alexander and Keim, Daniel A.}
} | |
| kops.citation.iso690 | HINNEBURG, Alexander, Daniel A. KEIM, 1999. Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. In: Proceedings of the 25 th International Conference on Very Large Databases, 1999. 1999, pp. 506-517 | deu |
| kops.citation.iso690 | HINNEBURG, Alexander, Daniel A. KEIM, 1999. Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. In: Proceedings of the 25 th International Conference on Very Large Databases, 1999. 1999, pp. 506-517 | eng |
| kops.citation.rdf | <rdf:RDF
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:void="http://rdfs.org/ns/void#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
<rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/5790">
<foaf:homepage rdf:resource="http://localhost:8080/"/>
<dc:creator>Keim, Daniel A.</dc:creator>
<dcterms:abstract xml:lang="eng">Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is determined by calculating the best partitioning hyperplanes for each dimension (if such a partitioning exists) using certain projections of the data. The advantages of our new approach are (1) it has a firm mathematical basis (2) it is by far more effective than existing clustering algorithms for high-dimensional data (3) it is very efficient even for large data sets of high dimensionality. To demonstrate the effectiveness and efficiency of our new approach, we perform a series of experiments on a number of different data sets including real data sets from CAD and molecular biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of our new approach.</dcterms:abstract>
<dc:contributor>Keim, Daniel A.</dc:contributor>
<dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:00:07Z</dcterms:available>
<bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/5790"/>
<dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5790/1/vldb99.pdf"/>
<void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
<dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5790/1/vldb99.pdf"/>
<dcterms:rights rdf:resource="http://creativecommons.org/licenses/by-nc-nd/2.0/"/>
<dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
<dcterms:issued>1999</dcterms:issued>
<dcterms:bibliographicCitation>First publ. in: Proceedings of the 25th International Conference on Very Large Databases, 1999, pp. 506-517</dcterms:bibliographicCitation>
<dcterms:title>Optimal Grid-Clustering : Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering</dcterms:title>
<dc:creator>Hinneburg, Alexander</dc:creator>
<dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T16:00:07Z</dc:date>
<dc:format>application/pdf</dc:format>
<dc:rights>Attribution-NonCommercial-NoDerivs 2.0 Generic</dc:rights>
<dc:language>eng</dc:language>
<dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
<dc:contributor>Hinneburg, Alexander</dc:contributor>
</rdf:Description>
</rdf:RDF> | |
| kops.description.openAccess | openaccessgreen | |
| kops.flag.knbibliography | false | |
| kops.identifier.nbn | urn:nbn:de:bsz:352-opus-70410 | deu |
| kops.opus.id | 7041 | deu |
| kops.sourcefield | <i>Proceedings of the 25 th International Conference on Very Large Databases, 1999</i>. 1999, pp. 506-517 | deu |
| kops.sourcefield.plain | Proceedings of the 25 th International Conference on Very Large Databases, 1999. 1999, pp. 506-517 | deu |
| kops.sourcefield.plain | Proceedings of the 25 th International Conference on Very Large Databases, 1999. 1999, pp. 506-517 | eng |
| relation.isAuthorOfPublication | da7dafb0-6003-4fd4-803c-11e1e72d621a | |
| relation.isAuthorOfPublication.latestForDiscovery | da7dafb0-6003-4fd4-803c-11e1e72d621a | |
| source.bibliographicInfo.fromPage | 506 | |
| source.bibliographicInfo.toPage | 517 | |
| source.title | Proceedings of the 25 th International Conference on Very Large Databases, 1999 |
Dateien
Originalbündel
1 - 1 von 1
Vorschaubild nicht verfügbar
- Name:
- vldb99.pdf
- Größe:
- 1.42 MB
- Format:
- Adobe Portable Document Format
