Publikation:

On the Surprising Behavior of Distance Metric in High-Dimensional Space

Lade...
Vorschaubild

Datum

2001

Autor:innen

Aggarwal, Charu C.
Hinneburg, Alexander

Herausgeber:innen

Kontakt

ISSN der Zeitschrift

Electronic ISSN

ISBN

Bibliografische Daten

Verlag

Schriftenreihe

Auflagebezeichnung

ArXiv-ID

Internationale Patentnummer

Angaben zur Forschungsförderung

Projekt

Open Access-Veröffentlichung
Open Access Green
Core Facility der Universität Konstanz

Gesperrt bis

Titel in einer weiteren Sprache

Publikationstyp
Beitrag zu einem Konferenzband
Publikationsstatus
Published

Erschienen in

VAN DEN BUSSCHE, Jan, ed., Victor VIANU, ed.. Database Theory — ICDT 2001. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001, pp. 420-434. Lecture Notes in Computer Science. 1973. ISBN 978-3-540-41456-8. Available under: doi: 10.1007/3-540-44503-X_27

Zusammenfassung

In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from a efficiency and/or effectiveness perspective. Recent research results show that in high dimensional space, the concept of proximity, distance or nearest neighbor may not even be qualitatively meaningful. In this paper, we view the dimensionality curse from the point of view of the distance metrics which are used to measure the similarity between objects. We specially examine the behavior of the commonly used Lk norm and show that the problem of meaningfulness in high dimensionality is sensitive to the value of k. For example, this means that the Manhattan distance metric (L1 norm) is consistently more preferable than the Euclidean distance metric (L2 norm) for high dimensional data mining applications. Using the intuition derived from our analysis, we introduce and examine a natural extension of the Lk norm to fractional distance metrics. We show that the fractional distance metric provides more meaningful results both from the theoretical and empirical perspective. The results show that fractional distance metrics can significantly improve the effectiveness of standard clustering algorithms such as the k-means algorithm.

Zusammenfassung in einer weiteren Sprache

Fachgebiet (DDC)
004 Informatik

Schlagwörter

Konferenz

Rezension
undefined / . - undefined, undefined

Forschungsvorhaben

Organisationseinheiten

Zeitschriftenheft

Verknüpfte Datensätze

Zitieren

ISO 690AGGARWAL, Charu C., Alexander HINNEBURG, Daniel A. KEIM, 2001. On the Surprising Behavior of Distance Metric in High-Dimensional Space. In: VAN DEN BUSSCHE, Jan, ed., Victor VIANU, ed.. Database Theory — ICDT 2001. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001, pp. 420-434. Lecture Notes in Computer Science. 1973. ISBN 978-3-540-41456-8. Available under: doi: 10.1007/3-540-44503-X_27
BibTex
@inproceedings{Aggarwal2001-10-12Surpr-5715,
  year={2001},
  doi={10.1007/3-540-44503-X_27},
  title={On the Surprising Behavior of Distance Metric in High-Dimensional Space},
  number={1973},
  isbn={978-3-540-41456-8},
  publisher={Springer Berlin Heidelberg},
  address={Berlin, Heidelberg},
  series={Lecture Notes in Computer Science},
  booktitle={Database Theory — ICDT 2001},
  pages={420--434},
  editor={Van den Bussche, Jan and Vianu, Victor},
  author={Aggarwal, Charu C. and Hinneburg, Alexander and Keim, Daniel A.}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/5715">
    <dc:contributor>Keim, Daniel A.</dc:contributor>
    <dc:contributor>Hinneburg, Alexander</dc:contributor>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by-nc-nd/2.0/"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5715/1/On_the_Surprising_Behavior_of_Distance_Metric_in_High_Dimensional_Space.pdf"/>
    <dc:language>eng</dc:language>
    <dcterms:abstract xml:lang="eng">In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from a efficiency and/or effectiveness perspective. Recent research results show that in high dimensional space, the concept of proximity, distance or nearest neighbor may not even be qualitatively meaningful. In this paper, we view the dimensionality curse from the point of view of the distance metrics which are used to measure the similarity between objects. We specially examine the behavior of the commonly used Lk norm and show that the problem of meaningfulness in high dimensionality is sensitive to the value of k. For example, this means that the Manhattan distance metric (L1 norm) is consistently more preferable than the Euclidean distance metric (L2 norm) for high dimensional data mining applications. Using the intuition derived from our analysis, we introduce and examine a natural extension of the Lk norm to fractional distance metrics. We show that the fractional distance metric provides more meaningful results both from the theoretical and empirical perspective. The results show that fractional distance metrics can significantly improve the effectiveness of standard clustering algorithms such as the k-means algorithm.</dcterms:abstract>
    <dc:rights>Attribution-NonCommercial-NoDerivs 2.0 Generic</dc:rights>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Aggarwal, Charu C.</dc:contributor>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/5715"/>
    <dcterms:bibliographicCitation>First publ. in: Database theory, ICDT 200, 8th International Conference, London, UK, January 4 - 6, 2001 / Jan Van den Bussche ... (eds.). Berlin: Springer, 2001, pp. 420-434 (=Lecture notes in computer science ; 1973)</dcterms:bibliographicCitation>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:format>application/pdf</dc:format>
    <dcterms:issued>2001-10-12</dcterms:issued>
    <dcterms:title>On the Surprising Behavior of Distance Metric in High-Dimensional Space</dcterms:title>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T15:59:33Z</dc:date>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/5715/1/On_the_Surprising_Behavior_of_Distance_Metric_in_High_Dimensional_Space.pdf"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-24T15:59:33Z</dcterms:available>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>Keim, Daniel A.</dc:creator>
    <dc:creator>Hinneburg, Alexander</dc:creator>
    <dc:creator>Aggarwal, Charu C.</dc:creator>
  </rdf:Description>
</rdf:RDF>

Interner Vermerk

xmlui.Submission.submit.DescribeStep.inputForms.label.kops_note_fromSubmitter

Kontakt
URL der Originalveröffentl.

Prüfdatum der URL

Prüfungsdatum der Dissertation

Finanzierungsart

Kommentar zur Publikation

Allianzlizenz
Corresponding Authors der Uni Konstanz vorhanden
Internationale Co-Autor:innen
Universitätsbibliographie
Nein
Begutachtet
Diese Publikation teilen