Visual Analytics of Patterns in High-Dimensional Data

Cite

Files for this resource

Checksum: MD5:b200cea404a2fe577fd33cb0d16114a9

TATU, Andrada, 2013. Visual Analytics of Patterns in High-Dimensional Data [Dissertation]. Konstanz: University of Konstanz

@phdthesis{Tatu2013Visua-24326, title={Visual Analytics of Patterns in High-Dimensional Data}, year={2013}, author={Tatu, Andrada}, address={Konstanz}, school={Universität Konstanz} }

Record metadata: 2013-08-27T06:31:20Z; deposit-license; eng

Due to the technological progress of the last decades, today's scientific and commercial applications are capable of generating, storing, and processing massive amounts of data. This influences the type of data generated, which in turn means that with each data entry different aspects are combined and stored in one common database. Often the describing attributes are numeric; we call data with more than a handful of attributes (dimensions) high-dimensional. Having to make use of these types of data archives poses new challenges for analysis techniques.

The work of this thesis centers around the question of finding interesting patterns (meaningful information) in high-dimensional data sets. This task is highly challenging because of the so-called curse of dimensionality, which expresses that the data becomes sparse as dimensionality increases. This phenomenon disturbs standard analysis techniques. Automatic techniques have to deal with data complexity that not only increases their runtime but also impairs their computation functions (such as distance functions). Moreover, exploring these data sets visually is hindered by the high number of dimensions that have to be displayed in two-dimensional screen space.

This thesis is motivated by the idea that searching for interesting patterns in this kind of data can be done through a mixed approach of automation, visualization, and interaction. The number of patterns a visualization contains can be measured by so-called quality metrics. These automated functions can then filter the large number of high-dimensional visualizations and present the user with a pre-filtered, good subset for further investigation. We propose quality metrics for scatterplots and parallel coordinates, focusing on different user tasks such as identifying clusters and correlations. We also evaluate these measures with regard to (1) their ability to identify clusters in a variety of real and synthetic datasets and (2) their correlation with human perception of clusters in scatterplots. A thorough discussion of the results follows, reflecting on their impact on directions for future research.

As quality metrics were developed for a large number of different high-dimensional visualization techniques, we present our reflections on how these methods are related to each other and how the approach can be developed further. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a comprehensive literature review.

In high-dimensional data, patterns often exist only in a subset of the dimensions. Subspace clustering techniques aim at finding the subspaces in which clusters exist and which might otherwise remain hidden if a traditional clustering algorithm is applied. While subspace clustering approaches tackle the sparsity problem in high-dimensional data well, designing effective visualizations to help analyze the clustering result is not trivial. In addition to the cluster membership information, the relevant sets of dimensions and the overlaps of memberships and dimensions also need to be considered. Although a number of techniques (for example, scatterplots, heat maps, dendrograms, and hierarchical parallel coordinates) exist for visualizing traditional clustering results, little research has been done on visualizing subspace clustering results. Moreover, while extensive research has been carried out on designing subspace clustering algorithms, surprisingly little attention has been paid to developing effective visualization tools for analyzing the clustering result.

Appropriate visualization techniques will not only help in monitoring the clustering process but, with special mining techniques, could also enable the domain expert to guide and even steer the subspace clustering process to reveal the patterns of interest. To this end, we envision a concept that combines subspace clustering algorithms and interactive, scalable visual exploration techniques. This work includes the task of comparative visualization and feedback-guided computation of alternative clusterings.

File downloads since 01.10.2014

Diss_Tatu.pdf 277
