Multiscale visual quality assessment for cluster analysis with self-organizing maps
2011-01-24, Bernard, Jürgen, Landesberger, Tatiana von, Bremm, Sebastian, Schreck, Tobias
Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important role, as for most practical data sets, typically many different clusterings are possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with the output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it to experimental data sets, and show its applicability.
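As an illustration of what a simple scalar-valued quality score can look like, the sketch below computes the quantization error, a commonly used SOM quality measure: the mean distance from each data vector to its nearest prototype. The abstract does not say which scores the authors use, so the choice of measure, the function name `quantization_error`, and the toy data are assumptions made for this example only.

```python
import math


def quantization_error(data, codebook):
    """Mean Euclidean distance from each data vector to its nearest prototype.

    `data` and `codebook` are sequences of equal-length numeric tuples.
    Lower values mean the prototypes represent the data more closely.
    """
    if not data or not codebook:
        raise ValueError("data and codebook must be non-empty")
    total = 0.0
    for x in data:
        # Distance to the best-matching unit (closest prototype).
        total += min(math.dist(x, w) for w in codebook)
    return total / len(data)


# Hypothetical toy data: three 2D points and a two-prototype codebook.
data = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0)]
codebook = [(0.0, 0.0), (4.0, 4.0)]
print(quantization_error(data, codebook))
```

A single number like this is easy to compare across training runs, but it says nothing about where on the map the error is concentrated, which is one reason to move to richer abstraction levels.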
An Image-Based Approach to Visual Feature Space Analysis
2008, Schreck, Tobias, Schneidewind, Jörn, Keim, Daniel A.
Methods for management and analysis of non-standard data often rely on the so-called feature vector approach. The technique describes complex data instances by vectors of characteristic numeric values, which allow indexing the data and calculating similarity scores between the data elements. Feature vectors are thereby often a key ingredient of intelligent data analysis algorithms, including clustering, classification, and similarity search algorithms. However, identification of appropriate feature vectors for a given database of a given data type is a challenging task. Determining good feature vector extractors usually involves benchmarks relying on supervised information, which makes it an expensive and data-dependent process. In this paper, we address the feature selection problem by a novel approach based on analysis of certain feature space images. We develop two image-based analysis techniques for the automatic discrimination power analysis of feature spaces. We evaluate the techniques on a comprehensive feature selection benchmark, demonstrating the effectiveness of our analysis and its potential toward automatically addressing the feature selection problem.
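The feature vector approach described above can be sketched in a few lines: each object is reduced to a numeric vector, and a similarity score between two vectors is used to rank database objects against a query. The abstract does not name a similarity measure; cosine similarity, the helper names, and the toy vectors below are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention used here for all-zero vectors.
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, database):
    """Return database keys ordered from most to least similar to `query`."""
    return sorted(
        database,
        key=lambda name: cosine_similarity(query, database[name]),
        reverse=True,
    )


# Hypothetical feature vectors for three objects.
database = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 1.0)}
print(rank_by_similarity((1.0, 0.1), database))
```

How well such a ranking matches a user's notion of similarity depends on the feature extractor that produced the vectors, which is the selection problem the paper addresses.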
Semiautomatic benchmarking of feature vectors for multimedia retrieval
2007, Schreck, Tobias, Schneidewind, Jörn, Keim, Daniel A., Ward, Matthew O., Tatu, Andrada
Modern Digital Library applications store and process massive amounts of information. Usually, this data is not limited to raw textual or numeric data - typical applications also deal with multimedia data such as images, audio, video, or 3D geometric models. For providing effective retrieval functionality, appropriate metadata descriptors that allow calculation of similarity scores between data instances are required. Feature vectors are a generic way of describing multimedia data by vectors formed from numerically captured object features. They are used in similarity search, but can also be used for clustering and wider multimedia analysis applications. Extracting effective feature vectors for a given data type is a challenging task. Determining good feature vector extractors usually involves experimentation and application of supervised information. However, such experimentation is usually expensive, and supervised information is often data-dependent. We address the feature selection problem by a novel approach based on analysis of certain feature space images. We develop two image-based analysis techniques for the automatic discrimination power analysis of feature spaces. We evaluate the techniques on a comprehensive feature selection benchmark, demonstrating the effectiveness of our analysis and its potential toward automatically addressing the feature selection problem.