Visual cluster analysis of trajectory data with interactive Kohonen maps
2009, Schreck, Tobias, Bernard, Jürgen, von Landesberger, Tatiana, Kohlhammer, Jörn
Visual-interactive cluster analysis provides valuable tools for effectively analyzing large and complex data sets. Owing to desirable properties and an inherent predisposition for visualization, the Kohonen Feature Map (or Self-Organizing Map or SOM) algorithm is among the most popular and widely used visual clustering techniques. However, the unsupervised nature of the algorithm may be disadvantageous in certain applications. Depending on initialization and data characteristics, cluster maps (cluster layouts) may emerge that do not comply with user preferences, expectations or the application context. Considering SOM-based analysis of trajectory data, we propose a comprehensive visual-interactive monitoring and control framework extending the basic SOM algorithm. The framework implements the general Visual Analytics idea to effectively combine automatic data analysis with human expert supervision. It provides simple, yet effective facilities for visually monitoring and interactively controlling the trajectory clustering process at arbitrary levels of detail. The approach allows the user to leverage existing domain knowledge and user preferences, arriving at improved cluster maps. We apply the framework to several trajectory clustering problems, demonstrating its potential for combining unsupervised (machine) and supervised (human expert) processing to produce appropriate cluster results.
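For orientation, the basic (unsupervised) SOM algorithm that the framework extends can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names, the linear decay schedules, and the assumption that each trajectory has already been resampled and flattened into a fixed-length numeric vector are all hypothetical choices made for this example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose prototype vector is closest to x."""
    best, best_d = None, float("inf")
    for pos, w in weights.items():
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = pos, d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on equal-length vectors with values in [0, 1].

    Assumption: learning rate and neighborhood radius decay linearly;
    other schedules are equally common.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2
    # Random initialization; the resulting layout depends on it.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)
            sigma = max(sigma0 * (1 - frac), 0.5)
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

Because the initialization is random, different seeds can yield different cluster layouts, which is the behavior the abstract cites as motivation for interactive monitoring and control.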
A New Metaphor for Projection-Based Visual Analysis and Data Exploration
2007-01-28, Schreck, Tobias, Panse, Christian
In many important application domains such as Business and Finance, Process Monitoring, and Security, huge and quickly increasing volumes of complex data are collected. Strong efforts are underway to develop automatic and interactive analysis tools for mining useful information from these data repositories. Many data analysis algorithms require an appropriate definition of similarity (or distance) between data instances to allow meaningful clustering, classification, and retrieval, among other analysis tasks. Projection-based data visualization is highly interesting (a) for visual discrimination analysis of a data set within a given similarity definition, and (b) for comparative analysis of similarity characteristics of a given data set represented by different similarity definitions. We introduce an intuitive and effective novel approach for projection-based similarity visualization for interactive discrimination analysis, data exploration, and visual evaluation of metric space effectiveness. The approach is based on the convex hull metaphor for visually aggregating sets of points in projected space, and it can be used with a variety of different projection techniques. The effectiveness of the approach is demonstrated by application to two well-known data sets. Statistical evidence supporting the validity of the hull metaphor is presented. We advocate the hull-based approach over the standard symbol-based approach to projection visualization, as it allows a more effective perception of similarity relationships and class distribution characteristics.
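The convex hull metaphor can be illustrated with a short sketch: given points already projected to 2D and a class label per point, compute one hull polygon per class. The hull routine below is Andrew's monotone chain algorithm, chosen here for brevity. The abstract does not say which hull algorithm the authors use, and the function names are hypothetical.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def class_hulls(projected, labels):
    """Group projected 2D points by class label and compute one hull per class."""
    groups = {}
    for point, label in zip(projected, labels):
        groups.setdefault(label, []).append(tuple(point))
    return {label: convex_hull(pts) for label, pts in groups.items()}
```

Drawing each class as a hull polygon rather than as individual symbols makes overlap between classes directly visible, which is one way to read the discrimination quality of a projection.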
Foundations of 3D Digital Libraries : current approaches and urgent research challenges
2007, Bustos Cárdenas, Benjamin Eugenio, Fellner, Dieter W., Havemann, Sven, Keim, Daniel A., Saupe, Dietmar, Schreck, Tobias
3D documents are an indispensable data type in many important application domains such as Computer Aided Design, Simulation and Visualization, and Cultural Heritage, to name a few. The 3D document type can represent arbitrarily complex information by composing geometrical, topological, structural, or material properties, among others. It is often integrated with meta data and annotations by the various application systems that produce, process, or consume 3D documents. We argue that, due to the inherent complexity of the 3D data type in conjunction with its imminent pervasive usage and an explosion of available content, there is a pressing need to address key problems of the 3D data type. These problems need to be tackled before the 3D data type can be fully supported by Digital Library technology in the sense of a generalized document, unlocking its full potential. If the problems are addressed appropriately, the expected benefits are manifold and may lead to radically improved production, processing, and consumption of 3D content.
Visual Rank Analysis for Search Engine Benchmarking and Efficient Navigation
2007, Catarci, Tiziana, Keim, Daniel A., Santucci, Giuseppe, Schreck, Tobias, Iervella, Gloria, Iannarelli, Stefano, Veltri, Fabio
In many important applications, the search for non-standard data types is essential. For example, digital libraries and multimedia database systems offer content-based search functionality for images and 3D documents. Contrary to the annotation-based approach, where information manually attached to the data objects is used for retrieval, content-based retrieval uses automatically derived meta data. However, the quality of the meta data is crucial, and often it is not clear a priori which meta data is best suited to execute a user-issued query. Owing to the multi-meta data problem, two crucial questions arise: (a) how can different meta data (feature vector) schemas be benchmarked to assess their suitability for solving the retrieval problem effectively, and (b) how can the user be supported in issuing queries to the retrieval system, considering different choices for the type of meta data to engage in the search. In this paper, we address these questions in a two-fold contribution. Based on the DARE visualization system, we first introduce an approach for the visual benchmarking of multiple meta data formats on a ground truth benchmark, supporting the optimization stage of the multimedia database design. We secondly propose a simple, yet effective visual interface to multiple, long lists (rankings) of answer objects for the user. The latter, based on relevance feedback information supplied by the user, allows the effective identification of the meta data schema best suited for executing the similarity queries at hand.
An Image-Based Approach to Visual Feature Space Analysis
2008, Schreck, Tobias, Schneidewind, Jörn, Keim, Daniel A.
Methods for management and analysis of non-standard data often rely on the so-called feature vector approach. The technique describes complex data instances by vectors of characteristic numeric values, which allow the data to be indexed and similarity scores between data elements to be calculated. Feature vectors are thereby often a key ingredient of intelligent data analysis algorithms, including instances of clustering, classification, and similarity search algorithms. However, identification of appropriate feature vectors for a given database of a given data type is a challenging task. Determining good feature vector extractors usually involves benchmarks relying on supervised information, which makes it an expensive and data dependent process. In this paper, we address the feature selection problem by a novel approach based on analysis of certain feature space images. We develop two image-based analysis techniques for the automatic discrimination power analysis of feature spaces. We evaluate the techniques on a comprehensive feature selection benchmark, demonstrating the effectiveness of our analysis and its potential toward automatically addressing the feature selection problem.
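As a small illustration of the feature vector approach in general (not of the image-based analysis proposed in the paper), the sketch below ranks database objects by Euclidean distance between their feature vectors and a query vector. The distance function, the object names, and the feature values are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, database, k=3):
    """Return the names of the k objects whose feature vectors are closest to query.

    `database` maps an object name to its feature vector (hypothetical layout).
    """
    scored = sorted(database.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in scored[:k]]


# Toy database with made-up 2D feature vectors.
database = {"chair": [0.0, 0.0], "stool": [1.0, 1.0], "plane": [5.0, 5.0]}
nearest = rank_by_similarity([0.9, 0.9], database, k=2)
```

How well such a ranking matches human notions of similarity depends entirely on the chosen feature extractor, which is why the selection problem addressed in the abstract matters.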
A Visual Analysis of Multi-Attribute Data Using Pixel Matrix Displays
2007-01-28, Hao, Ming C., Dayal, Umeshwar, Keim, Daniel A., Schreck, Tobias
Charts and tables are commonly used to visually analyze data. These graphics are simple and easy to understand, but charts show only highly aggregated data and present only a limited number of data values, while tables often show too many data values. As a consequence, these graphics may either lose or obscure important information, so different techniques are required to monitor complex datasets. Users need more powerful visualization techniques to digest and compare detailed multi-attribute information to analyze the health of their business. This paper proposes an innovative solution based on the use of pixel matrices to represent transaction-level information within graphics. With pixel matrices, users can visualize areas of importance at a glance, a capability not provided by common charting techniques. Our solutions are based on colored pixel matrices, which are used in (1) charts for visualizing data patterns and discovering exceptions, (2) tables for visualizing correlations and finding root causes, and (3) time series for visualizing the evolution of long-running transactions. The solutions have been applied with success to product sales, Internet network performance analysis, and service contract applications, demonstrating the benefits of our method over conventional graphics. The method is especially useful when detailed information is a key part of the analysis.
Semiautomatic benchmarking of feature vectors for multimedia retrieval
2007, Schreck, Tobias, Schneidewind, Jörn, Keim, Daniel A., Ward, Matthew O., Tatu, Andrada
Modern Digital Library applications store and process massive amounts of information. Usually, this data is not limited to raw textual or numeric data - typical applications also deal with multimedia data such as images, audio, video, or 3D geometric models. For providing effective retrieval functionality, appropriate meta data descriptors that allow calculation of similarity scores between data instances are required. Feature vectors are a generic way of describing multimedia data by vectors formed from numerically captured object features. They are used in similarity search, but can also be used for clustering and wider multimedia analysis applications. Extracting effective feature vectors for a given data type is a challenging task. Determining good feature vector extractors usually involves experimentation and application of supervised information. However, such experimentation usually is expensive, and supervised information often is data dependent. We address the feature selection problem by a novel approach based on analysis of certain feature space images. We develop two image-based analysis techniques for the automatic discrimination power analysis of feature spaces. We evaluate the techniques on a comprehensive feature selection benchmark, demonstrating the effectiveness of our analysis and its potential toward automatically addressing the feature selection problem.
Towards Automatic Feature Vector Optimization for Multimedia Applications
2008, Schreck, Tobias, Fellner, Dieter W., Keim, Daniel A.
We systematically evaluate a recently proposed method for unsupervised discrimination power analysis for feature selection and optimization in multimedia applications. A series of experiments using real and synthetic benchmark data is conducted, the results of which indicate the suitability of the method for unsupervised feature selection and optimization. We present an approach for generating synthetic feature spaces of varying discrimination power, modeling main characteristics from real world feature vector extractors. A simple, yet powerful visualization is used to communicate the results of the automatic analysis to the user.
Multi-Resolution Techniques for Visual Exploration of Large Time-Series Data
2007, Hao, Ming C., Dayal, Umeshwar, Keim, Daniel A., Schreck, Tobias
Time series are a data type of utmost importance in many domains such as business management and service monitoring. We address the problem of visualizing large time-related data sets which are difficult to visualize effectively with standard techniques given the limitations of current display devices. We propose a framework for intelligent time- and data-dependent visual aggregation of data along multiple resolution levels. This idea leads to effective visualization support for long time-series data providing both focus and context. The basic idea of the technique is that display space is allocated, in either a data-dependent or an application-dependent manner, in proportion to the degree of interest of data subintervals, thereby (a) guiding the user in perceiving important information, and (b) freeing required display space to visualize all the data. The automatic part of the framework can accommodate any time series analysis algorithm yielding a numeric degree of interest scale. We apply our techniques to real-world data sets, compare them with the standard visualization approach, and conclude that the approach is useful and scalable.
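The idea of allocating display space in proportion to degree of interest can be sketched as below. This is an illustrative reading of the abstract, not the authors' algorithm. The function name, the per-interval minimum width, and the largest-remainder rounding step are assumptions introduced for this example.

```python
def allocate_display_width(interest, total_pixels, min_pixels=1):
    """Split total_pixels among subintervals in proportion to degree of interest.

    Every subinterval first receives min_pixels so that all data stays visible
    (context); the remaining pixels are shared in proportion to interest (focus).
    """
    n = len(interest)
    if n == 0:
        return []
    if any(v < 0 for v in interest):
        raise ValueError("degree of interest values must be non-negative")
    remaining = total_pixels - min_pixels * n
    if remaining < 0:
        raise ValueError("not enough pixels for the requested minimum width")
    total_interest = sum(interest)
    if total_interest == 0:
        shares = [remaining / n] * n  # no preference: split evenly
    else:
        shares = [remaining * v / total_interest for v in interest]
    widths = [min_pixels + int(s) for s in shares]
    # Hand out pixels lost to rounding, largest fractional share first.
    leftover = max(total_pixels - sum(widths), 0)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        widths[i] += 1
    return widths


# Three subintervals; the third is twice as interesting as the others.
widths = allocate_display_width([1, 1, 2], total_pixels=103, min_pixels=1)
```

Any analysis algorithm that yields a numeric degree of interest per subinterval could supply the `interest` list, which mirrors the abstract's statement about the automatic part of the framework.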