SOMFlow : Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance
2018-01, Sacha, Dominik, Kraus, Matthias, Bernard, Jürgen, Behrisch, Michael, Schreck, Tobias, Asano, Yuki, Keim, Daniel A.
Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date; however, arriving at useful clusterings often requires several rounds of user interactions to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and reflect on previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing the analyst to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.
A visual digital library approach for time-oriented scientific primary data
2011, Bernard, Jürgen, Brase, Jan, Fellner, Dieter, Koepler, Oliver, Kohlhammer, Jörn, Ruppert, Tobias, Schreck, Tobias, Sens, Irina
Digital Library support for textual and certain types of non-textual documents has significantly advanced in recent years. While Digital Library support implies many aspects along the whole library workflow model, interactive and visual retrieval allowing effective query formulation and result presentation are important functions. Recently, new kinds of non-textual documents which merit Digital Library support, yet cannot be fully accommodated by existing Digital Library technology, have come into focus. Scientific data, as produced, for example, by scientific experimentation, simulation or observation, is such a document type. In this article we report on a concept and first implementation of Digital Library functionality for supporting visual retrieval and exploration in a specific important class of scientific primary data, namely, time-oriented research data. The approach is developed in an interdisciplinary effort by experts from the library, natural sciences, and visual analytics communities. In addition to presenting the concept and discussing relevant challenges, we present results from a first implementation of our approach applied to a real-world scientific primary data set. We also report on initial user feedback obtained during discussions with domain experts from the earth observation sciences, indicating the usefulness of our approach.
VisInfo : a digital library system for time series research data based on exploratory search - a user-centered design approach
2015, Bernard, Jürgen, Daberkow, Debora, Fellner, Dieter, Fischer, Katrin, Koepler, Oliver, Kohlhammer, Jörn, Runnwerth, Mila, Ruppert, Tobias, Schreck, Tobias, Sens, Irina
To this day, data-driven science is a widely accepted concept in the digital library (DL) context (Hey et al. in The fourth paradigm: data-intensive scientific discovery. Microsoft Research, 2009). In the same way, domain knowledge from information visualization, visual analytics, and exploratory search has found its way into the DL workflow. This trend is expected to continue, considering future DL challenges such as content-based access to new document types, visual search, and exploration for information landscapes, or big data in general. To cope with these challenges, DL actors need to collaborate with external specialists from different domains to complement each other and succeed in given tasks such as making research data publicly available. Through these interdisciplinary approaches, the DL ecosystem may contribute to applications focused on data-driven science and digital scholarship. In this work, we present VisInfo (2014), a web-based digital library system (DLS) with the goal of providing visual access to time series research data. Based on an exploratory search (ES) concept (White and Roth in Synth Lect Inf Concepts Retr Serv 1(1):1–98, 2009), VisInfo at first provides a content-based overview visualization of large amounts of time series research data. Further, the system enables the user to define visual queries by example or by sketch. Finally, VisInfo presents visual-interactive capability for the exploration of search results. The development process of VisInfo was based on the user-centered design principle. Experts from computer science, a scientific digital library, usability engineering, and scientists from the earth and environmental sciences were involved in an interdisciplinary approach. We report on comprehensive user studies in the requirement analysis phase based on paper prototyping, user interviews, screen casts, and user questionnaires. Heuristic evaluations and two usability testing rounds were applied during the system implementation and the deployment phase and certify measurable improvements for our DLS. Based on the lessons learned in VisInfo, we suggest a generalized project workflow that may be applied in related, prospective approaches.
Assisted descriptor selection based on visual comparative data analysis
2011, Bremm, Sebastian, Landesberger, Tatiana von, Bernard, Jürgen, Schreck, Tobias
Exploration and selection of data descriptors representing objects using a set of features are important components in many data analysis tasks. Usually, for a given dataset, an optimal data description does not exist, as the suitable data representation is strongly use case dependent. Many solutions for selecting a suitable data description have been proposed. In most instances, they require data labels and often are black box approaches. Non-expert users have difficulty comprehending how the input, parameters, and output of these algorithms relate to each other. Alternative approaches, interactive systems for visual feature selection, overburden the user with an overwhelming set of options and data views. Therefore, it is essential to offer users guidance in this analytical process. In this paper, we present a novel system for data description selection, which facilitates the user’s access to the data analysis process. As finding a suitable data description consists of several steps, we support the user with guidance. Our system combines automatic data analysis with interactive visualizations. In this way, the system provides a recommendation for suitable data descriptor selections. It supports the comparison of data descriptors with differing dimensionality for unlabeled data. We propose specialized scores and interactive views for descriptor comparison. The visualization techniques are scatterplot-based and grid-based. For the latter case, we apply Self-Organizing Maps as adaptive grids which are well suited for large multi-dimensional data sets. As an example, we demonstrate the usability of our system on a real-world biochemical application.
MotionExplorer : Exploratory Search in Human Motion Capture Data Based on Hierarchical Aggregation
2013-12, Bernard, Jürgen, Wilhelm, Nils, Krüger, Björn, May, Thorsten, Schreck, Tobias, Kohlhammer, Jörn
We present MotionExplorer, an exploratory search and analysis system for sequences of human motion in large motion capture data collections. This special type of multivariate time series data is relevant in many research fields including medicine, sports and animation. Key tasks in working with motion data include analysis of motion states and transitions, and synthesis of motion vectors by interpolation and combination. In the practice of research and application of human motion data, challenges exist in providing visual summaries and drill-down functionality for handling large motion data collections. We find that this domain can benefit from appropriate visual retrieval and analysis support to handle these tasks in the presence of large motion data. To address this need, we developed MotionExplorer together with domain experts as an exploratory search system based on interactive aggregation and visualization of motion states as a basis for data navigation, exploration, and search. Based on an overview-first type visualization, users are able to search for interesting sub-sequences of motion based on a query-by-example metaphor, and explore search results by details on demand. We developed MotionExplorer in close collaboration with the targeted users who are researchers working on human motion synthesis and analysis, including a summative field study. Additionally, we conducted a laboratory design study to substantially improve MotionExplorer towards an intuitive, usable and robust design. MotionExplorer enables the search in human motion capture data with only a few mouse clicks. The researchers unanimously confirm that the system can efficiently support their work.
Visual cluster analysis of trajectory data with interactive Kohonen maps
2009, Schreck, Tobias, Bernard, Jürgen, von Landesberger, Tatiana, Kohlhammer, Jörn
Visual-interactive cluster analysis provides valuable tools for effectively analyzing large and complex data sets. Owing to desirable properties and an inherent predisposition for visualization, the Kohonen Feature Map (or Self-Organizing Map or SOM) algorithm is among the most popular and widely used visual clustering techniques. However, the unsupervised nature of the algorithm may be disadvantageous in certain applications. Depending on initialization and data characteristics, cluster maps (cluster layouts) may emerge that do not comply with user preferences, expectations or the application context. Considering SOM-based analysis of trajectory data, we propose a comprehensive visual-interactive monitoring and control framework extending the basic SOM algorithm. The framework implements the general Visual Analytics idea to effectively combine automatic data analysis with human expert supervision. It provides simple, yet effective facilities for visually monitoring and interactively controlling the trajectory clustering process at arbitrary levels of detail. The approach allows the user to leverage existing domain knowledge and user preferences, arriving at improved cluster maps. We apply the framework to several trajectory clustering problems, demonstrating its potential in combining both unsupervised (machine) and supervised (human expert) processing to produce appropriate cluster results.
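For readers unfamiliar with the basic SOM algorithm that several of the works above build on, the following is a minimal sketch in plain Python. It is not taken from any of the listed papers. The function names (`train_som`, `best_matching_unit`, `quantization_error`), the linear decay schedules, the Gaussian neighborhood, and all default parameter values are assumptions chosen for illustration. The `seed` argument makes the dependence on initialization mentioned in the abstract easy to observe: different seeds may produce different cluster layouts for the same data.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the prototype closest to sample x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns a rows x cols grid of prototypes.

    Illustrative assumptions: sample-based initialization, linearly
    decaying learning rate and neighborhood radius, Gaussian neighborhood.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialize every prototype as a copy of a randomly chosen sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # decaying radius, floored
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Neighborhood strength depends on distance in the grid,
                    # not in data space; this yields the ordered layout.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def quantization_error(weights, data):
    """Mean Euclidean distance between each sample and its best match."""
    total = 0.0
    for x in data:
        r, c = best_matching_unit(weights, x)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(weights[r][c], x)))
    return total / len(data)


if __name__ == "__main__":
    # Two well-separated toy groups in 2D (made-up data).
    samples = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11)]
    for s in (0, 1):
        som = train_som(samples, rows=2, cols=2, seed=s)
        print("seed", s, "quantization error:", quantization_error(som, samples))
```

Running the script with different seeds is a simple way to see why interactive monitoring and control of the training process can be useful: the map quality and the placement of groups on the grid are not fixed by the data alone.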