FDive : Learning Relevance Models Using Pattern-based Similarity Measures
2019, Dennig, Frederik L., Polk, Tom, Lin, Zudi, Schreck, Tobias, Pfister, Hanspeter, Behrisch, Michael
The detection of interesting patterns in large high-dimensional datasets is difficult because of their dimensionality and pattern complexity. Therefore, analysts require automated support for the extraction of relevant patterns. In this paper, we present FDive, a visual active learning system that helps to create visually explorable relevance models, assisted by learning a pattern-based similarity. We use a small set of user-provided labels to rank similarity measures, consisting of feature descriptor and distance function combinations, by their ability to distinguish relevant from irrelevant data. Based on the best-ranked similarity measure, the system calculates an interactive Self-Organizing Map-based relevance model, which classifies data according to the cluster affiliation. It also automatically prompts further relevance feedback to improve its accuracy. Uncertain areas, especially near the decision boundaries, are highlighted and can be refined by the user. We evaluate our approach by comparison to state-of-the-art feature selection techniques and demonstrate the usefulness of our approach by a case study classifying electron microscopy images of brain cells. The results show that FDive enhances both the quality and understanding of relevance models and can thus lead to new insights for brain research.
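The ranking step described in this abstract can be illustrated with a minimal sketch. It assumes a leave-one-out nearest-neighbor accuracy as the ranking criterion, which is an illustrative choice and not necessarily the criterion FDive uses. The descriptors (`mean`, `spread`), the toy items, and all function names are hypothetical.

```python
import math
from itertools import product

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def loo_nn_accuracy(vectors, labels, dist):
    """Fraction of items whose nearest other item carries the same label."""
    hits = 0
    for i, v in enumerate(vectors):
        nearest = min((j for j in range(len(vectors)) if j != i),
                      key=lambda j: dist(v, vectors[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(vectors)

def rank_similarity_measures(items, labels, descriptors, distances):
    """Score every (descriptor, distance) combination; best score first."""
    scored = []
    for (d_name, describe), (f_name, dist) in product(descriptors.items(),
                                                      distances.items()):
        vectors = [describe(item) for item in items]
        scored.append((loo_nn_accuracy(vectors, labels, dist), d_name, f_name))
    return sorted(scored, key=lambda t: t[0], reverse=True)

# Hypothetical toy data: an item is "relevant" when its values are far apart.
items = [[0, 10], [1, 9], [5, 5], [4, 6], [0, 9], [5, 6]]
labels = [True, True, False, False, True, False]
descriptors = {
    "mean": lambda it: [sum(it) / len(it)],
    "spread": lambda it: [max(it) - min(it)],
}
distances = {"euclidean": euclidean, "manhattan": manhattan}

ranking = rank_similarity_measures(items, labels, descriptors, distances)
for score, d_name, f_name in ranking:
    print(f"{d_name} + {f_name}: {score:.2f}")
```

In this toy setup, the `spread` descriptor separates the two labels while `mean` does not, so the `spread` combinations rank first. A real system would substitute image or pattern descriptors and may well use a different scoring criterion.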
Subspace Nearest Neighbor Search : Problem Statement, Approaches, and Discussion
2015, Blumenschein, Michael, Behrisch, Michael, Färber, Ines, Sedlmair, Michael, Schreck, Tobias, Seidl, Thomas, Keim, Daniel A.
Computing the similarity between objects is a central task for many applications in the field of information retrieval and data mining. For finding k-nearest neighbors, typically a ranking is computed based on a predetermined set of data dimensions and a distance function, constant over all possible queries. However, many high-dimensional feature spaces contain a large number of dimensions, many of which may contain noise, irrelevant, redundant, or contradicting information. More specifically, the relevance of dimensions may depend on the query object itself, and in general, different dimension sets (subspaces) may be appropriate for a query. Approaches for feature selection or -weighting typically provide a global subspace selection, which may not be suitable for all possible queries. In this position paper, we frame a new research problem, called subspace nearest neighbor search, aiming at multiple query-dependent subspaces for nearest neighbor search. We describe relevant problem characteristics, relate to existing approaches, and outline potential research directions.
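To make the problem statement concrete, the following minimal sketch shows how the nearest neighbors of one query can change with the subspace in which distances are computed. The candidate subspaces are supplied by hand here; selecting them per query is the open problem the paper frames, and this sketch does not address it. The data, the function names, and the use of Euclidean distance are assumptions for illustration.

```python
import math

def knn_in_subspace(data, query, dims, k):
    """Indices of the k nearest points, using only the dimensions in dims."""
    def dist(p):
        return math.sqrt(sum((p[d] - query[d]) ** 2 for d in dims))
    order = sorted(range(len(data)), key=lambda i: dist(data[i]))
    return order[:k]

def subspace_knn(data, query, subspaces, k):
    """Run one k-NN search per candidate subspace for the same query."""
    return {tuple(dims): knn_in_subspace(data, query, dims, k)
            for dims in subspaces}

# Hypothetical toy data with three dimensions.
data = [
    (1.0, 1.0, 9.0),
    (1.1, 5.0, 1.0),
    (5.0, 1.2, 1.1),
    (9.0, 9.0, 9.0),
]
query = (1.0, 1.0, 1.0)

result = subspace_knn(data, query, [(0,), (1,), (0, 1, 2)], k=2)
for dims, neighbors in result.items():
    print(dims, neighbors)
```

Point 0 is the closest match in either single-dimension subspace but drops out of the top two in the full space, because its third coordinate dominates the distance. This is the effect that motivates query-dependent subspaces.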
Guided Sketching for Visual Search and Exploration in Large Scatter Plot Spaces
2014, Shao, Lin, Behrisch, Michael, Schreck, Tobias, Landesberger, Tatjana von, Scherer, Maximilian, Bremm, Sebastian, Keim, Daniel A.
Recently, there has been an interest in methods for filtering large scatter plot spaces for interesting patterns. However, user interaction remains crucial in starting an explorative analysis in a large scatter plot space. We introduce an approach for explorative search and navigation in large sets of scatter plot diagrams. By means of a sketch-based query interface, users can start the exploration process by providing a visual example of the pattern they are interested in. A shadow-drawing approach provides suggestions for possibly relevant patterns while query drawing takes place, supporting the visual search process. We apply the approach to a large real-world data set, demonstrating the principal functionality and usefulness of our technique.
Feedback-driven interactive exploration of large multidimensional data supported by visual classifier
2014, Behrisch, Michael, Korkmaz, Fatih, Shao, Lin, Schreck, Tobias
The extraction of relevant and meaningful information from multivariate or high-dimensional data is a challenging problem. One reason for this is that the number of possible representations, which might contain relevant information, grows exponentially with the number of data dimensions. Also, not all views from a possibly large view space are potentially relevant to a given analysis task or user. Focus+Context or Semantic Zoom Interfaces can help to some extent to efficiently search for interesting views or data segments, yet they show scalability problems for very large data sets. Accordingly, users are confronted with the problem of identifying interesting views, yet the manual exploration of the entire view space becomes ineffective or even infeasible. While certain quality metrics have been proposed recently to identify potentially interesting views, these often are defined in a heuristic way and do not take into account the application or user context. We introduce a framework for a feedback-driven view exploration, inspired by relevance feedback approaches used in Information Retrieval. Our basic idea is that users iteratively express their notion of interestingness when presented with candidate views. From that expression, a model representing the user's preferences is trained and used to recommend further interesting view candidates. A decision support system monitors the exploration process and assesses the relevance-driven search process for convergence and stability. We present an instantiation of our framework for exploration of Scatter Plot Spaces based on visual features. We demonstrate the effectiveness of this implementation by a case study on two real-world datasets. We also discuss our framework in light of design alternatives and point out its usefulness for development of user- and context-dependent visual exploration systems.
Pattern Trails : Visual Analysis of Pattern Transitions in Subspaces
2017, Jäckle, Dominik, Blumenschein, Michael, Behrisch, Michael, Keim, Daniel A., Schreck, Tobias
Subspace analysis methods have gained interest for identifying patterns in subspaces of high-dimensional data. Existing techniques allow users to visualize and compare patterns in subspaces. However, many subspace analysis methods produce an abundance of patterns, which often remain redundant and are difficult to relate. Creating effective layouts for comparison of subspace patterns remains challenging. We introduce Pattern Trails, a novel approach for visually ordering and comparing subspace patterns. Central to our approach is the notion of pattern transitions as an interpretable structure imposed to order and compare patterns between subspaces. The basic idea is to visualize projections of subspaces side-by-side, and indicate changes between adjacent patterns in the subspaces by a linked representation, hence introducing pattern transitions. Our contributions comprise a systematization for how pairs of subspace patterns can be compared, and how changes can be interpreted in terms of pattern transitions. We also contribute a technique for visual subspace analysis based on a data-driven similarity measure between subspace representations. This measure is useful to order the patterns, and interactively group subspaces to reduce redundancy. We demonstrate the usefulness of our approach by application to several use cases, indicating that data can be meaningfully ordered and interpreted in terms of pattern transitions.
Guiding the Exploration of Scatter Plot Data Using Motif-Based Interest Measures
2015, Shao, Lin, Schleicher, Timo, Behrisch, Michael, Schreck, Tobias, Sipiran, Ivan, Keim, Daniel A.
Finding interesting patterns in large scatter plot spaces is a challenging problem and becomes even more difficult with an increasing number of dimensions. Previous approaches for exploring large scatter plot spaces, such as the well-known Scagnostics approach, mainly focus on ranking scatter plots based on their global properties. However, often local patterns contribute significantly to the interestingness of a scatter plot. We propose a novel approach for the automatic determination of interesting views in scatter plot spaces based on analysis of local scatter plot segments. Specifically, we automatically classify similar local scatter plot segments, which we call scatter plot motifs. Inspired by the well-known tf-idf approach from information retrieval, we compute local and global quality measures based on certain frequency properties of the local motifs. We show how we can use these to filter, rank and compare scatter plots and their incorporated motifs. We demonstrate the usefulness of our approach with synthetic and real-world data sets and showcase our corresponding data exploration tool that visualizes the distribution of local scatter plot motifs in relation to a large overall scatter plot space.
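The tf-idf analogy can be sketched as follows, treating each scatter plot as a "document" and each motif label as a "term". This is plain textbook tf-idf applied to hypothetical motif labels, not the specific local and global measures defined in the paper. The motif names, the plot names, and the sum-based plot score are assumptions for illustration, and motif classification is assumed to have happened already.

```python
import math
from collections import Counter

def motif_tf_idf(plots):
    """Per-plot tf-idf weights; plots maps plot name -> list of motif labels."""
    n_plots = len(plots)
    # Document frequency: in how many plots does each motif occur at least once?
    doc_freq = Counter()
    for motifs in plots.values():
        doc_freq.update(set(motifs))
    weights = {}
    for name, motifs in plots.items():
        counts = Counter(motifs)
        total = len(motifs)
        weights[name] = {
            m: (c / total) * math.log(n_plots / doc_freq[m])
            for m, c in counts.items()
        }
    return weights

def rank_plots(weights):
    """Order plots by the sum of their motif weights, highest first."""
    scores = {name: sum(w.values()) for name, w in weights.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical motif labels found in three scatter plots.
plots = {
    "A": ["blob", "blob", "line"],
    "B": ["blob", "blob", "blob"],
    "C": ["blob", "arc", "arc"],
}
weights = motif_tf_idf(plots)
print(rank_plots(weights))
```

A motif that appears in every plot ("blob" here) receives weight zero, so plots containing only common motifs rank last, while plots with frequent but rare motifs rank first.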
Identifying Locally Interesting Motifs for Exploration of Scatter Plot Matrices
2014, Shao, Lin, Behrisch, Michael, Schreck, Tobias, Sipiran, Ivan, Kwon, Bum Chul, Keim, Daniel A.
Scatter plots are effective diagrams to visualize distributions, clusters and correlations in two-dimensional data space. For high-dimensional data, scatter plot matrices can be formed to show all two-dimensional combinations of dimensions. Several previous approaches for exploration of large scatter plot spaces have focused on ranking and sorting scatter plot matrices based on global patterns. However, often local patterns are of interest for scatter plot exploration. We present a preliminary idea to explore the scatter plot space by identifying significant local patterns (also called motifs in this work). Based on certain clustering algorithms and image-based descriptors, we identify and group a set of similar local candidate motifs in a large scatter plot space.
Visual Quality Assessment of Subspace Clusterings
2016, Blumenschein, Michael, Färber, Ines, Behrisch, Michael, Tatu, Andrada, Schreck, Tobias, Keim, Daniel A., Seidl, Thomas
The quality assessment of results of clustering algorithms is challenging as different cluster methodologies lead to different cluster characteristics and topologies. A further complication is that in high-dimensional data, subspace clustering adds to the complexity by detecting clusters in multiple different lower-dimensional projections. The quality assessment for (subspace) clustering is especially difficult if no benchmark data is available to compare the clustering results. In this research paper, we present SubEval, a novel subspace evaluation framework, which provides visual support for comparing quality criteria of subspace clusterings. We identify important aspects for evaluation of subspace clustering results and show how our system helps to derive quality assessments. SubEval allows assessing subspace cluster quality at three different granularity levels: (1) A global overview of similarity of clusters and estimated redundancy in cluster members and subspace dimensions. (2) A view of a selection of multiple clusters supports in-depth analysis of object distributions and potential cluster overlap. (3) The detail analysis of characteristics of individual clusters helps to understand the (non-)validity of a cluster. We demonstrate the usefulness of SubEval in two case studies focusing on the targeted algorithm- and domain scientists and show how the generated insights lead to a justified selection of an appropriate clustering algorithm and an improved parameter setting. Likewise, SubEval can be used for the understanding and improvement of newly developed subspace clustering algorithms. SubEval is part of SubVA, a novel open-source web-based framework for the visual analysis of different subspace analysis techniques.
The Visual Exploration of Aggregate Similarity for Multi-Dimensional Clustering
2015, Twellmeyer, James, Hutter, Marco, Behrisch, Michael, Kohlhammer, Jörn, Schreck, Tobias
We present a visualisation prototype for the support of a novel approach to clustering called TRIAGE. TRIAGE uses aggregation functions which are more adaptable and flexible than the weighted mean for similarity modelling. While TRIAGE has proven itself in practice, the use of complex similarity models makes the interpretation of TRIAGE clusterings challenging. We address this challenge by providing analysts with a linked, matrix-based visualisation of all relevant data attributes. We employ data sampling and matrix seriation to support both effective overviews and fluid, interactive exploration using the same visual metaphor for heterogeneous attributes. The usability of our prototype is demonstrated and assessed with the help of real-world usage scenarios from the cyber-security domain.
Quality Metrics Driven Approach to Visualize Multidimensional Data in Scatterplot Matrix
2014, Behrisch, Michael, Shao, Lin, Kwon, Bum Chul, Schreck, Tobias, Sipiran, Ivan, Keim, Daniel A.
Extracting meaningful information out of vast amounts of high-dimensional data is very difficult. Prior research studies have been trying to solve these problems through either automatic data analysis or interactive visualization approaches. Our grand goal is to derive representative and generalizable quality metrics and to apply the metrics to amplify interesting patterns as well as to mute the uninteresting noise for multidimensional visualizations. In this particular poster, we investigate a quality-metrics-driven approach to achieve this goal for the scatterplot matrix (SPLOM). Our main approach is to rearrange scatterplot matrices by sorting scatterplots based upon their patterns, especially locally significant ones called scatterplot motifs. Using the approach, we expect scatterplot matrices to reveal groups of visual patterns appearing adjacent to each other, which helps analysts to gain a clear overview and to delve into specific areas of interest more easily. Our ongoing investigation aims to test and refine the feature vector for scatterplot motifs depending upon data sizes and the number of dimensions.