Visual Analytics of Co-Occurrences to Discover Subspaces in Structured Data
2023, Jentner, Wolfgang, Lindholz, Giuliana, Schäfer, Hanna, El-Assady, Mennatallah, Ma, Kwan-Liu, Keim, Daniel A.
We present an approach that shows all relevant subspaces of categorical data condensed into a single picture. We model the categorical values of the attributes as co-occurrences with data partitions generated from structured data using pattern mining. We show that these co-occurrences satisfy the a-priori property, which allows us to greatly reduce the search space and effectively generate the condensed picture, whereas conventional approaches filter out many subspaces as insignificant. The task of identifying interesting subspaces is common but difficult due to exponential search spaces and the curse of dimensionality. One application of such a task is identifying a cohort of patients, defined by attributes such as gender, age, and diabetes type, that share a common patient history modeled as event sequences. Filtering the data by these attributes is common but cumbersome and often does not allow a comparison of subspaces. We contribute a powerful multi-dimensional pattern exploration approach (MDPE-approach), agnostic to the structured data type, that models multiple attributes and their characteristics as co-occurrences, allowing the user to identify and compare thousands of subspaces of interest in a single picture. In our MDPE-approach, we introduce two methods to dramatically reduce the search space, outputting only its boundaries in the form of two tables. We implement the MDPE-approach in an interactive visual interface (MDPE-vis) that provides a scalable, pixel-based visualization design supporting the identification, comparison, and sense-making of subspaces in structured data. Our case studies with a gold-standard dataset and external domain experts confirm the applicability of our approach and its implementation. A third use case sheds light on the scalability of our approach, and a user study with 15 participants underlines its usefulness and power.
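The a-priori property exploited above (a value combination can only be frequent if all of its sub-combinations are frequent) is the classic downward-closure pruning rule from frequent pattern mining. A minimal sketch of that pruning, with hypothetical patient-attribute data and nothing taken from the paper's actual implementation:

```python
from itertools import combinations

# Hypothetical toy data: each record is a set of categorical
# attribute values (e.g., gender, age group, diabetes type).
transactions = [
    {"gender=f", "age=60+", "diabetes=type2"},
    {"gender=f", "age=60+", "diabetes=type1"},
    {"gender=m", "age=60+", "diabetes=type2"},
    {"gender=f", "age=40-59", "diabetes=type2"},
]

def frequent_itemsets(transactions, min_support):
    """Apriori: a k-itemset can only be frequent if every one of its
    (k-1)-subsets is frequent, which prunes the exponential search space."""
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    frequent = {frozenset([i]) for i in items
                if support(frozenset([i])) >= min_support}
    result = set(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets ...
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # ... and prune any candidate with an infrequent subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in result
                             for s in combinations(c, k - 1))}
        frequent = {c for c in candidates if support(c) >= min_support}
        result |= frequent
        k += 1
    return result

print(sorted(sorted(s) for s in frequent_itemsets(transactions, 0.5)))
```

With a minimum support of 0.5, the three-attribute combination is pruned because it occurs in only one of four records, even though all of its pairs are frequent.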
PRIMAGE project : predictive in silico multiscale analytics to support childhood cancer personalised evaluation empowered by imaging biomarkers
2020-04-03, Martí-Bonmatí, Luis, Alberich-Bayarri, Ángel, Ladenstein, Ruth, Blanquer, Ignacio, Segrelles, J. Damian, Cerdá-Alberich, Leonor, Gkontra, Polyxeni, Hero, Barbara, Keim, Daniel A., Jentner, Wolfgang
PRIMAGE is one of the largest and most ambitious research projects dealing with medical imaging, artificial intelligence, and cancer treatment in children. It is a four-year European Commission-financed project with 16 European partners in the consortium, including the European Society for Paediatric Oncology, two imaging biobanks, and three prominent European paediatric oncology units. The project is constructed as an observational in silico study involving high-quality anonymised datasets (imaging, clinical, molecular, and genetic) for the training and validation of machine learning and multiscale algorithms. The open cloud-based platform will offer precise clinical assistance for phenotyping (diagnosis), treatment allocation (prediction), and patient endpoints (prognosis), based on the use of imaging biomarkers, tumour growth simulation, advanced visualisation of confidence scores, and machine-learning approaches. The decision support prototype will be constructed and validated on two paediatric cancers: neuroblastoma and diffuse intrinsic pontine glioma. External validation will be performed on data recruited from independent collaborative centres. Final results will be available to the scientific community at the end of the project, ready for translation to other malignant solid tumours.
QuestionComb : A Gamification Approach for the Visual Explanation of Linguistic Phenomena through Interactive Labeling
2021, Sevastjanova, Rita, Jentner, Wolfgang, Sperrle, Fabian, Kehlbeck, Rebecca, Bernard, Jürgen, El-Assady, Mennatallah
Linguistic insight in the form of high-level relationships and rules in text builds the basis of our understanding of language. However, the data-driven generation of such structures often lacks labeled resources that can be used as training data for supervised machine learning. The creation of such ground-truth data is a time-consuming process that often requires domain expertise to resolve text ambiguities and characterize linguistic phenomena. Furthermore, the creation and refinement of machine learning models is often challenging for linguists because the models are complex, opaque, and difficult to understand. To tackle these challenges, we present a visual analytics technique for interactive data labeling that applies concepts from gamification and explainable Artificial Intelligence (XAI) to support complex classification tasks. The visual-interactive labeling interface promotes the creation of effective training data. Visual explanations of learned rules unveil the decisions of the machine learning model and support iterative and interactive optimization. The gamification-inspired design guides the user through the labeling process and provides feedback on the model performance. As an instance of the proposed technique, we present QuestionComb, a workspace tailored to the task of question classification (i.e., into information-seeking vs. non-information-seeking questions). Our evaluation studies confirm that gamification concepts are beneficial for engaging users through continuous feedback, offering an effective visual analytics technique when combined with active learning and XAI.
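The active-learning loop mentioned above can be illustrated with uncertainty sampling: the system asks the user to label the question its current model is least sure about. A toy sketch, not QuestionComb itself; the cue-word scorer stands in for a trained classifier and is entirely hypothetical:

```python
# Assumed cue words for the toy scorer (not from the paper).
INFO_SEEKING_CUES = {"what", "how", "why", "when", "where"}

def p_info_seeking(question):
    """Toy probability that a question is information-seeking,
    based on how many cue words it contains."""
    words = question.lower().rstrip("?").split()
    hits = sum(w in INFO_SEEKING_CUES for w in words)
    return min(1.0, 0.2 + 0.4 * hits)

def most_uncertain(pool):
    """Uncertainty sampling: return the unlabeled question closest to
    p = 0.5 -- the one whose label would teach the model the most."""
    return min(pool, key=lambda q: abs(p_info_seeking(q) - 0.5))

pool = [
    "What is this and how does it work?",  # model is confident: info-seeking
    "Could you pass the salt?",            # model is confident: not
    "You know what I mean, right?",        # model is uncertain
]
print(most_uncertain(pool))
```

The selected question is then presented to the user for labeling, and the model is retrained on the grown label set before the next round.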
Making machine intelligence less scary for criminal analysts : reflections on designing a visual comparative case analysis tool
2018-09, Jentner, Wolfgang, Sacha, Dominik, Stoffel, Florian, Ellis, Geoffrey, Zhang, Leishi, Keim, Daniel A.
A fundamental task in criminal intelligence analysis is to analyze the similarity of crime cases, called comparative case analysis (CCA), to identify common crime patterns and to reason about unsolved crimes. Typically, the data are complex and high-dimensional, warranting complex analytical processes. State-of-the-art CCA tools lack flexibility in interactive data exploration and fall short of computational transparency in terms of revealing alternative methods and results. In this paper, we report on the design of the Concept Explorer, a flexible, transparent, and interactive CCA system. During this design process, we observed that most criminal analysts are not able to understand the underlying complex technical processes, which decreases their trust in the results and hence causes a reluctance to use the tool. Our CCA solution implements a computational pipeline together with a visual platform that allows analysts to interact with each stage of the analysis process and to validate the results. The proposed visual analytics workflow iteratively supports the interpretation of clustering results together with the respective feature relations, the development of alternative models, and cluster verification. The visualizations offer an understandable and usable way for the analyst to provide feedback to the system and to observe the impact of their interactions. Expert feedback confirmed that our user-centered design decisions made this computational complexity less scary to criminal analysts.
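The core of comparative case analysis is a pairwise case-similarity computation whose parameters the analyst can inspect and adjust. A minimal sketch of that idea, with hypothetical case records and feature weights (none of this is the Concept Explorer's actual pipeline): re-weighting a feature and re-running is the kind of stage-level interaction the abstract describes.

```python
# Hypothetical crime-case records with categorical and numeric features.
cases = [
    {"id": "C1", "method": "burglary", "tool": "crowbar", "hour": 2},
    {"id": "C2", "method": "burglary", "tool": "crowbar", "hour": 3},
    {"id": "C3", "method": "fraud", "tool": "phone", "hour": 14},
]

def similarity(a, b, weights):
    """Weighted similarity: exact match (0/1) for categorical features,
    normalized distance for the numeric 'hour' feature."""
    s = 0.0
    s += weights["method"] * (a["method"] == b["method"])
    s += weights["tool"] * (a["tool"] == b["tool"])
    s += weights["hour"] * (1 - abs(a["hour"] - b["hour"]) / 24)
    return s / sum(weights.values())

def link_cases(cases, weights, threshold):
    """Link case pairs whose similarity exceeds an analyst-set
    threshold; the analyst can re-weight features and re-run."""
    links = []
    for i, a in enumerate(cases):
        for b in cases[i + 1:]:
            if similarity(a, b, weights) >= threshold:
                links.append((a["id"], b["id"]))
    return links

weights = {"method": 1.0, "tool": 1.0, "hour": 0.5}
print(link_cases(cases, weights, 0.8))
```

Here only the two nighttime burglaries are linked; lowering the threshold or zeroing the "hour" weight would change the linkage, making the effect of each parameter directly observable.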
Toward Mass Video Data Analysis : Interactive and Immersive 4D Scene Reconstruction
2020-09-22, Kraus, Matthias, Pollok, Thomas, Miller, Matthias, Kilian, Timon, Moritz, Tobias, Schweitzer, Daniel, Beyerer, Jürgen, Keim, Daniel A., Qu, Chengchao, Jentner, Wolfgang
The technical progress of the last decades has made photo and video recording devices omnipresent. This change has a significant impact, among others, on police work. It is no longer unusual that a myriad of digital data accumulates after a criminal act, which must be reviewed by criminal investigators to collect evidence or solve the crime. This paper presents the VICTORIA Interactive 4D Scene Reconstruction and Analysis Framework ("ISRA-4D" 1.0), an approach for the visual consolidation of heterogeneous video and image data in a 3D reconstruction of the corresponding environment. First, by reconstructing the environment in which the materials were created, a shared spatial context for all available materials is established. Second, all footage is spatially and temporally registered within this 3D reconstruction. Third, a visualization of the resulting 4D reconstruction (3D scene + time) is provided, which can be analyzed interactively. Additional information on video and image content is extracted and displayed, and can be analyzed with supporting visualizations. The presented approach facilitates the process of filtering, annotating, analyzing, and getting an overview of large amounts of multimedia material. The framework is evaluated using four case studies that demonstrate its broad applicability. Furthermore, the framework allows users to immerse themselves in the analysis by entering the scenario in virtual reality. This feature is qualitatively evaluated through interviews with criminal investigators and outlines potential benefits such as improved spatial understanding and the initiation of new fields of application.
Integrated visual analysis of patterns in time series and text data : Workflow and application to financial data analysis
2016-01-01, Wanner, Franz, Jentner, Wolfgang, Schreck, Tobias, Stoffel, Andreas, Sharalieva, Lyubka, Keim, Daniel A.
In this article, we describe a workflow and tool that allow a flexible formation of hypotheses about text features and their combinations that are significantly connected in time to quantitative phenomena observed in stock data. To support such an analysis, we combine analysis steps for quantitative and text-oriented data using an existing a priori method. First, based on heuristics, we extract interesting intervals and patterns from large time series data. The visual analysis supports the analyst in exploring parameter combinations and their results. The identified time series patterns are then input for the second analysis step, in which all identified intervals of interest are analyzed for frequent patterns co-occurring with financial news. An a priori method supports the discovery of such sequential temporal patterns. Then, various text features, such as the degree of sentence nesting, noun phrase complexity, and vocabulary richness, are extracted from the news items to obtain meta-patterns. Meta-patterns are defined by a specific combination of text features that significantly differ from the text features of the remaining news data. Our approach combines a portfolio of visualization and analysis techniques, including time, cluster, and sequence visualization and analysis functionality. We provide a case study and an evaluation on financial data, in which we identify important future work. The workflow could be generalized to other application domains, such as data analysis for smart grids, cyber-physical systems, or the security of critical infrastructure, where the data consist of a combination of quantitative and textual time series.
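The first analysis step (heuristic extraction of interesting intervals, then collecting co-occurring news) can be sketched in a few lines. A toy illustration, not the paper's tool; the price series, threshold, and news items are all hypothetical:

```python
# Hypothetical daily closing prices and news items keyed by day index.
prices = [100, 101, 100, 108, 109, 103, 103, 104]
news = {3: "merger rumor", 5: "profit warning", 7: "routine filing"}

def interesting_intervals(prices, pct=0.03):
    """Heuristic: a day whose absolute return exceeds `pct`
    defines an interesting interval (previous day, that day)."""
    intervals = []
    for t in range(1, len(prices)):
        r = (prices[t] - prices[t - 1]) / prices[t - 1]
        if abs(r) >= pct:
            intervals.append((t - 1, t))
    return intervals

def cooccurring_news(intervals, news):
    """Collect news items whose timestamps fall inside an interval;
    these pairs feed the subsequent frequent-pattern analysis."""
    return [(iv, news[t]) for iv in intervals
            for t in news if iv[0] <= t <= iv[1]]

ivs = interesting_intervals(prices)
print(cooccurring_news(ivs, news))
```

Only the two large price moves produce intervals, and only the news published inside them is passed on; the routine filing on a quiet day is discarded before any text-feature extraction.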