Towards visual debugging for multi-target time series classification
2020, Schlegel, Udo, Cakmak, Eren, Arnout, Hiba, El-Assady, Mennatallah, Oelke, Daniela, Keim, Daniel A.
Multi-target classification of multivariate time series data poses a challenge in many real-world applications (e.g., predictive maintenance). Machine learning methods, such as random forests and neural networks, support training these classifiers. However, debugging and analyzing possible misclassifications remain challenging due to the often complex relations between targets, classes, and the multivariate time series data. We propose a model-agnostic visual debugging workflow for multi-target time series classification that enables the examination of relations between targets, partially correct predictions, potential confusions, and the classified time series data. The workflow, as well as the prototype, aims to foster an in-depth analysis of multi-target classification results to visually identify potential causes of mispredictions. We demonstrate the usefulness of the workflow in a predictive maintenance usage scenario, showing how users can iteratively explore and identify critical classes as well as relationships between targets.
Feature-Based Visual Exploration of Text Classification
2015, Stoffel, Florian, Flekova, Lucie, Oelke, Daniela, Gurevych, Iryna, Keim, Daniel A.
There are many applications of text classification, such as gender attribution in market research or the identification of forged product reviews on e-commerce sites. Although several automatic methods provide satisfactory performance in most application cases, we see a gap in supporting analysts in understanding the results and deriving knowledge for future application scenarios. In this paper, we present a visualization-driven application that allows analysts to gain insight into text classification tasks, such as sentiment detection or authorship attribution, at the feature level. It is built with a practitioner’s way of reasoning, the Text Classification Analysis Process, in mind.
Advanced Visual Analytics Methods for Literature Analysis
2012, Oelke, Daniela, Kokkinakis, Dimitrios, Malm, Mats
The volume of digitized literary collections in various languages is increasing at a rapid pace, which also results in a growing demand for computational support to analyze such linguistic data. This paper combines robust text analysis with advanced visual analytics and brings a new set of tools to literature analysis. Visual analytics techniques can offer new and unexpected insights and knowledge to the literary scholar. We analyzed a small subset of a large literary collection, the Swedish Literature Bank, focusing on the extraction of persons’ names, their gender, and their normalized, linked form, including mentions of theistic beings (e.g., gods’ names and mythological figures), and examined their appearance over the course of each novel. A case study based on 13 novels from this collection shows a number of interesting applications of visual analytics methods to literature problems in which named entities can play a prominent role, demonstrating the advantage of visual literature analysis. Our work is inspired by the notion of distant reading, or macroanalysis, for the analysis of large literary collections.
Visual readability analysis : How to make your writings easier to read
2010-10, Oelke, Daniela, Spretke, David, Stoffel, Andreas, Keim, Daniel A.
We present a tool that is specifically designed to support a writer in revising a draft version of a document. In addition to showing which paragraphs and sentences are difficult to read and understand, we assist the writer in understanding why this is the case. This requires features that are expressive predictors of readability and are also semantically understandable. In the first part of the paper, we therefore discuss a semi-automatic feature selection approach that is used to choose appropriate measures from a collection of 141 candidate readability features. In the second part, we present the visual analysis tool VisRA, which allows the user to analyze the feature values across the text and within single sentences. The user can choose different visual representations, accounting for differences in document size and in the availability of information about the physical and logical layout of the documents. We put special emphasis on providing as much transparency as possible so that the user can purposefully improve the readability of a sentence. Several case studies are presented that show the wide applicability of our tool.
Towards A Rigorous Evaluation Of XAI Methods On Time Series
2019-10, Schlegel, Udo, Arnout, Hiba, El-Assady, Mennatallah, Oelke, Daniela, Keim, Daniel A.
Explainable Artificial Intelligence (XAI) methods are typically deployed to explain and debug black-box machine learning models. However, most proposed XAI methods are black boxes themselves and were designed for images. Thus, they rely on visual interpretability to evaluate and prove explanations. In this work, we apply XAI methods previously used in the image and text domains to time series. We present a methodology to test and evaluate various XAI methods on time series by introducing new verification techniques that incorporate the temporal dimension. We further conduct preliminary experiments to assess the quality of selected XAI method explanations with various verification methods on a range of datasets, inspecting quality metrics on each. Our initial experiments show that SHAP works robustly for all models, whereas others, such as DeepLIFT, LRP, and Saliency Maps, work better with specific architectures.
Visual analytics and the language of web query logs : a terminology perspective
2012, Oelke, Daniela, Eklund, Ann-Marie, Marinov, Svetoslav, Kokkinakis, Dimitrios
This paper explores means to integrate natural language processing methods for terminology and entity identification in medical web session logs with visual analytics techniques. The aim of the study is to examine whether the vocabulary used in queries posted to a Swedish regional health web site can be assessed in a way that enables terminologists or medical data analysts to instantly identify new term candidates and their relations based on significant co-occurrence patterns. We provide an example application to illustrate how the co-occurrence relationships between medical and general entities occurring in such logs can be visualized, accessed, and explored. To enable a visual exploration of the generated co-occurrence graphs, we employ a general-purpose social network analysis tool, visone (http://visone.info), which supports the visualization and analysis of various types of graph structures. Our examples show that visual analytics based on co-occurrence analysis provides insights into the use of layman language in relation to established (professional) terminologies, which may help terminologists decide which terms to include in future terminologies. A better understanding of the querying language used is also of interest in the context of public health web sites. The query results should reflect the intentions of the information seekers, who may express themselves in layman language that differs from the language used on the available web sites provided by medical professionals.
Visual analysis of next-generation sequencing data to detect overlapping genes in bacterial genomes
2011-10, Simon, Svenja, Oelke, Daniela, Landstorfer, Richard, Neuhaus, Klaus, Keim, Daniel A.
Next-generation sequencing (NGS) technologies are about to revolutionize biological research. Being able to sequence large amounts of DNA or, indirectly, RNA in a short time period opens numerous new possibilities. However, analyzing the large amounts of data generated by NGS is a serious challenge that requires novel data analysis and visualization methods to allow the biological experimenter to understand the results. In this paper, we describe a novel system to deal with the flood of data generated by transcriptome sequencing (RNA-seq) using NGS. Our system allows the analyst to get a quick overview of the data and interactively explore interesting regions based on three important parameters: coverage, transcription, and fit. In particular, our system supports NGS analysis in the following respects: (1) representation of the coverage sequence in a way that introduces no artifacts; (2) easy determination of the fit of an open reading frame (ORF) to a transcript by mapping the coverage sequence directly into the ORF representation; (3) automatic support for finding interesting regions, addressing the problems posed by the overwhelming volume of data; and (4) an overview representation that allows parameter tuning and enables quick access to interesting areas of the genome. We show the usefulness of our system in a case study on overlapping gene detection in a bacterial genome.
Lessons on Combining Topology and Geography : Visual Analytics for Electrical Outage Management
2016, Jäger, Alexander, Mittelstädt, Sebastian, Oelke, Daniela, Sander, Sonja, Platz, Axel, Bouwman, Gies, Keim, Daniel A.
Outage management in electrical networks is a complex task for operators and requires comprehensive overviews of the topology. At the same time, valuable information for detecting the root cause may have a geographical context, such as digging activities or falling trees. Consequently, vendors of state-of-the-art SCADA systems have started to integrate this information source as well. However, in today’s systems both views are separated, requiring operators to mentally connect the geographical and topological information. Operators wish for a comprehensive combination of both spaces in a single view. However, how to project geographical elements into the topology to support the workflow of real operators is still unclear. In this paper, we present a design study for an interactive visualization system that provides a comprehensive overview for power grid operators. It provides full coverage of both spaces in order to measure how real operators make use of the geographical information. It bypasses the projection problem through interactive brushing-and-linking to support associative analysis. We extracted the mental model of domain experts in real use cases and found a general source of bias in the sequential analysis of two spaces. We contribute our problem and task abstraction, lessons learned, and implications for future research.
Visual Analysis of Explicit Opinion and News Bias in German Soccer Articles
2012, Oelke, Daniela, Geißelmann, Benno, Keim, Daniel A.
Most state-of-the-art opinion and sentiment analysis techniques were developed for customer feedback data or reviews. Applying them to another domain is often not possible because the algorithms assume that opinions are expressed explicitly in the text. News articles, however, convey an opinion in a more subtle manner. In this work, we analyze German soccer articles with respect to the sentiment expressed in them. Besides adapting conventional sentiment analysis algorithms to the specific domain, we also investigate what can be measured with these techniques and what, according to communication scientists, should be measured in news articles. We suggest bridging the existing gap with visual analytics methods and demonstrate the usability of the techniques on a concrete application example.
Real-Time Visualization of Streaming Text Data : Tasks and Challenges
2011, Rohrdantz, Christian, Oelke, Daniela, Krstajic, Milos, Fischer, Fabian
Real-time visualization of text streams is crucial for different analysis scenarios and can be expected to become one of the important future research topics in the text visualization domain. In particular, the complex requirements of real-time text analysis tasks lead to new visualization challenges, which we structure and describe in this paper. First, we define what we consider to be a text stream and emphasize the importance of different real-world application scenarios. Then, we summarize research challenges related to different parts of the analysis process and identify those challenges that are exclusive to real-time streaming text visualization. We review related work with respect to which of the challenges have been addressed in the past and what solutions have been suggested. Finally, we identify open issues and potential future research subjects in this vibrant area.