Guidelines for Effective Usage of Text Highlighting Techniques

2016, Strobelt, Hendrik, Oelke, Daniela, Kwon, Bum Chul, Schreck, Tobias, Pfister, Hanspeter

Semi-automatic text analysis involves manual inspection of text. Often, different text annotations (like part-of-speech or named entities) are indicated by using distinctive text highlighting techniques. In typesetting there exist well-known formatting conventions, such as bold typeface, italics, or background coloring, that are useful for highlighting certain parts of a given text. Also, many advanced techniques for visualization and highlighting of text exist; yet, standard typesetting is common, and the effects of standard typesetting on the perception of text are not fully understood. As such, we surveyed and tested the effectiveness of common text highlighting techniques, both individually and in combination, to discover how to maximize pop-out effects while minimizing visual interference between techniques. To validate our findings, we conducted a series of crowdsourced experiments to determine: i) a ranking of nine commonly-used text highlighting techniques; ii) the degree of visual interference between pairs of text highlighting techniques; iii) the effectiveness of techniques for visual conjunctive search. Our results show that increasing font size works best as a single highlighting technique, and that there are significant visual interferences between some pairs of highlighting techniques. We discuss the pros and cons of different combinations as a design guideline to choose text highlighting techniques for text viewers.

Visual Analysis of Explicit Opinion and News Bias in German Soccer Articles

2012, Oelke, Daniela, Geißelmann, Benno, Keim, Daniel A.

Most state-of-the-art opinion and sentiment analysis techniques were developed for customer feedback data or reviews. Applying them to another domain is often not possible because the algorithms are based on the assumption that the opinions are expressed explicitly in the text. However, news articles, for instance, convey an opinion in a more subtle manner. In this work we analyze German soccer articles with respect to the sentiment that is expressed in them. Besides adapting conventional sentiment analysis algorithms to the specific domain, we also investigate what can be measured with these techniques and what should be measured on news articles according to communication scientists. We suggest bridging the existing gap with visual analytics methods and demonstrate the usability of the techniques on a concrete application example.

Finding Correlations in Functionally Equivalent Proteins by Integrating Automated and Visual Data Exploration

2006-10, Keim, Daniel A., Oelke, Daniela, Truman, Royal, Neuhaus, Klaus

The analysis of alignments of functionally equivalent proteins can reveal regularities such as correlated positions or residue patterns which are important to ensure a specific fold and various cellular functions. Many approaches are found in the literature which try to identify correlated positions to predict the residues that are close to each other in the three-dimensional folded structure. However, the quality of the predictions remains disappointing. One of the problems is that the statistical correlation measures that were used cannot do justice to the underlying complex biological and physicochemical realities. In this paper we evaluate the biological requirements for a correlation measure and explain why a completely automatic approach is unlikely to succeed. We then propose a novel and flexible criterion for correlation of residue positions in protein sequences, which can be optimized for different requirements. To apply this definition we developed the tool VisAlign that combines an automatic calculation of correlations with an interactive visualization. This allows the user to visually explore alternative alignments and thereby conveniently test various hypotheses and detect regularities in the aligned sequences.

Visual readability analysis : how to make your writings easier to read

2012-05, Oelke, Daniela, Spretke, David, Stoffel, Andreas, Keim, Daniel A.

We present a tool that is specifically designed to support a writer in revising a draft version of a document. In addition to showing which paragraphs and sentences are difficult to read and understand, we assist the reader in understanding why this is the case. This requires features that are expressive predictors of readability, and are also semantically understandable. In the first part of the paper, we, therefore, discuss a semiautomatic feature selection approach that is used to choose appropriate measures from a collection of 141 candidate readability features. In the second part, we present the visual analysis tool VisRA, which allows the user to analyze the feature values across the text and within single sentences. Users can choose between different visual representations accounting for differences in the size of the documents and the availability of information about the physical and logical layout of the documents. We put special emphasis on providing as much transparency as possible to ensure that the user can purposefully improve the readability of a sentence. Several case studies are presented that show the wide range of applicability of our tool. Furthermore, an in-depth evaluation assesses the quality of the measure and investigates how well users do in revising a text with the help of the tool.

Advanced Visual Analytics Methods for Literature Analysis

2012, Oelke, Daniela, Kokkinakis, Dimitrios, Malm, Mats

The volumes of digitized literary collections in various languages increase at a rapid pace, which also results in a growing demand for computational support to analyze such linguistic data. This paper combines robust text analysis with advanced visual analytics and brings a new set of tools to literature analysis. Visual analytics techniques can offer new and unexpected insights and knowledge to the literary scholar. We analyzed a small subset of a large literary collection, the Swedish Literature Bank, by focusing on the extraction of persons’ names, their gender and their normalized, linked form, including mentions of theistic beings (e.g., gods’ names and mythological figures), and examined their appearance over the course of the novel. A case study based on 13 novels from the aforementioned collection shows a number of interesting applications of visual analytics methods to literature problems, where named entities can play a prominent role, demonstrating the advantage of visual literature analysis. Our work is inspired by the notion of distant reading or macroanalysis for the analyses of large literature collections.

Visual analytics and the language of web query logs : a terminology perspective

2012, Oelke, Daniela, Eklund, Ann-Marie, Marinov, Svetoslav, Kokkinakis, Dimitrios

This paper explores means to integrate natural language processing methods for terminology and entity identification in medical web session logs with visual analytics techniques. The aim of the study is to examine whether the vocabulary used in queries posted to a Swedish regional health web site can be assessed in a way that will enable terminologists or medical data analysts to instantly identify new term candidates and their relations based on significant co-occurrence patterns. We provide an example application in order to illustrate how the co-occurrence relationships between medical and general entities occurring in such logs can be visualized, accessed and explored. To enable a visual exploration of the generated co-occurrence graphs, we employ a general purpose social network analysis tool, visone (http://visone.info), that makes it possible to visualize and analyze various types of graph structures. Our examples show that visual analytics based on co-occurrence analysis provides insights into the use of layman language in relation to established (professional) terminologies, which may help terminologists decide which terms to include in future terminologies. Increased understanding of the query language used is also of interest in the context of public health web sites. The query results should reflect the intentions of the information seekers, who may express themselves in layman language that differs from the one used on the available web sites provided by medical professionals.

Methods for interactive exploration of large-scale news streams

2010, Keim, Daniel A., Krstajic, Milos, Bak, Peter, Oelke, Daniela, Mansmann, Florian