Visualization of Large Document Corpora

Cite This

STROBELT, Hendrik, 2012. Visualization of Large Document Corpora [Dissertation]. Konstanz: University of Konstanz

@phdthesis{Strobelt2012Visua-20847, title={Visualization of Large Document Corpora}, year={2012}, author={Strobelt, Hendrik}, address={Konstanz}, school={Universität Konstanz} }

Documents appear to us regularly in daily life in various designs and lengths to serve different purposes. We are used to reading novels, newspapers, advertisement flyers, instruction manuals, bus tickets, tube maps, etc. In addition, a lot of professional life is based on browsing through and understanding documents. Methods to reduce stacks of printed paper on our desks and to allow greater scalability than an office room would offer are the driving research objectives of this thesis. As casual as this vision sounds, the research questions related to it are profound and manifold.

The thesis at hand covers topics from content acquisition to interaction with visualizations. A compact introduction motivates document visualization from different viewpoints and discusses former efforts. As a preliminary for later use, specific methods for content extraction from document files are described. Document Cards use this content to represent a document's textual and image highlights as rich, small-scale representatives. The cards are intended to be used in larger applications to replace dots in collection browsers. For higher abstraction, tag clouds can summarize document collections. The corresponding chapter describes how CDTE Tag Clouds can reflect content and context changes of dynamically evolving collections.

A common and important visual variable used in all visualizations in this thesis is position. Positions of data representatives can express closeness, reveal groupings, and help build mental maps. When dimensional objects such as text snippets or Document Cards represent entities at specific positions, overlap can occur, resulting in visual clutter. A review and evaluation of practical methods to remove overlap leads to the invention of Rolled-Out Wordles, a simple but effective method for dense visualization scenarios.

The last chapter describes a design study, an interaction paradigm, and challenges of interdisciplinary work. HiTSEE for KNIME, integrated into the KNIME platform, allows biochemists to observe structure-activity relationships in high-throughput screening experiments. Although based on biochemical data and tasks, the fundamental methods for visualization and interaction are applicable to a wide range of systems for large data visualization, including document collection browsing.

Finally, a conclusion summarizes insights and describes ideas for future work.

Language: eng. Year: 2012. Record timestamp: 2012-11-15T10:21:49Z.

