Tools and Methods for Processing and Visualizing Large Corpora

Cite This

SCHNEIDER, Gerold, Mennatallah EL-ASSADY and Hans Martin LEHMANN, 2017. Tools and Methods for Processing and Visualizing Large Corpora. In: HILTUNEN, Turo, et al., eds. Big and Rich Data in English Corpus Linguistics: Methods and Explorations. Studies in Variation, Contacts and Change in English 19. Helsinki: VARIENG, University of Helsinki.

@incollection{Schneider2017Tools-45081,
  title={Tools and Methods for Processing and Visualizing Large Corpora},
  url={},
  year={2017},
  number={19},
  address={Helsinki},
  publisher={VARIENG, University of Helsinki},
  series={Studies in Variation, Contacts and Change in English},
  booktitle={Big and Rich Data in English Corpus Linguistics: Methods and Explorations},
  editor={Hiltunen, Turo},
  author={Schneider, Gerold and El-Assady, Mennatallah and Lehmann, Hans Martin}
}

<rdf:RDF xmlns:dcterms="" xmlns:dc="" xmlns:rdf="" xmlns:bibo="" xmlns:dspace="" xmlns:foaf="" xmlns:void="" xmlns:xsd="" >
  <rdf:Description rdf:about="">
    <dcterms:isPartOf rdf:resource=""/>
    <dcterms:issued>2017</dcterms:issued>
    <foaf:homepage rdf:resource="http://localhost:8080/jspui"/>
    <dcterms:available rdf:datatype="">2019-02-18T15:11:09Z</dcterms:available>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <bibo:uri rdf:resource=""/>
    <dc:contributor>Lehmann, Hans Martin</dc:contributor>
    <dc:creator>El-Assady, Mennatallah</dc:creator>
    <dcterms:title>Tools and Methods for Processing and Visualizing Large Corpora</dcterms:title>
    <dc:creator>Schneider, Gerold</dc:creator>
    <dc:date rdf:datatype="">2019-02-18T15:11:09Z</dc:date>
    <dcterms:abstract xml:lang="eng">We present several approaches and methods that we develop or use to create workflows from data to evidence. First, we look for specific items in large corpora, explore the overuse of particular items, and use off-the-shelf visualization such as GoogleViz. Second, we present the advanced visualization tools and pipelines that the Visualization Group at the University of Konstanz is developing. After an overview, we apply statistical visualizations, Lexical Episode Plots and Interactive Hierarchical Modeling to the vast historical linguistic data offered by the Corpus of Historical American English (COHA), which spans the period from 1800 to 2000. On the one hand, we investigate the increase in noun compounds and visually illustrate correlations in the data over time. On the other hand, we compute and visualize trends and topics in society from 1800 to 2000, applying an incremental topic modeling algorithm to the extracted compound nouns to detect thematic changes across the 200-year period. Throughout the paper, we use a range of tailored analysis and visualization approaches to gain insight into the data from different perspectives.</dcterms:abstract>
    <dcterms:isPartOf rdf:resource=""/>
    <dc:creator>Lehmann, Hans Martin</dc:creator>
    <dspace:isPartOfCollection rdf:resource=""/>
    <dc:contributor>Schneider, Gerold</dc:contributor>
    <dc:language>eng</dc:language>
    <dc:contributor>El-Assady, Mennatallah</dc:contributor>
    <dspace:isPartOfCollection rdf:resource=""/>
  </rdf:Description>
</rdf:RDF>
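The abstract mentions exploring the overuse of particular items in large corpora but does not name the statistic behind it. As a hedged illustration only, the sketch below assumes Dunning's log-likelihood (G2), a common keyness measure in corpus linguistics, to compare how often one item occurs in two corpora (for instance two COHA decades). The function names `log_likelihood` and `is_overused` and the 3.84 threshold (p < 0.05 at one degree of freedom) are assumptions for this example, not the authors' implementation.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Log-likelihood (G2) for one item observed in two corpora.

    Illustrative sketch (assumed measure, not taken from the chapter).
    freq_a / freq_b: occurrences of the item in corpus A / corpus B.
    size_a / size_b: total tokens in corpus A / corpus B.
    Expected counts assume the item is equally frequent in both corpora.
    """
    if size_a <= 0 or size_b <= 0:
        raise ValueError("corpus sizes must be positive")
    if freq_a < 0 or freq_b < 0:
        raise ValueError("frequencies must be non-negative")
    combined = freq_a + freq_b
    total = size_a + size_b
    expected_a = size_a * combined / total
    expected_b = size_b * combined / total
    ll = 0.0
    # By convention 0 * ln(0) = 0, so a zero count adds nothing to the sum.
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2.0 * ll


def is_overused(freq_a: int, size_a: int, freq_b: int, size_b: int,
                critical: float = 3.84) -> bool:
    """True if the item is relatively more frequent in A than in B and G2
    reaches the critical value (3.84 ~ p < 0.05, one degree of freedom).
    The threshold is an illustrative assumption, not the chapter's setting."""
    # Compare relative frequencies by cross-multiplying to avoid float division.
    more_frequent_in_a = freq_a * size_b > freq_b * size_a
    return more_frequent_in_a and log_likelihood(freq_a, size_a, freq_b, size_b) >= critical


print(round(log_likelihood(100, 10000, 50, 10000), 2))  # prints 16.99
```

With these made-up counts (not data from the chapter), an item occurring 100 times in one 10,000-token sample and 50 times in another of equal size gives a G2 of about 16.99, well above 3.84, so it would be flagged as overused in the first sample. Equal relative frequencies give a G2 of 0.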
