Publication: Tools and Methods for Processing and Visualizing Large Corpora
Abstract
We present several approaches and methods that we develop or use to create workflows from data to evidence. First, we look for specific items in large corpora, explore the overuse of particular items, and use off-the-shelf visualization such as GoogleViz. Second, we present the advanced visualization tools and pipelines that the Visualization Group at the University of Konstanz is developing. After an overview, we apply statistical visualizations, Lexical Episode Plots, and Interactive Hierarchical Modeling to the vast historical linguistic data offered by the Corpus of Historical American English (COHA), which ranges from 1800 to 2000. On the one hand, we investigate the increase of noun compounds and visually illustrate correlations in the data over time. On the other hand, we compute and visualize trends and topics in society from 1800 to 2000: we apply an incremental topic modeling algorithm to the extracted compound nouns to detect thematic changes throughout the investigated 200-year period. Throughout the paper, we use various tailored analysis and visualization approaches to gain insight into the data from different perspectives.
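As a rough illustration of the kind of measurement behind "the increase of noun compounds" over time, the following sketch normalizes raw counts per decade to hits per million words, so that decades with different corpus sizes can be compared. It is not the authors' pipeline: the function name `per_million` and all numbers are made up for illustration and are not COHA figures.

```python
from typing import Dict


def per_million(hits: Dict[int, int], corpus_size: Dict[int, int]) -> Dict[int, float]:
    """Normalize raw hit counts per decade to hits per million words.

    Decades with a missing or zero corpus size are skipped.
    """
    return {
        decade: hits[decade] * 1_000_000 / corpus_size[decade]
        for decade in sorted(hits)
        if corpus_size.get(decade, 0) > 0
    }


# Hypothetical numbers, for illustration only (not real COHA figures).
hits = {1800: 120, 1900: 900, 2000: 2400}
corpus_size = {1800: 1_000_000, 1900: 3_000_000, 2000: 4_000_000}

freq = per_million(hits, corpus_size)
for decade, value in freq.items():
    print(decade, value)
```

Normalizing by corpus size matters here because a historical corpus typically contains different amounts of text per decade, so raw counts alone would overstate or understate a trend.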
Cite
ISO 690
SCHNEIDER, Gerold, Mennatallah EL-ASSADY, Hans Martin LEHMANN, 2017. Tools and Methods for Processing and Visualizing Large Corpora. In: HILTUNEN, Turo, ed. and others. Big and Rich Data in English Corpus Linguistics : Methods and Explorations. Helsinki: VARIENG, University of Helsinki, 2017. Studies in Variation, Contacts and Change in English. 19
BibTeX
@incollection{Schneider2017Tools-45081,
  year={2017},
  title={Tools and Methods for Processing and Visualizing Large Corpora},
  url={http://www.helsinki.fi/varieng/series/volumes/19/schneider_el-assady_lehmann/},
  number={19},
  publisher={VARIENG, University of Helsinki},
  address={Helsinki},
  series={Studies in Variation, Contacts and Change in English},
  booktitle={Big and Rich Data in English Corpus Linguistics : Methods and Explorations},
  editor={Hiltunen, Turo},
  author={Schneider, Gerold and El-Assady, Mennatallah and Lehmann, Hans Martin}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45081">
    <dc:creator>Schneider, Gerold</dc:creator>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/45081"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:contributor>Lehmann, Hans Martin</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dcterms:title>Tools and Methods for Processing and Visualizing Large Corpora</dcterms:title>
    <dc:language>eng</dc:language>
    <dcterms:abstract xml:lang="eng">We present several approaches and methods which we develop or use to create workflows from data to evidence. They start with looking for specific items in large corpora, exploring overuse of particular items, and using off-the-shelf visualization such as GoogleViz. Second, we present the advanced visualization tools and pipelines which the Visualization Group at University of Konstanz is developing. After an overview, we apply statistical visualizations, Lexical Episode Plots and Interactive Hierarchical Modeling to the vast historical linguistics data offered by the Corpus of Historical American English (COHA), which ranges from 1800 to 2000. We investigate on the one hand the increase of noun compounds and visually illustrate correlations in the data over time. On the other hand we compute and visualize trends and topics in society from 1800 to 2000. We apply an incremental topic modeling algorithm to the extracted compound nouns to detect thematic changes throughout the investigated time period of 200 years. In this paper, we utilize various tailored analysis and visualization approaches to gain insight into the data from different perspectives.</dcterms:abstract>
    <dc:contributor>El-Assady, Mennatallah</dc:contributor>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2019-02-18T15:11:09Z</dc:date>
    <dc:creator>El-Assady, Mennatallah</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Lehmann, Hans Martin</dc:creator>
    <dc:contributor>Schneider, Gerold</dc:contributor>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2019-02-18T15:11:09Z</dcterms:available>
    <dcterms:issued>2017</dcterms:issued>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
  </rdf:Description>
</rdf:RDF>