Publication:

Tools and Methods for Processing and Visualizing Large Corpora

Files

No files are available for this document.

Date

2017

Authors

Schneider, Gerold; El-Assady, Mennatallah; Lehmann, Hans Martin

Publication type
Contribution to an edited volume
Publication status
Published

Published in

HILTUNEN, Turo, ed. and others. Big and Rich Data in English Corpus Linguistics : Methods and Explorations. Helsinki: VARIENG, University of Helsinki, 2017. Studies in Variation, Contacts and Change in English. 19

Abstract

We present several approaches and methods that we develop or use to create workflows from data to evidence. The first workflows involve searching for specific items in large corpora, exploring the overuse of particular items, and using off-the-shelf visualization tools such as GoogleViz. Second, we present the advanced visualization tools and pipelines that the Visualization Group at the University of Konstanz is developing. After an overview, we apply statistical visualizations, Lexical Episode Plots and Interactive Hierarchical Modeling to the large body of historical data offered by the Corpus of Historical American English (COHA), which covers the period from 1800 to 2000. On the one hand, we investigate the increase in noun compounds and visually illustrate correlations in the data over time. On the other hand, we compute and visualize trends and topics in society from 1800 to 2000. We apply an incremental topic modeling algorithm to the extracted compound nouns to detect thematic changes across the 200-year period. Throughout the paper, we use various tailored analysis and visualization approaches to gain insight into the data from different perspectives.

Subject (DDC)
400 Language, linguistics

Cite

ISO 690
SCHNEIDER, Gerold, Mennatallah EL-ASSADY, Hans Martin LEHMANN, 2017. Tools and Methods for Processing and Visualizing Large Corpora. In: HILTUNEN, Turo, ed. and others. Big and Rich Data in English Corpus Linguistics : Methods and Explorations. Helsinki: VARIENG, University of Helsinki, 2017. Studies in Variation, Contacts and Change in English. 19
BibTeX
@incollection{Schneider2017Tools-45081,
  year={2017},
  title={Tools and Methods for Processing and Visualizing Large Corpora},
  url={http://www.helsinki.fi/varieng/series/volumes/19/schneider_el-assady_lehmann/},
  number={19},
  publisher={VARIENG, University of Helsinki},
  address={Helsinki},
  series={Studies in Variation, Contacts and Change in English},
  booktitle={Big and Rich Data in English Corpus Linguistics: Methods and Explorations},
  editor={Hiltunen, Turo and others},
  author={Schneider, Gerold and El-Assady, Mennatallah and Lehmann, Hans Martin}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45081">
    <dc:creator>Schneider, Gerold</dc:creator>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/45081"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:contributor>Lehmann, Hans Martin</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dcterms:title>Tools and Methods for Processing and Visualizing Large Corpora</dcterms:title>
    <dc:language>eng</dc:language>
    <dcterms:abstract xml:lang="eng">We present several approaches and methods which we develop or use to create workflows from data to evidence. They start with looking for specific items in large corpora, exploring overuse of particular items, and using off-the-shelf visualization such as GoogleViz. Second, we present the advanced visualization tools and pipelines which the Visualization Group at University of Konstanz is developing. After an overview, we apply statistical visualizations, Lexical Episode Plots and Interactive Hierarchical Modeling to the vast historical linguistics data offered by the Corpus of Historical American English (COHA), which ranges from 1800 to 2000. We investigate on the one hand the increase of noun compounds and visually illustrate correlations in the data over time. On the other hand we compute and visualize trends and topics in society from 1800 to 2000. We apply an incremental topic modeling algorithm to the extracted compound nouns to detect thematic changes throughout the investigated time period of 200 years. In this paper, we utilize various tailored analysis and visualization approaches to gain insight into the data from different perspectives.</dcterms:abstract>
    <dc:contributor>El-Assady, Mennatallah</dc:contributor>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2019-02-18T15:11:09Z</dc:date>
    <dc:creator>El-Assady, Mennatallah</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Lehmann, Hans Martin</dc:creator>
    <dc:contributor>Schneider, Gerold</dc:contributor>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2019-02-18T15:11:09Z</dcterms:available>
    <dcterms:issued>2017</dcterms:issued>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
  </rdf:Description>
</rdf:RDF>

URL last checked

2019-02-18

University bibliography
Yes