Butt, Miriam

Last name: Butt
First name: Miriam

Publications (search results)

Showing 1-10 of 122

LFG and historical linguistics

2023, Booth, Hannah, Butt, Miriam

This chapter examines the opportunities and perspectives that Lexical-Functional Grammar (LFG) offers for the study of language change, surveying existing LFG approaches within historical linguistics and illustrating them with sample phenomena. We discuss how reanalysis, a major driver of language change, can be accounted for elegantly within LFG’s parallel architecture thanks to its crucial separation of form from function, and how different types of reanalysis can be understood, whether they involve rebracketing, recategorization, or changes at the lexical level commonly discussed in terms of grammaticalization. We also show how LFG’s fundamental design principles, and the resulting flexibility of c-structure, allow for complex, nuanced accounts of word order change. Furthermore, we survey the opportunities that LFG offers for exploring the complex relationship between variation and change, in particular frequency effects and gradual change that proceeds via competition. Finally, we signpost future possibilities for work in this relatively underexplored but promising area.

Automatized Detection and Annotation for Calls to Action in Latin-American Social Media Postings

2022, Siskou, Wassiliki, Giralt Mirón, Clara, Molina Raith, Sara, Butt, Miriam

Voter mobilization via social media has been shown to be an effective tool. While previous research has primarily looked at how calls to action (CTAs) were used in Twitter messages from non-profit organizations and for protest mobilization, we are interested in identifying the linguistic cues used in CTAs on Facebook and Twitter so that CTAs can be identified automatically. The work is part of an ongoing collaboration with political scientists who are investigating CTAs in the period leading up to recent elections in three Latin American countries. To facilitate their work, we developed a new NLP pipeline for Spanish. The pipeline annotates social media posts with a range of linguistic information and then conducts targeted searches for linguistic cues that allow relevant CTAs to be annotated and identified automatically. Using carefully crafted, linguistically informed heuristics, our system currently achieves an F1-score of 0.72.

Is that really a question? : Going beyond factoid questions in NLP

2021, Kalouli, Aikaterini-Lida, Kehlbeck, Rebecca, Sevastjanova, Rita, Deussen, Oliver, Keim, Daniel A., Butt, Miriam

Research in NLP has mainly focused on factoid questions, with the goal of finding quick and reliable ways of matching a query to an answer. However, human discourse involves more than that: it contains non-canonical questions deployed to achieve specific communicative goals. In this paper, we investigate this understudied aspect of NLP by introducing a targeted task, creating an appropriate corpus for it, and providing a diverse set of baseline models. This allows us to gain useful insights into the task and opens the way for future research in this direction.

Automatic Amharic Part of Speech Tagging (AAPOST) : A Comparative Approach Using Bidirectional LSTM and Conditional Random Fields (CRF) Methods

2020, Birhanie, Worku Kelemework, Butt, Miriam

Part-of-speech (POS) tagging is an initial step for many natural language processing applications. POS tagging for Amharic is still in its infancy. This study contributes to the improvement of Amharic POS tagging by experimenting with deep learning and Conditional Random Fields (CRF) approaches. Word embeddings are integrated into the system to enhance performance. The models were applied to an Amharic news corpus tagged with 11 major parts of speech and achieved accuracies of 91.12% and 90% for the bidirectional LSTM and CRF methods, respectively. The results show that the bidirectional LSTM approach outperforms the CRF method. Further improvements are expected from increasing the size and diversity of the Amharic corpus.

Uncertainty visualization : Fundamentals and recent developments

2022-08-31, Hägele, David, Schulz, Christoph, Beschle, Cedric, Booth, Hannah, Butt, Miriam, Barth, Andrea, Deussen, Oliver, Weiskopf, Daniel

This paper provides a brief overview of uncertainty visualization along with some fundamental considerations on uncertainty propagation and modeling. Starting from the visualization pipeline, we discuss how the different stages of this pipeline can be affected by uncertainty, how they can deal with it, and how they can propagate uncertainty information to subsequent processing steps. We illustrate recent advances in the field with examples from a wide range of applications: uncertainty visualization of hierarchical data, multivariate time series, stochastic partial differential equations, and data from linguistic annotation.

VisInReport : Complementing Visual Discourse Analytics through Personalized Insight Reports

2021-08-11, Sevastjanova, Rita, El-Assady, Mennatallah, Bradley, Adam James, Collins, Christopher, Butt, Miriam, Keim, Daniel A.

We present VisInReport, a visual analytics tool that supports the manual analysis of discourse transcripts and generates reports based on user interaction. Discourse analysis, an integral part of scholarly work in the social sciences and humanities, involves aggregating characteristics identified in the text, which in turn requires first identifying regions of particular interest. Manual data evaluation requires extensive effort, which can be a barrier to effective analysis. Our system addresses this challenge by augmenting the users' analysis with a set of automatically generated visualization layers. These layers enable the detection and exploration of relevant parts of the discussion and support several tasks, such as topic modeling or question categorization. The system summarizes the extracted events visually and verbally, providing content-rich insight into both the data and the analysis process. During each analysis session, VisInReport builds a shareable report containing a curated selection of the interactions and annotations generated by the analyst. We evaluate our approach on real-world datasets through a qualitative study with domain experts from political science, computer science, and linguistics. The results highlight the benefit of integrating the analysis and reporting processes in a visual analytics system that supports the communication of results among collaborating researchers.

Agreement in Urdu adjectival adverbials

2021, Butt, Miriam, Holloway King, Tracy

Case Markers in Indo-Aryan

2022, Butt, Miriam

Indo-Aryan languages have the longest documented historical record, with the earliest attested texts going back to 1900 BCE. Old Indo-Aryan (Vedic, Sanskrit) had an inflectional case-marking system in which nominatives functioned as subjects. Objects could be realized via several different case markers (depending on semantic and structural factors), but not via the nominative. This inflectional system was lost over the course of several centuries during Middle Indo-Aryan, leaving just a nominative–oblique inflectional distinction. The New Indo-Aryan languages innovated case markers and developed new case-marking systems. As in Old Indo-Aryan, case is systematically used to express semantic differences via differential object marking constructions. However, unlike Old Indo-Aryan, many New Indo-Aryan languages are ergative, and all allow non-nominative subjects, most prominently experiencer subjects. Objects, on the other hand, can now also be unmarked (nominative), usually as part of differential object marking. The case-marking patterns within New Indo-Aryan and across time have given rise to a number of debates and analyses. The most prominent of these concern case alignment and language change, the distribution of ergative vs. accusative vs. nominative case, and markedness and differential case marking.

ThamizhiMorph : A morphological parser for the Tamil language

2021-04-23, Sarveswaran, Kengatharaiyer, Dias, Gihan, Butt, Miriam

This paper presents ThamizhiMorph, an open-source and extendable Morphological Analyser cum Generator (MAG) for Tamil. Tamil is a low-resource language in terms of NLP tools and applications, and most of the available tools are neither open nor extendable. A morphological analyser is a key resource for storing and retrieving morphophonological and morphosyntactic information, especially for morphologically rich languages, and is also useful for developing machine translation applications. This paper describes how ThamizhiMorph is designed as a Finite-State Transducer (FST) and implemented using Foma. We discuss our design decisions in light of the peculiarities of Tamil and its nominal and verbal paradigms. We specify a high-level meta-language to characterise the language’s inflectional morphology efficiently. We evaluate ThamizhiMorph using text from a Tamil textbook and the Tamil Universal Dependencies treebank version 2.5. The evaluation and error analysis attest to a very high level of performance, with most of the identified errors due to out-of-vocabulary items, which are easily fixable. To foster further development, we have made our scripts, FST models, lexicons, Meta-Morphological rules, lists of generated verbs and nouns, and test data sets freely available for others to use and extend.

Prosody of Case Markers in Urdu

2021, Mumtaz, Benazir, Canzi, Massimiliano, Butt, Miriam

This paper studies the prosody of case clitics in Urdu, about which various differing claims exist in the literature. We conducted a production experiment, controlling for effects potentially arising from the phonetics of the case clitics, the syntactic function they express, and their clausal position. We find that case clitics are incorporated into the prosodic phrase of the noun and become part of the overall LH contour found on accentual phrases in Urdu/Hindi. We also find some differences across case type and position, which we tie to information-structural effects.