Kalouli, Aikaterini-Lida


Publications

Explaining Simple Natural Language Inference

2019. Kalouli, Aikaterini-Lida; Buis, Annebeth; Real, Livy; Palmer, Martha; de Paiva, Valeria

The vast amount of research introducing new corpora and techniques for semi-automatically annotating corpora shows the important role that datasets play in today’s research, especially in the machine learning community. This rapid development raises concerns about the quality of the datasets created and consequently of the models trained, as recently discussed with respect to the Natural Language Inference (NLI) task. In this work we conduct an annotation experiment based on a small subset of the SICK corpus. The experiment reveals several problems in the annotation guidelines, and various challenges of the NLI task itself. Our quantitative evaluation of the experiment allows us to assign our empirical observations to specific linguistic phenomena and leads us to recommendations for future annotation tasks, for NLI and possibly for other tasks.

Composing noun phrase vector representations

2019. Kalouli, Aikaterini-Lida; de Paiva, Valeria; Crouch, Richard

Vector representations of words have seen increasing success over the past years in a variety of NLP tasks. While there seems to be a consensus about the usefulness of word embeddings and how to learn them, it is still unclear which representations can capture the meaning of phrases or even whole sentences. Recent work has shown that simple operations outperform more complex deep architectures. In this work, we propose two novel constraints for computing noun phrase vector representations. First, we propose that the semantic and not the syntactic contribution of each component of a noun phrase should be considered, so that the resulting composed vectors express more of the phrase meaning. Second, the composition process of the two phrase vectors should apply a suitable selection of dimensions, so that specific semantic features captured by the phrase’s meaning become more salient. Our proposed methods are compared to 11 other approaches, including popular baselines and a neural net architecture, and are evaluated across 6 tasks and 2 datasets. Our results show that these constraints lead to more expressive phrase representations and can be applied to other state-of-the-art methods to improve their performance.
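
As a rough illustration of the kind of "simple operations" the abstract contrasts with deep architectures, the sketch below composes two word vectors by addition, averaging and element-wise multiplication. It is not the method proposed in the paper: the function names and the toy 3-dimensional vectors are assumptions made up for this example.

```python
from math import sqrt


def compose_add(u, v):
    """Additive composition: element-wise sum of two word vectors."""
    return [a + b for a, b in zip(u, v)]


def compose_avg(u, v):
    """Averaging composition: element-wise mean of two word vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def compose_mult(u, v):
    """Multiplicative composition: element-wise product of two word vectors."""
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity, a common way to compare composed phrase vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy vectors invented for illustration; real embeddings have hundreds of dimensions.
red = [1.0, 0.0, 2.0]
car = [0.0, 3.0, 2.0]

red_car_add = compose_add(red, car)
red_car_avg = compose_avg(red, car)
red_car_mult = compose_mult(red, car)
```

Note that element-wise multiplication zeroes out every dimension on which either word has no weight, which is one crude way in which a composition operation can make shared features more salient.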

Mixed-Initiative Active Learning for Generating Linguistic Insights in Question Classification

2018. Sevastjanova, Rita; El-Assady, Mennatallah; Hautli-Janisz, Annette; Kalouli, Aikaterini-Lida; Kehlbeck, Rebecca; Deussen, Oliver; Keim, Daniel A.; Butt, Miriam

We propose a mixed-initiative active learning system to tackle the challenge of building descriptive models for under-studied linguistic phenomena. Our particular use case is the linguistic analysis of question types, in particular in understanding what characterizes information-seeking vs. non-information-seeking questions (i.e., whether the speaker wants to elicit an answer from the hearer or not) and how automated methods can assist with the linguistic analysis. Our approach is motivated by the need for an effective and efficient human-in-the-loop process in natural language processing that relies on example-based learning and provides immediate feedback to the user. In addition to the concrete implementation of a question classification system, we describe general paradigms of explainable mixed-initiative learning, allowing for the user to access the patterns identified automatically by the system, rather than being confronted by a machine learning black box. Our user study demonstrates the capability of our system in providing deep linguistic insight into this particular analysis problem. The results of our evaluation are competitive with the current state-of-the-art.

ParHistVis: Visualization of Parallel Multilingual Historical Data

2019. Kalouli, Aikaterini-Lida; Kehlbeck, Rebecca; Sevastjanova, Rita; Kaiser, Katharina; Kaiser, Georg A.; Butt, Miriam

The study of language change through parallel corpora can be advantageous for the analysis of complex interactions between time, text domain and language. Often, those advantages cannot be fully exploited due to the sparse but high-dimensional nature of such historical data. To tackle this challenge, we introduce ParHistVis: a novel, free, easy-to-use, interactive visualization tool for parallel, multilingual, diachronic and synchronic linguistic data. We illustrate the suitability of the components of the tool based on a use case of word order change in Romance wh-interrogatives.

A multilingual approach to question classification

2018. Kalouli, Aikaterini-Lida; Kaiser, Katharina; Hautli-Janisz, Annette; Kaiser, Georg A.; Butt, Miriam

In this paper we present the Konstanz Resource of Questions (KRoQ), the first dependency-parsed, parallel multilingual corpus of information-seeking and non-information-seeking questions. In creating the corpus, we employ a linguistically motivated rule-based system that uses linguistic cues from one language to help classify and annotate questions across other languages. Our current corpus includes German, French, Spanish and Koine Greek. Based on the linguistically motivated heuristics we identify, a two-step scoring mechanism assigns intra- and inter-language scores to each question. Based on these scores, each question is classified as being either information-seeking or non-information-seeking. An evaluation shows that this mechanism correctly classifies questions in 79% of the cases. We release our corpus as a basis for further work in the area of question classification. It can be utilized as training and testing data for machine-learning algorithms, as corpus data for theoretical linguistic questions or as a resource for further rule-based approaches to question identification.
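
The two-step scoring idea in this abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the KRoQ implementation: the cue names (`cue_a`, `cue_b`), their weights, the way the two scores are combined (a plain sum) and the threshold are all hypothetical.

```python
def intra_score(cues, weights):
    """Step 1: score a question from the cues found in the question itself."""
    return sum(weights.get(cue, 0.0) for cue in cues)


def inter_score(translation_cues, weights):
    """Step 2: average the cue-based scores of the parallel translations."""
    if not translation_cues:
        return 0.0
    scores = [intra_score(cues, weights) for cues in translation_cues.values()]
    return sum(scores) / len(scores)


def classify(cues, translation_cues, weights, threshold=0.0):
    """Combine both scores (assumed: simple sum) and apply a threshold."""
    total = intra_score(cues, weights) + inter_score(translation_cues, weights)
    return "information-seeking" if total >= threshold else "non-information-seeking"


# Hypothetical cue weights: positive values favour an information-seeking reading.
weights = {"cue_a": 1.0, "cue_b": -2.0}

label_1 = classify(["cue_a"], {"fr": ["cue_a"], "es": ["cue_b"]}, weights)
label_2 = classify(["cue_b"], {"fr": ["cue_b"]}, weights)
```

In the first call the question's own cue contributes 1.0 and the translations average to -0.5, so the total of 0.5 clears the assumed threshold; in the second call both steps contribute -2.0 and the question falls below it.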

GKR: the Graphical Knowledge Representation for semantic parsing

2018. Kalouli, Aikaterini-Lida; Crouch, Richard

This paper describes the first version of an open-source semantic parser that creates graphical representations of sentences to be used for further semantic processing, e.g. for natural language inference, reasoning and semantic similarity. The Graphical Knowledge Representation output by the parser is inspired by the Abstract Knowledge Representation, which separates out conceptual and contextual levels of representation that deal respectively with the subject matter of a sentence and its existential commitments. Our representation is a layered graph with each subgraph holding different kinds of information, including one subgraph for concepts and one for contexts. Our first evaluation of the system shows an F-score of 85% in accurately representing sentences as semantic graphs.
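
One generic way to picture a layered graph with separate subgraphs is sketched below. It does not reproduce the GKR parser's actual data structures or API: the `LayeredGraph` class, the layer names and the node and edge labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredGraph:
    # Maps a layer name to its list of (source, label, target) edges.
    layers: dict = field(default_factory=dict)

    def add_edge(self, layer, source, label, target):
        """Add a labelled edge to the named layer, creating the layer if needed."""
        self.layers.setdefault(layer, []).append((source, label, target))

    def nodes(self, layer):
        """Return the set of nodes that appear in the named layer."""
        edges = self.layers.get(layer, [])
        return {node for source, _, target in edges for node in (source, target)}


# Hypothetical analysis of "The dog barks"; all labels are invented for this sketch.
graph = LayeredGraph()
graph.add_edge("concepts", "bark", "agent", "dog")
graph.add_edge("contexts", "top", "head", "bark")
```

Keeping each layer as its own edge list means a consumer can inspect, say, only the context layer without touching the concept layer, while shared node names (here `bark`) link the layers together.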

GKR: Bridging the gap between symbolic/structural and distributional meaning representations

2019. Kalouli, Aikaterini-Lida; Crouch, Richard; de Paiva, Valeria

Three broad approaches have been attempted to combine distributional and structural/symbolic aspects to construct meaning representations: a) injecting linguistic features into distributional representations, b) injecting distributional features into symbolic representations or c) combining structural and distributional features in the final representation. This work focuses on an example of the third and less studied approach: it extends the Graphical Knowledge Representation (GKR) to include distributional features and proposes a division of semantic labour between the distributional and structural/symbolic features. We propose two extensions of GKR that clearly show this division and empirically test one of the proposals on an NLI dataset with hard compositional pairs.

CoUSBi: A Structured and Visualized Legal Corpus of US State Bills

2018. Kalouli, Aikaterini-Lida; Vrana, Leo; Fabella, Vigile Marie; Bellani, Luna; Hautli-Janisz, Annette

This paper reports on an approach to automatically transform semi-structured and public databases of US state-level legislative bills into a structured, legal corpus, namely the Corpus of US Bills (CoUSBi). Our work has resulted in a methodology and a corpus that make this data usable for natural language processing applications. It thus also lays important groundwork for work in the social sciences, particularly in the fields of political science and economics, where there is a growing interest in the relationship between legislative policy-making and economic behavior. Against the backdrop of eventually contributing to a Legal Knowledge Graph, the paper shows that the corpus we provide already fulfills the requirements to be connected to other resources: we automatically extract correspondences between individual state bills and model bills from independent organizations, generating interesting insights into the legislative process. We furthermore use NEREx, a Visual Analytics framework that allows us to capture important content of the bills at a glance.