LMFingerprints : Visual Explanations of Language Model Embedding Spaces through Layerwise Contextualization Scores
2022-07-29, Sevastjanova, Rita, Kalouli, Aikaterini-Lida, Schätzle, Christin, Schäfer, Hanna, El-Assady, Mennatallah
Language models, such as BERT, construct multiple, contextualized embeddings for each word occurrence in a corpus. Understanding how the contextualization propagates through the model's layers is crucial for deciding which layers to use for a specific analysis task. Currently, most embedding spaces are explained by probing classifiers; however, some findings remain inconclusive. In this paper, we present LMFingerprints, a novel scoring-based technique for the explanation of contextualized word embeddings. We introduce two categories of scoring functions, which measure (1) the degree of contextualization, i.e., the layerwise changes in the embedding vectors, and (2) the type of contextualization, i.e., the captured context information. We integrate these scores into an interactive explanation workspace. By combining visual and verbal elements, we provide an overview of contextualization in six popular transformer-based language models. We evaluate hypotheses from the domain of computational linguistics, and our results not only confirm findings from related work but also reveal new aspects about the information captured in the embedding spaces. For instance, we show that while numbers are poorly contextualized, stopwords have an unexpectedly high contextualization in the models' upper layers, where their neighborhoods shift from tokens of similar functionality to tokens that contribute to the meaning of the surrounding sentences.
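The notion of "degree of contextualization" as layerwise change in the embedding vectors can be illustrated with a minimal sketch. It assumes the change is measured as cosine distance between the vectors of one token occurrence at consecutive layers; the abstract does not state the exact scoring functions, so the function names and the toy vectors below are hypothetical and are not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layerwise_change(layer_vectors):
    """Cosine distance between each pair of consecutive layers.

    layer_vectors holds one embedding per layer for a single token
    occurrence. The result has len(layer_vectors) - 1 entries.
    Assumption: cosine distance stands in for the paper's score.
    """
    return [
        1.0 - cosine_similarity(prev, curr)
        for prev, curr in zip(layer_vectors, layer_vectors[1:])
    ]


# Toy vectors for one token at three layers (made-up values, not model output).
toy_layers = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
]
changes = layerwise_change(toy_layers)
print(changes)
```

In this toy case the token does not move between the first two layers and changes direction completely between the last two, which is the kind of layer-by-layer profile such a score is meant to expose.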
Hy-NLI : a Hybrid system for state-of-the-art Natural Language Inference
2021, Kalouli, Aikaterini-Lida
A main characteristic of human language and understanding is our ability to reason about things, i.e., to infer conclusions from given facts. Within the field of Natural Language Processing and Natural Language Understanding, the task of inferring such conclusions has come to be known as Natural Language Inference (NLI) and is currently a popular field of research. NLI is most often formulated as the task of determining whether a sentence entails (i.e., implies), contradicts (i.e., implies the opposite) or is neutral (i.e., does not have any relation) with respect to another sentence (MacCartney and Manning, 2007). Although such a task sounds trivial for humans, it is less so for machines: the processes behind human inference require even more than understanding linguistic input; they presuppose our understanding about the world and everyday life and require the complex combination and interaction of this information.
In this thesis, I implement a hybrid NLI system, Hy-NLI, which is able to determine the inference relation between a pair of sentences. Hy-NLI consists of a symbolic and a deep-learning component, combining the best of both worlds: it exploits the strengths that each approach exhibits and mitigates their weaknesses. The implemented system relies on the finding that each of the two very different approaches is particularly suitable for a specific kind of phenomenon. Deep-learning methods are good at dealing with graded and more fluid aspects of meaning, while symbolic approaches can efficiently deal with contextual phenomena of natural language, e.g., modals, negations, implicatives, etc. Hy-NLI learns to distinguish between these cases and to employ the component that is known to work best for each of them. Thus, this thesis contributes to the state-of-the-art in NLI. It also contributes to the general debate on whether symbolic or deep-learning approaches are most efficient by showing that systems can benefit from both of them in different ways. Hence, the thesis at hand motivates research that does not choose one of the two, but marries them into a successful combination.
To reach the ultimate goal of closing the gap between these two approaches, this thesis makes four major contributions. First, it sheds light on the available NLI datasets, their issues and the insights they can offer us about the NLI task in general. Specifically, I investigate one of the well-known mainstream NLI datasets, SICK (Marelli et al., 2014b), and observe how certain corpus construction practices have influenced the quality of the data itself and of its annotations. I also show how the quality of annotations is affected not only by the corpus construction process but also by inherent human disagreements and fine-grained nuances of human inference. The issues found in the datasets are addressed in a variety of ways, from manually correcting subsets of the corpus to performing experiments that quantify and identify these aspects of the NLI task. The second major contribution of the thesis at hand is the development of the Graphical Knowledge Representation (GKR; Kalouli and Crouch, 2018), a semantic representation suitable for semantic tasks such as NLI, and the implementation of an efficient GKR parser. The representation stands out from other similar representations for its separation of the sentence information into different layers/graphs. In particular, there is a strict separation between the conceptual, predicate-argument structure and the contextual, boolean structure. This modularity and projection architecture gives rise to a concept-based, intensional Description Logic (Baader et al., 2003) semantics. The efficiency and suitability of GKR for NLI is revealed through the implementation of the symbolic inference engine GKR4NLI, a further major goal of this thesis. GKR4NLI is developed as an inference mechanism relying on Natural Logic (Van Benthem, 1986; Sánchez-Valencia, 1991) and on the semantics imposed by GKR.
Its performance on different datasets confirms previous findings that symbolic engines are good at dealing with semantically complex phenomena, but struggle with more robust aspects of meaning. Thus, these results motivate the need for a hybrid system, where each aspect of meaning is treated by the most suitable component. This need is addressed with the implementation of Hy-NLI, the final major goal of this thesis. The hybrid system uses GKR4NLI as its symbolic component and the state-of-the-art language representation model BERT (Devlin et al., 2019) as its deep-learning component. Their successful combination leads Hy-NLI to outperform other state-of-the-art methods across datasets of different nature and complexity. With such performance across the board, Hy-NLI confirms the need for hybrid systems and paves the way for more work in this research direction.
XplaiNLI : Explainable Natural Language Inference through Visual Analytics
2020, Kalouli, Aikaterini-Lida, Sevastjanova, Rita, de Paiva, Valeria, Crouch, Richard, El-Assady, Mennatallah
Advances in Natural Language Inference (NLI) have helped us understand what state-of-the-art models really learn and what their generalization power is. Recent research has revealed some heuristics and biases of these models. However, to date, there is no systematic effort to capitalize on those insights through a system that uses these to explain the NLI decisions. To this end, we propose XplaiNLI, an eXplainable, interactive, visualization interface that computes NLI with different methods and provides explanations for the decisions made by the different approaches.
Composing noun phrase vector representations
2019, Kalouli, Aikaterini-Lida, de Paiva, Valeria, Crouch, Richard
Vector representations of words have seen increasing success over the past years in a variety of NLP tasks. While there seems to be a consensus about the usefulness of word embeddings and how to learn them, it is still unclear which representations can capture the meaning of phrases or even whole sentences. Recent work has shown that simple operations outperform more complex deep architectures. In this work, we propose two novel constraints for computing noun phrase vector representations. First, we propose that the semantic and not the syntactic contribution of each component of a noun phrase should be considered, so that the resulting composed vectors express more of the phrase meaning. Second, the composition process of the two phrase vectors should apply a suitable selection of dimensions, in a way that specific semantic features captured by the phrase’s meaning become more salient. Our proposed methods are compared to 11 other approaches, including popular baselines and a neural net architecture, and are evaluated across 6 tasks and 2 datasets. Our results show that these constraints lead to more expressive phrase representations and can be applied to other state-of-the-art methods to improve their performance.
Negation, Coordination, and Quantifiers in Contextualized Language Models
2022, Kalouli, Aikaterini-Lida, Sevastjanova, Rita, Schätzle, Christin, Romero, Maribel
With the success of contextualized language models, much research explores what these models really learn and in which cases they still fail. Most of this work focuses on specific NLP tasks and on the learning outcome. Little research has attempted to decouple the models' weaknesses from specific tasks and focus on the embeddings per se and their mode of learning. In this paper, we take up this research opportunity: based on theoretical linguistic insights, we explore whether the semantic constraints of function words are learned and how the surrounding context impacts their embeddings. We create suitable datasets, provide new insights into the inner workings of LMs vis-a-vis function words and implement an assisting visual web interface for qualitative analysis.
Explaining Contextualization in Language Models using Visual Analytics
2021, Sevastjanova, Rita, Kalouli, Aikaterini-Lida, Schätzle, Christin, Schäfer, Hanna, El-Assady, Mennatallah
Despite the success of contextualized language models on various NLP tasks, it is still unclear what these models really learn. In this paper, we contribute to the current efforts of explaining such models by exploring the continuum between function and content words with respect to contextualization in BERT, based on linguistically-informed insights. In particular, we utilize scoring and visual analytics techniques: we use an existing similarity-based score to measure contextualization and integrate it into a novel visual analytics technique, presenting the model’s layers simultaneously and highlighting intra-layer properties and inter-layer differences. We show that contextualization is neither driven by polysemy nor by pure context variation. We also provide insights on why BERT fails to model words in the middle of the functionality continuum.
GKR : Bridging the gap between symbolic/structural and distributional meaning representations
2019, Kalouli, Aikaterini-Lida, Crouch, Richard, de Paiva, Valeria
Three broad approaches have been attempted to combine distributional and structural/symbolic aspects to construct meaning representations: a) injecting linguistic features into distributional representations, b) injecting distributional features into symbolic representations or c) combining structural and distributional features in the final representation. This work focuses on an example of the third and less studied approach: it extends the Graphical Knowledge Representation (GKR) to include distributional features and proposes a division of semantic labour between the distributional and structural/symbolic features. We propose two extensions of GKR that clearly show this division and empirically test one of the proposals on an NLI dataset with hard compositional pairs.
KonTra at CMCL 2021 Shared Task : Predicting Eye Movements by Combining BERT with Surface, Linguistic and Behavioral Information
2021, Yu, Qi, Kalouli, Aikaterini-Lida, Frassinelli, Diego
This paper describes the submission of the team KonTra to the CMCL 2021 Shared Task on eye-tracking prediction. Our system combines the embeddings extracted from a fine-tuned BERT model with surface, linguistic and behavioral features, resulting in an average mean absolute error of 4.22 across all 5 eye-tracking measures. We show that word length and features representing the expectedness of a word are consistently the strongest predictors across all 5 eye-tracking measures.
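The reported figure, an average mean absolute error across several eye-tracking measures, can be read as follows: compute the MAE separately for each measure, then take the unweighted mean of those values. The sketch below illustrates that arithmetic only; the measure names and numbers are hypothetical placeholders, and whether the shared task applies any scaling or weighting before averaging is not stated in the abstract.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences between gold and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_mae(gold, predicted):
    """Return (unweighted mean of per-measure MAEs, per-measure MAEs).

    gold and predicted map a measure name to a list of values,
    one value per token, in the same order.
    """
    per_measure = {
        name: mean_absolute_error(gold[name], predicted[name])
        for name in gold
    }
    overall = sum(per_measure.values()) / len(per_measure)
    return overall, per_measure


# Hypothetical toy data with two placeholder measures (not shared-task data).
gold = {"measure_a": [10.0, 20.0], "measure_b": [1.0, 3.0]}
pred = {"measure_a": [12.0, 18.0], "measure_b": [1.0, 5.0]}

overall, per_measure = average_mae(gold, pred)
print(per_measure)  # {'measure_a': 2.0, 'measure_b': 1.0}
print(overall)      # 1.5
```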
Is that really a question? : Going beyond factoid questions in NLP
2021, Kalouli, Aikaterini-Lida, Kehlbeck, Rebecca, Sevastjanova, Rita, Deussen, Oliver, Keim, Daniel A., Butt, Miriam
Research in NLP has mainly focused on factoid questions, with the goal of finding quick and reliable ways of matching a query to an answer. However, human discourse involves more than that: it contains non-canonical questions deployed to achieve specific communicative goals. In this paper, we investigate this under-studied aspect of NLP by introducing a targeted task, creating an appropriate corpus for the task and providing baseline models of diverse nature. With this, we are also able to generate useful insights on the task and open the way for future research in this direction.
Explaining Simple Natural Language Inference
2019, Kalouli, Aikaterini-Lida, Buis, Annebeth, Real, Livy, Palmer, Martha, de Paiva, Valeria
The vast amount of research introducing new corpora and techniques for semi-automatically annotating corpora shows the important role that datasets play in today’s research, especially in the machine learning community. This rapid development raises concerns about the quality of the datasets created and consequently of the models trained, as recently discussed with respect to the Natural Language Inference (NLI) task. In this work we conduct an annotation experiment based on a small subset of the SICK corpus. The experiment reveals several problems in the annotation guidelines, and various challenges of the NLI task itself. Our quantitative evaluation of the experiment allows us to assign our empirical observations to specific linguistic phenomena and leads us to recommendations for future annotation tasks, for NLI and possibly for other tasks.