Modeling Nominal Predications in Hindi/Urdu


@phdthesis{Sulger2015Model-38382,
  title   = {Modeling Nominal Predications in Hindi/Urdu},
  year    = {2015},
  author  = {Sulger, Sebastian},
  address = {Konstanz},
  school  = {Universität Konstanz}
}

The identification and classification of nominal predicates and their arguments is a notorious problem in natural language processing (NLP). Semantic reasoning, information retrieval, question answering and other applications can benefit greatly from a successful treatment of nominal predication (Meyers et al., 2004b). While a considerable amount of NLP work focuses on identifying and annotating verbal predicates and their arguments, there is less research on the types and identification of nominal predicates and their arguments. This thesis contributes to the latter line of research: it describes and analyzes nominal arguments in the South Asian language Hindi/Urdu. The analysis is couched within the theory of Lexical-Functional Grammar (LFG; Bresnan, 2001; Dalrymple, 2001) and is implemented in a computational grammar, the Urdu ParGram grammar (Butt and King, 2007), which is part of the ParGram (“Parallel Grammar”) project on parallel LFG grammar engineering (Butt et al., 1999a,b, 2002). The implementation uses the grammar development platform XLE (Crouch et al., 2015).

Different types of case-marked nominal arguments in Hindi/Urdu are examined: genitive, locative and instrumental arguments. Among these, the genitive is special in that it shows morphological agreement with the head noun (Butt and King, 2004b).
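As a simplified illustration of what genitive agreement with the head noun means, the sketch below picks the genitive form from the gender, number and case of the head noun (not the possessor), using the standard forms kā, ke and kī. The `Noun` class and the helper functions are hypothetical and are not part of the Urdu ParGram grammar; pronominal genitives and other complications are deliberately ignored, and the possessor is assumed to be supplied already in its oblique form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    """Hypothetical head-noun record with the features the genitive agrees with."""
    form: str
    gender: str           # "m" (masculine) or "f" (feminine)
    number: str           # "sg" or "pl"
    case: str = "direct"  # "direct" or "oblique"


def genitive_marker(head: Noun) -> str:
    """Return the genitive postposition agreeing with the head noun (simplified)."""
    if head.gender == "f":
        return "kī"  # feminine heads: kī in any number and case
    if head.number == "sg" and head.case == "direct":
        return "kā"  # masculine singular direct heads
    return "ke"      # masculine plural or oblique heads


def genitive_np(possessor: str, head: Noun) -> str:
    """Build 'possessor GEN head'; the possessor is assumed to be oblique already."""
    return f"{possessor} {genitive_marker(head)} {head.form}"


if __name__ == "__main__":
    print(genitive_np("laṛke", Noun("ghar", "m", "sg")))   # 'the boy's house'
    print(genitive_np("laṛke", Noun("ghar", "m", "pl")))   # 'the boy's houses'
    print(genitive_np("laṛke", Noun("kitāb", "f", "sg")))  # 'the boy's book'
```

The possessor laṛke stays constant across the three calls while the genitive form changes with the head noun, which is the agreement pattern the text refers to.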
Each argument type is discussed in detail with respect to its case-marking strategies, its general linear order within the noun phrase, its selection by distinct types of head nominals, its overall functional behavior and its binding properties.

All types of nominal arguments can scramble, i.e. occur outside the noun phrase in which they are licensed. This is attributed to the tight correlation between case marking and the thematic role realized by the argument and, for the genitive, to the morphosyntactic agreement between the case marker and the head noun. In addition, all types of nominal arguments can regularly undergo argument suppression: the argument is not realized overtly but is existentially bound (Barker, 1995). An analysis of these cases in terms of pronominal drop is considered but rejected. The thesis also reviews possessive and locative clauses, which are analyzed as intransitives and copula clauses, respectively. These novel analyses are shown to account better than previous analyses for the functional behavior, binding patterns and overall structural paradigms of these clauses.

The thesis also discusses noun-verb complex predicates in Hindi/Urdu. Here, nouns (referred to as “nominal hosts”) and verbs (also called “light verbs”) form a single predicate with a single set of grammatical functions, whereas the argument structure is complex, since the nominal host may itself contribute arguments to the overall predication (Mohanan, 1994). While the construction itself is comparatively well understood theoretically, the possible combinations of nominal host and light verb are not: some hosts occur with several light verbs, others with only a subset, and still others with a single light verb only.
Two corpus studies are discussed that aim at (1) uncovering the constraints on combining hosts with light verbs and (2) creating a lexical resource that can serve as input to NLP applications. One such application is the Urdu ParGram grammar, for which the thesis shows how the results of the corpus studies can be translated into templates that model the combinatory patterns in terms of statistical (dis)preferences.

From the point of view of grammar development, the thesis argues that a unified account of genitive arguments, as currently employed in ParGram, cannot be maintained, and instead proposes a more fine-grained approach that accounts for the observed patterns. This connects to the issue of parallelism in ParGram. The conventions developed within the ParGram grammars are extensive: they dictate the form and possible values of the features used in the grammars as well as the type of analysis chosen for a particular construction. In principle, grammar writers may abandon parallelism only if maintaining it would misrepresent the linguistic facts (Butt and King, 2007; Butt et al., 1999b; King et al., 2005). Being faithful to the facts of nominal predication in Hindi/Urdu entails abandoning the ParGram analysis of possessives, which does not distinguish between different types of possessives (Dipper, 2003).

At the same time, the implementation benefits from the detailed linguistic analysis practised within ParGram and uses a range of XLE notational devices to model the generalizations accurately. The complete grammar (in its most recent version at the time the thesis was submitted) is included on the CD-ROM accompanying the thesis; the implementation can also be tested on the online INESS platform for treebanking and LFG grammar testing (Rosén et al., 2012a,b; the INESS homepage is located at: …).
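The step from corpus counts to statistical (dis)preferences for host/light-verb combinations, described above, can be illustrated with a minimal sketch. It assumes that each corpus hit has already been reduced to a (nominal host, light verb) pair and labels a combination as preferred when it reaches a hypothetical relative-frequency `threshold`; the function name, the threshold and the toy data are illustrative assumptions, not the thesis's criteria or results. The thesis itself encodes its findings as XLE templates in the Urdu ParGram grammar, which are not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One (nominal host, light verb) observation extracted from a corpus.
Pair = Tuple[str, str]


def light_verb_preferences(
    pairs: Iterable[Pair], threshold: float = 0.2
) -> Dict[str, Dict[str, str]]:
    """Label each attested host/light-verb combination as preferred or dispreferred.

    A combination is 'preferred' if it accounts for at least `threshold` of the
    host's occurrences with any light verb, otherwise 'dispreferred'.
    Unattested combinations are simply absent from the result.
    """
    observations = list(pairs)  # materialize so the data can be counted twice
    pair_counts = Counter(observations)
    host_totals = Counter(host for host, _ in observations)

    prefs: Dict[str, Dict[str, str]] = {}
    for (host, verb), n in pair_counts.items():
        share = n / host_totals[host]
        label = "preferred" if share >= threshold else "dispreferred"
        prefs.setdefault(host, {})[verb] = label
    return prefs


if __name__ == "__main__":
    # Hypothetical toy observations with placeholder names; not data from the thesis.
    toy = [("host_a", "lv_x")] * 9 + [("host_a", "lv_y")] + [("host_b", "lv_y")] * 5
    print(light_verb_preferences(toy))
```

A single cut-off keeps the sketch simple; a real resource would more likely use an association measure and would also need to distinguish unattested from genuinely impossible combinations.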

