Publication: The Induction of Phonological Structure
Abstract
This dissertation explores to what extent phonological structure can be inferred from the distribution of sounds within words. For this purpose, a typologically oriented computational approach is pursued that draws on techniques from computational linguistics, data mining and visual analytics. The methods presented are conceived as procedural universals: they can be applied in the same way to any natural language, even though they yield different results for individual languages.
The basic assumption underlying all methods is that the co-occurrence of sounds in relevant contexts within the words of a language is constrained. These restrictions on sound combinations give rise to a characteristic distribution, which in turn can be used to induce distinctions among the sounds of the language that can be related to natural classes and features in phonological theory. The focus of the present approach is not so much on the statistical methods needed to induce the latent structures as on the linguistically motivated contexts that manifest the existing constraints most clearly.
The induction of phonological structure from language data is an interesting research topic for several reasons. First, it is remarkable that phonological features, which are mostly defined in terms of articulatory or acoustic properties, are also reflected in the distribution of sounds in a language. In this thesis, I complement previous work on learning phonological categories (e.g., Ellison 1994; Goldsmith and Xanthos 2009) with an approach to inferring place of articulation distinctions in consonants. The method is based on the principle of similar place avoidance (SPA; Pozdniakov and Segerer 2007), which states that the consonants in CVC sequences tend to differ in place of articulation. I extend earlier work in this area by showing that this principle is active not only in Semitic languages (in a study of Maltese verbal roots) but also in West Germanic languages (in an investigation of the entries in the CELEX database for English, German and Dutch) and in a worldwide sample of word forms from the ASJP dataset (evaluated with Dryer's test for universality). This leads to the conclusion that SPA is a statistical universal. Using the principle to infer place distinctions in consonants yields almost perfect results for the ASJP data and the list of Maltese verbal roots. The automatically generated dendrograms closely correspond to the hierarchical structures for natural classes that have been postulated in the phonological literature (e.g., Rice 1994; McCarthy 1994).
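The SPA-based induction of place classes can be illustrated with a minimal sketch. It assumes that words are given as sequences of single-character segments and that the vowel inventory is known in advance. It counts consonant pairs in C1-V-C2 windows, converts them to observed/expected (O/E) ratios, where values below 1 indicate avoidance, and then groups consonants with similar O/E profiles by average-linkage clustering over Euclidean distances. The function names (`cvc_pairs`, `observed_expected`, `average_linkage`) are hypothetical, and the choice of O/E ratios and average linkage is an illustrative assumption, not necessarily the statistic or clustering procedure used in the thesis.

```python
import math
from collections import Counter


def cvc_pairs(words, vowels):
    """Yield (C1, C2) for every C1-V-C2 window; each word is a sequence of segments."""
    for word in words:
        for i in range(len(word) - 2):
            c1, v, c2 = word[i], word[i + 1], word[i + 2]
            if c1 not in vowels and v in vowels and c2 not in vowels:
                yield c1, c2


def observed_expected(words, vowels):
    """Return the consonant inventory and the O/E ratio for every (C1, C2) pair."""
    pairs = Counter(cvc_pairs(words, vowels))
    total = sum(pairs.values())
    first, second = Counter(), Counter()
    for (c1, c2), n in pairs.items():
        first[c1] += n
        second[c2] += n
    consonants = sorted(set(first) | set(second))
    oe = {}
    for a in consonants:
        for b in consonants:
            expected = first[a] * second[b] / total
            # Ratios below 1 mean the pair is rarer than chance (avoidance).
            oe[a, b] = pairs[a, b] / expected if expected else 0.0
    return consonants, oe


def average_linkage(consonants, oe):
    """Agglomerative clustering of consonants by their O/E profiles.

    Returns the merge history [(cluster_a, cluster_b, distance), ...];
    the last entry corresponds to the top split of the dendrogram.
    """
    # A consonant's profile: its O/E ratios as C1 and as C2 with every consonant.
    profiles = {
        c: [oe[c, x] for x in consonants] + [oe[x, c] for x in consonants]
        for c in consonants
    }
    clusters = [(c,) for c in consonants]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [math.dist(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j]]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merges[-1][0] + merges[-1][1])
    return merges


if __name__ == "__main__":
    # Toy lexicon: labials (p, b, m) and coronals (t, d, n) never share a CVC root.
    words = [c1 + "a" + c2 for c1 in "pbm" for c2 in "tdn"]
    words += [c1 + "a" + c2 for c1 in "tdn" for c2 in "pbm"]
    consonants, oe = observed_expected(words, vowels={"a"})
    for left, right, dist in average_linkage(consonants, oe):
        print(left, right, round(dist, 2))
```

On this toy lexicon, same-place pairs get an O/E ratio of 0 and cross-place pairs a ratio above 1, so the final merge joins the labial cluster with the coronal cluster. For real data, a library routine such as SciPy's hierarchical clustering would be a more practical choice than the hand-rolled loop.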
In addition, the present thesis complements previous work on the machine learning of phonological structure with a novel method, not based on N-gram statistics, for automatically discriminating vowels and consonants in a language. This substitution approach relies on how frequently sounds occur as the distinguishing segments in minimal pairs. Although the method does not reach the accuracy of earlier approaches in this area (e.g., Sukhotin 1962; Ellison 1994; Goldsmith and Xanthos 2009; Kim and Snyder 2013), it shows that the distinction between vowels and consonants can also be inferred from the relations between sounds in absentia, that is, from paradigmatic rather than syntagmatic relations.
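The counting step behind a substitution approach can be sketched as follows, assuming that words are strings or tuples of segments and that a minimal pair means two words of equal length differing in exactly one position. The sketch only collects which segment pairs distinguish minimal pairs and how often each segment plays that role; how the thesis turns such counts into a vowel/consonant split is not reproduced here. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def minimal_pair_substitutions(words):
    """Count the segment pairs that distinguish minimal pairs.

    Words are sequences of segments (strings or tuples); a minimal pair is
    two words of equal length that differ in exactly one position.
    """
    # Group the segments found in each "frame": a word with one position masked.
    # The frame (prefix, suffix) fixes both the position and the word length.
    frames = defaultdict(set)
    for word in set(words):
        for i in range(len(word)):
            frames[word[:i], word[i + 1:]].add(word[i])
    substitutions = Counter()
    for segments in frames.values():
        # Any two distinct segments in the same frame form exactly one minimal pair.
        for a, b in combinations(sorted(segments), 2):
            substitutions[a, b] += 1
    return substitutions


def discrimination_frequency(substitutions):
    """How often each segment serves as the distinguishing segment of a minimal pair."""
    freq = Counter()
    for (a, b), n in substitutions.items():
        freq[a] += n
        freq[b] += n
    return freq


if __name__ == "__main__":
    subs = minimal_pair_substitutions(["pat", "pit", "bat", "bit", "pan"])
    print(subs)
    print(discrimination_frequency(subs))
```

Masking one position at a time avoids comparing every pair of words, so the cost grows with the total number of segments rather than with the square of the lexicon size.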
Second, the present work treats the induction of phonological structure as a way to explore large amounts of language data in search of phonotactic constraints. To this end, I present a visual analytics approach to detecting vowel harmony patterns, intended as a proof of concept that a graphically enhanced statistical analysis can make potentially interesting patterns in the data more accessible to human perception. As the matrix visualizations show, languages with vowel harmony (or similar phenomena) can be distinguished at a glance from languages without such constraints. The visualization approach can easily be extended to related phenomena such as consonant harmony (Hansson 2010), synharmonism (Trubetzkoy 1939 [1967]) or any other kind of (statistical) phonotactic constraint. The statistical measure underlying the vowel harmony visualizations can also serve as a typological measure for comparing languages. The ranking of languages according to this measure approximately matches intuitions about which languages show conspicuous harmony patterns.
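A matrix of the kind described above can be sketched by counting successive vowels within words (skipping intervening consonants) and expressing each ordered vowel pair as an observed/expected ratio. This assumes single-character segments and a known vowel inventory; the O/E ratio is an illustrative stand-in and not necessarily the exact measure used in the thesis, and the colour-coded rendering is left out. The function names are hypothetical.

```python
from collections import Counter


def vowel_successions(words, vowels):
    """Yield (V1, V2) for successive vowels in each word, ignoring intervening consonants."""
    for word in words:
        vs = [seg for seg in word if seg in vowels]
        yield from zip(vs, vs[1:])


def harmony_matrix(words, vowels):
    """Observed/expected ratio for every ordered pair of vowels.

    Values above 1 mean V2 follows V1 more often than chance; blocks of high
    values within vowel classes (e.g. front vs. back) suggest harmony.
    """
    pairs = Counter(vowel_successions(words, vowels))
    total = sum(pairs.values())
    first, second = Counter(), Counter()
    for (v1, v2), n in pairs.items():
        first[v1] += n
        second[v2] += n
    matrix = {}
    for v1 in sorted(vowels):
        for v2 in sorted(vowels):
            expected = first[v1] * second[v2] / total if total else 0.0
            matrix[v1, v2] = pairs[v1, v2] / expected if expected else 0.0
    return matrix


if __name__ == "__main__":
    vowels = {"a", "e", "i", "o"}
    m = harmony_matrix(["kelin", "tiren", "kalon", "toran"], vowels)
    order = sorted(vowels)
    for v1 in order:
        print(v1, " ".join(f"{m[v1, v2]:.1f}" for v2 in order))
```

In this toy data front vowels (e, i) only follow front vowels and back vowels (a, o) only follow back vowels, so front/back cells are 0 while within-class cells exceed 1; a language without harmony would show ratios close to 1 throughout.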
Cite
ISO 690
MAYER, Thomas, 2012. The Induction of Phonological Structure [Dissertation]. Konstanz: University of Konstanz
BibTeX
@phdthesis{Mayer2012Induc-26229, year={2012}, title={The Induction of Phonological Structure}, author={Mayer, Thomas}, address={Konstanz}, school={Universität Konstanz} }
RDF
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/26229"> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/> <dc:contributor>Mayer, Thomas</dc:contributor> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <dcterms:title>The Induction of Phonological Structure</dcterms:title> <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/26229"/> <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/26229/1/Mayer_262292.pdf"/> <dcterms:issued>2012</dcterms:issued> <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-02-05T06:58:21Z</dcterms:available> <dcterms:abstract xml:lang="eng">This dissertation explores to what extent phonological structure can be inferred from the distribution of sounds within words. For this purpose, a typologically oriented computational approach is pursued, which rests on techniques from the fields of computational linguistics, data mining and visual analytics. The methods that are presented are considered to be procedural universals which can be applied to any natural language in the same way even though they yield different results for individual languages.<br />The basic assumption that underlies all methods is that the co-occurrence of sounds in relevant contexts within words of a language is constrained. 
The restrictions of combinations of sounds lead to a given distribution, which in turn can be used to induce a distinction in the sounds of the language that can be related to natural classes and features in phonological theory. The focus of the present approach is not so much on the statistical methods that are necessary to induce the latent structures, but on the linguistically motivated contexts which manifest the existing constraints most clearly.<br /><br /><br />The induction of phonological structure from language data is an interesting research topic for various reasons. First of all, it is remarkable that phonological features, which are mostly defined in terms of articulatory or acoustic properties, are also reflected in the distribution of sounds in a language. In this thesis, I complement previous work on learning phonological categories (e.g., Ellison 1994; Goldsmith and Xanthos 2009) with an approach to infer place of articulation distinctions in consonants. The method is based on the principle of similar place avoidance (SPA; Pozdniakov and Segerer 2007), which states that consonants in CVC sequences tend to exhibit different place features. I contribute to earlier work in this research area by showing that this principle is not only active in Semitic languages (with a study of Maltese verbal roots) but also holds for West Germanic languages (with an investigation of the entries in the CELEX database for English, German and Dutch) and a worldwide sample of word forms from the ASJP dataset (Dryer test for universality), leading to the conclusion that it is a statistical universal. Using this principle to infer place distinctions in consonants yields almost perfect results for the ASJP data and the list of Maltese verbal roots. 
The automatically generated dendrograms closely correspond to the hierarchical structures for natural classes that have been postulated in the phonological literature (e.g., Rice 1994; McCarthy 1994).<br /><br /><br />In addition, the present thesis complements previous work on the machine learning of phonological structure with a novel method to automatically discriminate vowels and consonants in a language that is not based on N-gram statistics. The substitution approach relies on the frequency of sounds to occur as the discriminating segments in minimal pairs. Although the method does not achieve the same level of accuracy as earlier approaches in this area (e.g., Sukhotin 1962; Ellison 1994; Goldsmith and Xanthos 2009; Kim and Snyder 2013), it shows that a distinction of vowels and consonants can also be inferred from the relation of sounds in absentia.<br /><br /><br />Second, the induction of phonological structure is considered in the present work as a way to explore a large amount of language data in search for the presence of phonotactic constraints. To this end, I present a visual analytics approach for the detection of vowel harmony patterns that is intended as a proof of concept that a graphically enhanced statistical analysis can make potentially interesting patterns in the data more accessible to human perception. As the matrix visualizations show, languages exhibiting patterns of vowel harmony (or similar phenomena) can be distinguished from languages without such constraints at a glance. The visualization approach can easily be extended to other related phenomena, e.g., consonant harmony (Hansson 2010), synharmonism (Trubetzkoy 1939 [1967]) or any kind of (statistical) phonotactic constraints. The statistical measure on which the vowel harmony visualizations are based can also serve as a typological measure on the basis of which languages can be compared. 
The ranking of languages according to this measure approximately reflects the intuition about which languages show conspicuous harmony patterns.</dcterms:abstract> <foaf:homepage rdf:resource="http://localhost:8080/"/> <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/26229/1/Mayer_262292.pdf"/> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/> <dc:language>eng</dc:language> <dc:rights>terms-of-use</dc:rights> <dc:creator>Mayer, Thomas</dc:creator> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-02-05T06:58:21Z</dc:date> </rdf:Description> </rdf:RDF>