1 |
One model for the learning of language.
In: Proceedings of the National Academy of Sciences of the United States of America, vol 119, iss 5 (2022)
BASE
2 |
Hebrew Transformed: Machine Translation of Hebrew Using the Transformer Architecture
BASE
3 |
Arc-Eager Construction Provides Learning Advantage Beyond Stack Management
BASE
4 |
Controlled Multilingual Thesauri for Kazakh Industry-Specific Terms
In: Social Inclusion ; 9 ; 1 ; 35-44 ; Social Inclusion and Multilingualism: The Impact of Linguistic Justice, Economy of Language and Language Policy (2021)
BASE
5 |
Assembling Syntax: Modeling Constituent Questions in a Grammar Engineering Framework
Abstract:
Thesis (Ph.D.)--University of Washington, 2021 ; This dissertation is dedicated to a cross-linguistic account of constituent (aka wh-) questions as part of a grammar engineering toolkit, the Grammar Matrix, couched in the Head-driven Phrase Structure Grammar formalism (HPSG). The main research question is: What, in formal grammar terms, constitutes an analysis of the various attested ways to form constituent questions which is demonstrably compatible with analyses of other phenomena that also vary typologically? I assume here a working definition of "analysis" as a set of HPSG types, including lexical and phrasal, and ways in which these general types vary depending on a given language. By "varying typologically" I mean that as the analyses presented here were driven by a review of typological literature on constituent questions, the interacting analyses that are part of the Grammar Matrix were driven by typology of other phenomena. My research question is related to a big question in linguistics: What is the range of possible variation of human languages? Specifically, this work aims to contribute to this big question by providing a set of analyses which are (i) driven by typological surveys; (ii) demonstrably integrated with existing analyses; and (iii) rigorously tested. Thus, while not a claim about possibilities and impossibilities, this work is a step towards establishing a range of specific linguistic analyses which are consistently successful across languages. I test the analyses in terms of the coverage, the overgeneration, and the ambiguity with respect to test suites which include constituent questions along with other syntactic phenomena and come from typologically and genealogically diverse languages. I look in particular detail into Russian for which I compile a test suite of 273 sentences including various types of simple and complex declarative and interrogative clauses.
I additionally evaluate the system on five "held-out" languages, all from different language families which I did not consider at all during development. On the theoretical level, I conclude that the HPSG filler-gap construction in combination with non-local features such as SLASH and QUE provides a functional basis for cross-linguistic modeling of obligatory question phrase fronting in main clauses but it is not yet fully clear whether they are sufficient to model the contrast between clause-embedding predicates meaning e.g. "think" and "ask", cross-linguistically. I conclude also that question phrase fronting which seems optional on the surface is difficult to formally model as such, which suggests it could be more readily analyzed as a combination of obligatory fronting, with any material appearing in front of the question word licensed by a separate information structure fronting mechanism. I furthermore conclude that "lexical threading", the HPSG mechanism by which lexical heads project their arguments' nonlocal features, complicates the analysis of fronting and that the entire Grammar Matrix system can be reasoned about more simply without the lexical threading assumption, although interrogative morphology can be modeled more straightforwardly with that assumption. On the grammar engineering level, I conclude that the existing Grammar Matrix system with its lexicon, morphotactics, polar questions, and case libraries can be successfully extended to support an analysis of constituent questions. The Grammar Matrix's information structure library however would require more substantial revisions in order to be integrated with an analysis of constituent questions, especially to support data from languages with flexible word order and data with embedded clauses, from all languages.
At the level of the DELPH-IN HPSG formalism, I conclude that the recently suggested append list type can be conveniently used for modeling question phrase fronting instead of the cumbersome difference list append. Finally, on the methodological level, I conclude that using at least one larger test suite with more complex sentences during Grammar Matrix development (along with multiple smaller test suites for typological diversity) involves a cost for typological breadth and a danger of "overfitting" the cross-linguistic system to one language but it is still important to uncover issues in the analysis which would otherwise be ignored.
Keyword:
computational linguistics; grammar engineering; HPSG; Linguistics; syntax
URL: http://hdl.handle.net/1773/47087
BASE
6 |
The Future Tense Properties of Uyghur and Turkish
In: Zeitschrift für die Welt der Türken / Journal of World of Turks; Vol 12, No 2 (2020): [ZFWT] VOL. 12, NO. 2 (2020); 69-80 (2020)
BASE
7 |
Linguistic Phylogeny with Bayesian Markov Chain Monte Carlo: The Case of Indo-European
BASE
8 |
Issues in Named Entity Recognition on Early Modern English Letters
BASE
9 |
Detection of Longitudinal Development of Dementia in Literary Writing
In: http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1524651391474684 (2018)
BASE
10 |
Movement and structure effects on Universal 20 word order frequencies: A quantitative study
In: Glossa: a journal of general linguistics; Vol 3, No 1 (2018); 84 ; 2397-1835 (2018)
BASE
11 |
Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications
In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC) ; International Conference on Language Resources and Evaluation (LREC) ; 11 (2018)
BASE
12 |
Mining Social Science Publications for Survey Variables
In: Proceedings of the Second Workshop on NLP and Computational Social Science ; 47-52 (2018)
BASE
13 |
Resonances in Middle High German: New Methodologies in Prosody
In: Hench, Christopher Leo. (2017). Resonances in Middle High German: New Methodologies in Prosody. UC Berkeley: German. Retrieved from: http://www.escholarship.org/uc/item/13c6h2z2 (2017)
BASE
14 |
The Influence of Syntactic Frequencies on Human Sentence Processing
In: http://rave.ohiolink.edu/etdc/view?acc_num=osu1502452939626929 (2017)
BASE
15 |
Learning novel phonotactics from exposure to continuous speech
In: Laboratory Phonology: Journal of the Association for Laboratory Phonology; Vol 8, No 1 (2017); 12 ; 1868-6354 (2017)
BASE
16 |
Machine-readable text corpora and the linguistic description of languages
In: Text analysis and computers ; 1 ; ZUMA-Nachrichten Spezial ; 64-75 ; Text Analysis and Computers Conference (2017)
BASE
17 |
Code-switched English Pronunciation Modeling for Swahili Spoken Term Detection (Pub Version, Open Access)
BASE
18 |
Sentiment Big Data Flow Analysis by Means of Dynamic Linguistic Patterns
BASE
19 |
Making the Most of It: Word Sense Annotation and Disambiguation in the Face of Data Sparsity and Ambiguity
In: Jurgens, David Alan. (2014). Making the Most of It: Word Sense Annotation and Disambiguation in the Face of Data Sparsity and Ambiguity. UCLA: Computer Science 0201. Retrieved from: http://www.escholarship.org/uc/item/2wn4h7ph (2014)
BASE