1 |
One model for the learning of language.
In: Proceedings of the National Academy of Sciences of the United States of America, vol 119, iss 5 (2022)
Abstract:
A major goal of linguistics and cognitive science is to understand what class of learning systems can acquire natural language. Until recently, the computational requirements of language have been used to argue that learning is impossible without a highly constrained hypothesis space. Here, we describe a learning system that is maximally unconstrained, operating over the space of all computations, and is able to acquire many of the key structures present in natural language from positive evidence alone. We demonstrate this by providing the same learning model with data from 74 distinct formal languages which have been argued to capture key features of language, have been studied in experimental work, or come from an interesting complexity class. The model is able to successfully induce the latent system generating the observed strings from small amounts of evidence in almost all cases, including for regular (e.g., a^n, [Formula: see text], and [Formula: see text]), context-free (e.g., [Formula: see text], and [Formula: see text]), and context-sensitive (e.g., [Formula: see text], and xx) languages, as well as for many languages studied in learning experiments. These results show that relatively small amounts of positive evidence can support learning of rich classes of generative computations over structures. The model provides an idealized learning setup upon which additional cognitive constraints and biases can be formalized.
Keywords:
computational linguistics; formal language theory; Humans; Language; Learning; learning theory; Linguistics; program induction
URL: https://escholarship.org/uc/item/6sb6g4gx
BASE
2 |
Hebrew Transformed: Machine Translation of Hebrew Using the Transformer Architecture
3 |
Arc-Eager Construction Provides Learning Advantage Beyond Stack Management
4 |
Controlled Multilingual Thesauri for Kazakh Industry-Specific Terms
In: Social Inclusion ; 9 ; 1 ; 35-44 ; Social Inclusion and Multilingualism: The Impact of Linguistic Justice, Economy of Language and Language Policy (2021)
5 |
Assembling Syntax: Modeling Constituent Questions in a Grammar Engineering Framework
6 |
The Future Tense Properties of Uyghur and Turkish
In: Zeitschrift für die Welt der Türken / Journal of World of Turks; Vol 12, No 2 (2020): [ZFWT] VOL. 12, NO. 2 (2020); 69-80 (2020)
7 |
Linguistic Phylogeny with Bayesian Markov Chain Monte Carlo: The Case of Indo-European
8 |
Issues in Named Entity Recognition on Early Modern English Letters
9 |
Detection of Longitudinal Development of Dementia in Literary Writing
In: http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1524651391474684 (2018)
10 |
Movement and structure effects on Universal 20 word order frequencies: A quantitative study
In: Glossa: a journal of general linguistics; Vol 3, No 1 (2018); 84 ; 2397-1835 (2018)
11 |
Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications
In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC) ; International Conference on Language Resources and Evaluation (LREC) ; 11 (2018)
12 |
Mining Social Science Publications for Survey Variables
In: Proceedings of the Second Workshop on NLP and Computational Social Science ; 47-52 (2018)
13 |
Resonances in Middle High German: New Methodologies in Prosody
In: Hench, Christopher Leo. (2017). Resonances in Middle High German: New Methodologies in Prosody. UC Berkeley: German. Retrieved from: http://www.escholarship.org/uc/item/13c6h2z2 (2017)
14 |
The Influence of Syntactic Frequencies on Human Sentence Processing
In: http://rave.ohiolink.edu/etdc/view?acc_num=osu1502452939626929 (2017)
15 |
Learning novel phonotactics from exposure to continuous speech
In: Laboratory Phonology: Journal of the Association for Laboratory Phonology; Vol 8, No 1 (2017); 12 ; 1868-6354 (2017)
16 |
Machine-readable text corpora and the linguistic description of languages
In: Text analysis and computers ; 1 ; ZUMA-Nachrichten Spezial ; 64-75 ; Text Analysis and Computers Conference (2017)
17 |
Code-switched English Pronunciation Modeling for Swahili Spoken Term Detection (Pub Version, Open Access)
18 |
Sentiment Big Data Flow Analysis by Means of Dynamic Linguistic Patterns
19 |
Making the Most of It: Word Sense Annotation and Disambiguation in the Face of Data Sparsity and Ambiguity
In: Jurgens, David Alan. (2014). Making the Most of It: Word Sense Annotation and Disambiguation in the Face of Data Sparsity and Ambiguity. UCLA: Computer Science 0201. Retrieved from: http://www.escholarship.org/uc/item/2wn4h7ph (2014)