1. One model for the learning of language.
In: Proceedings of the National Academy of Sciences of the United States of America, vol 119, iss 5 (2022)
BASE
2. Hebrew Transformed: Machine Translation of Hebrew Using the Transformer Architecture
Abstract:
This thesis presents the first known end-to-end application of Google's state-of-the-art Transformer architecture for natural language processing (NLP) to the Hebrew language. The state of the art in machine translation (MT) of Hebrew remains poor. Scholarly work in MT, deep learning (DL), and other areas of NLP for Hebrew began to develop much later than for other languages and remains much less mature. The problem is difficult because of the nature of Hebrew as a morphologically rich language (MRL), the small size of the total corpus of electronic Hebrew documents available as training material, and the small size of the Hebrew-literate computing community worldwide. Nonetheless, significant advances in Hebrew NLP tools, data, methods, and scholarly infrastructure over the last 15 years, combined with recent advances in general NLP and MT over the last few years, especially the rise of neural networks and deep learning, create an enticing opportunity to attempt to advance the current state of Hebrew MT. More specifically, Google's Transformer neural network and associated technologies such as bidirectional encoder representations from Transformers (BERT) have revolutionized general MT and hold great promise for improving automatic Hebrew translation. This thesis demonstrates that, as measured by METEOR scores, a basic Hebrew Transformer trained in a few hours on a single graphics processing unit (GPU) exceeds the current performance of Google Translate on in-genre Hebrew translation tasks and is not far behind Google Translate on Hebrew translation tasks in general.
Keyword:
Artificial intelligence; bidirectional encoder representations from transformers (BERT); computational linguistics; Computer science; Hebrew; Linguistics; machine translation; natural language processing (NLP); transformer
URL: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37370749
3. Arc-Eager Construction Provides Learning Advantage Beyond Stack Management
4. Controlled Multilingual Thesauri for Kazakh Industry-Specific Terms
In: Social Inclusion ; 9 ; 1 ; 35-44 ; Social Inclusion and Multilingualism: The Impact of Linguistic Justice, Economy of Language and Language Policy (2021)
5. Assembling Syntax: Modeling Constituent Questions in a Grammar Engineering Framework
6. The Future Tense Properties of Uyghur and Turkish
In: Zeitschrift für die Welt der Türken / Journal of World of Turks; Vol 12, No 2 (2020); 69-80
7. Linguistic Phylogeny with Bayesian Markov Chain Monte Carlo: The Case of Indo-European
8. Issues in Named Entity Recognition on Early Modern English Letters
9. Detection of Longitudinal Development of Dementia in Literary Writing
In: http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1524651391474684 (2018)
10. Movement and structure effects on Universal 20 word order frequencies: A quantitative study
In: Glossa: a journal of general linguistics; Vol 3, No 1 (2018); 84 ; 2397-1835 (2018)
11. Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications
In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC) (2018)
12. Mining Social Science Publications for Survey Variables
In: Proceedings of the Second Workshop on NLP and Computational Social Science ; 47-52 (2018)
13. Resonances in Middle High German: New Methodologies in Prosody
In: Hench, Christopher Leo. (2017). Resonances in Middle High German: New Methodologies in Prosody. UC Berkeley: German. Retrieved from: http://www.escholarship.org/uc/item/13c6h2z2 (2017)
14. The Influence of Syntactic Frequencies on Human Sentence Processing
In: http://rave.ohiolink.edu/etdc/view?acc_num=osu1502452939626929 (2017)
15. Learning novel phonotactics from exposure to continuous speech
In: Laboratory Phonology: Journal of the Association for Laboratory Phonology; Vol 8, No 1 (2017); 12 ; 1868-6354 (2017)
16. Machine-readable text corpora and the linguistic description of languages
In: Text analysis and computers ; 1 ; ZUMA-Nachrichten Spezial ; 64-75 ; Text Analysis and Computers Conference (2017)
17. Code-switched English Pronunciation Modeling for Swahili Spoken Term Detection (Pub Version, Open Access)
18. Sentiment Big Data Flow Analysis by Means of Dynamic Linguistic Patterns
19. Making the Most of It: Word Sense Annotation and Disambiguation in the Face of Data Sparsity and Ambiguity
In: Jurgens, David Alan. (2014). Making the Most of It: Word Sense Annotation and Disambiguation in the Face of Data Sparsity and Ambiguity. UCLA: Computer Science 0201. Retrieved from: http://www.escholarship.org/uc/item/2wn4h7ph (2014)