1 | Training dynamics of neural language models ...

Abstract: Why do artificial neural networks model language so well? We claim that to answer this question, and to understand the biases that lead to such high-performing language models (and, indeed, to all models that handle language), we must analyze the training process. For decades, linguists have used the tools of developmental linguistics to study human bias towards linguistic structure. Similarly, we wish to consider a neural network's training dynamics, i.e., the analysis of training in practice and the study of why our optimization methods work when applied. This framing shows how structural patterns and linguistic properties are gradually built up, revealing more about why LSTM models learn so effectively on language data. To explore these questions, we might be tempted to appropriate methods from developmental linguistics, but we do not wish to make cognitive claims, so we avoid analogizing between human and artificial language learners. We instead use mathematical tools designed for investigating language ...

Keywords: interpretability; NLP; training dynamics

URL: https://dx.doi.org/10.7488/era/1421 https://era.ed.ac.uk/handle/1842/38154
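The "mathematical tools" the abstract alludes to include representation-comparison methods such as SVCCA, the subject of record 7 below. As a rough illustration only, not the thesis's actual code, here is a minimal NumPy sketch of an SVCCA-style similarity between activations from two training checkpoints; the function name, the variance threshold, and the sample shapes are illustrative assumptions.

```python
import numpy as np

def svcca_similarity(X, Y, var_kept=0.99):
    """SVCCA-style similarity between activation matrices X and Y, each of
    shape (n_samples, n_neurons) and computed on the same inputs. Returns
    the mean canonical correlation between their top singular directions."""
    def svd_reduce(A):
        A = A - A.mean(axis=0, keepdims=True)            # center each neuron
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        keep = np.cumsum(s ** 2) / np.sum(s ** 2) <= var_kept
        keep[0] = True                                   # always keep one direction
        return U[:, keep] * s[keep]                      # retained singular directions

    Xr, Yr = svd_reduce(X), svd_reduce(Y)
    # Canonical correlations are the singular values of Qx^T Qy,
    # where Qx, Qy are orthonormal bases for the reduced subspaces.
    Qx, _ = np.linalg.qr(Xr)
    Qy, _ = np.linalg.qr(Yr)
    corrs = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.mean(np.clip(corrs, 0.0, 1.0)))

# Illustrative use: random stand-ins for hidden states from an early
# checkpoint and the final model, evaluated on the same 500 tokens.
rng = np.random.default_rng(0)
early = rng.standard_normal((500, 128))
final = rng.standard_normal((500, 128))
print(svcca_similarity(early, final))
```

Tracking this score between each intermediate checkpoint and the converged model is one way to see how gradually representations settle over training, which is the kind of analysis the abstract motivates.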
2 | A Non-Linear Structural Probe

In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2021)
5 | Pareto Probing: Trading Off Accuracy for Complexity

In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2020)
7 | Understanding Learning Dynamics Of Language Models with SVCCA ...
10 | A framework for (under)specifying dependency syntax without overloading annotators ...