81 |
EnvEdit: Environment Editing for Vision-and-Language Navigation ...
82 |
Homepage2Vec: Language-Agnostic Website Embedding and Classification ...
83 |
Multilinguals at SemEval-2022 Task 11: Transformer Based Architecture for Complex NER ...
84 |
A new approach to calculating BERTScore for automatic assessment of translation quality ...
85 |
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers ...
86 |
ViWOZ: A Multi-Domain Task-Oriented Dialogue Systems Dataset For Low-resource Language ...
87 |
EAG: Extract and Generate Multi-way Aligned Corpus for Complete Multi-lingual Neural Machine Translation ...
89 |
Learning Bidirectional Translation between Descriptions and Actions with Small Paired Data ...
90 |
A Feasibility Study of Answer-Agnostic Question Generation for Education ...
91 |
Language Generation for Broad-Coverage, Explainable Cognitive Systems ...
95 |
Regional Negative Bias in Word Embeddings Predicts Racial Animus--but only via Name Frequency ...
98 |
Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics ...
99 |
Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity ...
100 |
Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in Natural Language Understanding ...
Abstract:
In the age of large transformer language models, linguistic evaluation plays an important role in diagnosing models' abilities and limitations on natural language understanding. However, current evaluation methods show some significant shortcomings. In particular, they do not provide insight into how well a language model captures distinct linguistic skills essential for language understanding and reasoning. Thus, they fail to effectively map out the aspects of language understanding that remain challenging to existing models, which makes it hard to discover potential limitations in models and datasets. In this paper, we introduce Curriculum as a new format of NLI benchmark for evaluation of broad-coverage linguistic phenomena. Curriculum contains a collection of datasets that covers 36 types of major linguistic phenomena and an evaluation procedure for diagnosing how well a language model captures reasoning skills for distinct types of linguistic phenomena. We show that this linguistic-phenomena-driven ... : Accepted by NAACL 2022 (Main Conference) ...
Keyword:
Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2204.06283 https://arxiv.org/abs/2204.06283