562 | A Closer Look into the Robustness of Neural Dependency Parsers Using Better Adversarial Examples
563 | Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation
564 | Multimodal or Text? Retrieval or BERT? Benchmarking Classifiers for the Shared Task on Hateful Memes
565 | Evaluation of Summarization Systems across Gender, Age, and Race
566 | Personalized Transformer for Explainable Recommendation
567 | A Language Model-based Generative Classifier for Sentence-level Discourse Parsing
568 | Neural-Symbolic Commonsense Reasoner with Relation Predictors
569 | Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene
570 | Controllable Neural Dialogue Summarization with Personal Named Entity Planning
571 | Semantic Frame Induction using Masked Word Embeddings and Two-Step Clustering
572 | Self-Attention Networks Can Process Bounded Hierarchical Languages

Abstract: Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as Dyck-k, the language consisting of well-nested parentheses of k types. This suggested that natural language can be approximated well with models that are too weak for formal languages, or that the role of hierarchy and recursion in natural language might be limited. We qualify this implication by proving that self-attention networks can process Dyck-(k, D), the subset of Dyck-k with depth bounded by D, which arguably better captures the bounded hierarchical structure of natural language. Specifically, we construct a hard-attention network with D+1 layers and O(log k) memory size (per token per layer) that recognizes Dyck-(k, D), and a soft-attention network with two layers and O(log k) memory size that generates Dyck-(k, D). Experiments show that self-attention networks ...

Keywords: Computational Linguistics; Condensed Matter Physics; Deep Learning; Electromagnetism; FOS Physical sciences; Information and Knowledge Engineering; Neural Network; Semantics

Read paper: https://www.aclanthology.org/2021.acl-long.292
URL: https://dx.doi.org/10.48448/vkgx-xd51
https://underline.io/lecture/25646-self-attention-networks-can-process-bounded-hierarchical-languages
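The abstract turns on the definition of Dyck-(k, D): well-nested strings over k bracket types whose nesting depth never exceeds D. As a minimal sketch to make that definition concrete (this is a plain stack-based check, not the paper's attention-layer construction; the token encoding and function name are hypothetical choices for illustration):

# Recognizer sketch for Dyck-(k, D): well-nested brackets of k types,
# with nesting depth bounded by D. Tokens are (kind, type) pairs where
# kind is 'open' or 'close' and type is an integer in range(k).
# This encoding is a hypothetical choice, not from the paper.
def is_dyck_k_d(tokens, k, D):
    """Return True iff `tokens` is a string in Dyck-(k, D)."""
    stack = []
    for kind, typ in tokens:
        if not (0 <= typ < k):
            return False              # unknown bracket type
        if kind == 'open':
            stack.append(typ)
            if len(stack) > D:        # the depth bound that defines Dyck-(k, D)
                return False
        elif not stack or stack.pop() != typ:
            return False              # mismatched or unopened bracket
    return not stack                  # every open bracket must be closed

# Example: "[ ( ) ]" with type 0 = '(' and type 1 = '[' reaches depth 2,
# so it is in Dyck-(2, 2) but not in Dyck-(2, 1).
s = [('open', 1), ('open', 0), ('close', 0), ('close', 1)]
assert is_dyck_k_d(s, k=2, D=2)
assert not is_dyck_k_d(s, k=2, D=1)

Under this encoding Dyck-(k, D) is the subset of Dyck-k accepted by the depth check above, and the paper's claim is that this bounded-depth language is recognizable by a hard-attention network with only D+1 layers and O(log k) memory per token per layer.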
573 | DESCGEN: A Distantly Supervised Dataset for Generating Entity Descriptions
574 | KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers
575 | "We will Reduce Taxes" - Identifying Election Pledges with Language Models
577 | Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
578 | Quantifying and Avoiding Unfair Qualification Labour in Crowdsourcing
579 | Domain-Adaptive Pretraining Methods for Dialogue Understanding
580 | Semi-Automatic Construction of Text-to-SQL Data for Domain Transfer