1 | Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model ... | BASE
2 | Detection of the Prodromal Phase of Bipolar Disorder from Psychological and Phonological Aspects in Social Media ... | BASE
4 | Additional file 1: of New insights on serodiagnosis of trichinellosis during window period: early diagnostic antigens from Trichinella spiralis intestinal worms ... | BASE
6 | Video Captioning with Guidance of Multimodal Latent Topics ... | BASE
7 | Understanding the Changing Roles of Scientific Publications via Citation Embeddings ... | BASE
8 | Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning ... | BASE
9 | Tree-Structured Neural Machine for Linguistics-Aware Sentence Generation ... | BASE
10 | A Scalable and Adaptive Method for Finding Semantically Equivalent Cue Words of Uncertainty ... | BASE
11 | Discover and Learn New Objects from Documentaries ... | BASE
Abstract:
Despite the remarkable progress in recent years, detecting objects in a new context remains a challenging task. Detectors learned from a public dataset can only work with a fixed list of categories, while training from scratch usually requires a large amount of training data with detailed annotations. This work aims to explore a novel approach -- learning object detectors from documentary films in a weakly supervised manner. This is inspired by the observation that documentaries often provide dedicated exposition of certain object categories, where visual presentations are aligned with subtitles. We believe that object detectors can be learned from such a rich source of information. Towards this goal, we develop a joint probabilistic framework, where individual pieces of information, including video frames and subtitles, are brought together via both visual and linguistic links. On top of this formulation, we further derive a weakly supervised learning algorithm, where object model learning and training set ... : Published on CVPR 2017 (spotlight) ...
Keyword:
Computer Vision and Pattern Recognition cs.CV; FOS Computer and information sciences
URL: https://arxiv.org/abs/1707.09593 https://dx.doi.org/10.48550/arxiv.1707.09593
12 | A Semantic QA-Based Approach for Text Summarization Evaluation ... | BASE
13 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation ... | BASE
14 | Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions ... | BASE
15 | A Sequential Matching Framework for Multi-turn Response Selection in Retrieval-based Chatbots ... | BASE
16 | Phonetic Temporal Neural Model for Language Identification ... | BASE
18 | End-to-End Attention based Text-Dependent Speaker Verification ... | BASE
20 | No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models ... | BASE