
Search in the Catalogues and Directories

Hits 1 – 5 of 5

1
Cross-stitched Multi-modal Encoders ...
Abstract: In this paper, we propose a novel architecture for multi-modal speech and text input. We combine pretrained speech and text encoders using multi-headed cross-modal attention and jointly fine-tune on the target problem. The resultant architecture can be used for continuous token-level classification or utterance-level prediction acting on simultaneous text and speech. The resultant encoder efficiently captures both acoustic-prosodic and lexical information. We compare the benefits of multi-headed attention-based fusion for multi-modal utterance-level classification against a simple concatenation of pre-pooled, modality-specific representations. Our model architecture is compact, resource efficient, and can be trained on a single consumer GPU card. ...
Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2204.09227
https://arxiv.org/abs/2204.09227
BASE
2
An Automated Quality Evaluation Framework of Psychotherapy Conversations with Local Quality Estimates ...
BASE
3
Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy
In: Interspeech (2018)
BASE
4
Can Distributed Word Embeddings be an alternative to costly linguistic features: A Study on Parsing Hindi ...
BASE
5
Methods for Leveraging Lexical Information in SMT ...
Singla, Karan. - : Unpublished, 2015
BASE

© 2013 - 2024 Lin|gu|is|tik