1 |
Code-switched inspired losses for generic spoken dialog representations
In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing ; 2021 Conference on Empirical Methods in Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-03574595 ; 2021 Conference on Empirical Methods in Natural Language Processing, Nov 2021, Punta Cana, Dominican Republic (2021)
Abstract:
Spoken dialog systems need to handle both multiple languages and multilinguality within a conversation (e.g., in the case of code-switching). In this work, we introduce new pretraining losses tailored to learning multilingual spoken dialog representations. The goal of these losses is to expose the model to code-switched language. To scale up training, we automatically build a pretraining corpus of multilingual conversations in five languages (French, Italian, English, German and Spanish) from OpenSubtitles, a large multilingual corpus of 24.3G tokens. We test the generic representations on MIAM, a new benchmark composed of five dialog act corpora in the same five languages, as well as on two novel multilingual downstream tasks (i.e., multilingual mask utterance retrieval and multilingual inconsistency identification). Our experiments show that the new code-switched inspired losses achieve better performance in both monolingual and multilingual settings.
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
URL: https://hal.archives-ouvertes.fr/hal-03574595/document https://hal.archives-ouvertes.fr/hal-03574595/file/2108.12465.pdf https://hal.archives-ouvertes.fr/hal-03574595
|
2 |
Improving Multimodal fusion via Mutual Dependency Maximisation
In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-03574609 ; Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov 2021, Online and Punta Cana, Dominican Republic. pp.231-245, ⟨10.18653/v1/2021.emnlp-main.21⟩ (2021)
|
6 |
Compositional Languages Emerge in a Neural Iterated Learning Model
In: 8th International Conference on Learning Representations ; https://hal.archives-ouvertes.fr/hal-02914840 ; 8th International Conference on Learning Representations, Apr 2020, Addis Ababa, Ethiopia (2020)
|
7 |
The importance of fillers for text representations of speech transcripts
In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) ; https://hal.archives-ouvertes.fr/hal-03134854 ; Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Nov 2020, Online, Dominican Republic. pp.7985-7993, ⟨10.18653/v1/2020.emnlp-main.641⟩ (2020)
|
8 |
Hierarchical Pre-training for Sequence Labelling in Spoken Dialog
In: Findings of the Association for Computational Linguistics: EMNLP 2020 ; https://hal.archives-ouvertes.fr/hal-03134851 ; Findings of the Association for Computational Linguistics: EMNLP 2020, Nov 2020, Online, France. pp.2636-2648, ⟨10.18653/v1/2020.findings-emnlp.239⟩ (2020)
|
9 |
Learning with Noise-Contrastive Estimation: Easing training by learning to scale
In: 27th International Conference on Computational Linguistics (COLING 2018) ; https://hal.archives-ouvertes.fr/hal-02912385 ; 27th International Conference on Computational Linguistics (COLING 2018), Aug 2018, Santa Fe, NM, United States. pp.3090-3101 ; http://coling2018.org/ (2018)
|
10 |
Algorithmes à base d'échantillonage pour l'entraînement de modèles de langue neuronaux [Sampling-based algorithms for training neural language models]
In: Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN ; 25e conférence sur le Traitement Automatique des Langues Naturelles (TALN) ; https://hal.archives-ouvertes.fr/hal-02912471 ; 25e conférence sur le Traitement Automatique des Langues Naturelles (TALN), May 2018, Rennes, France (2018)
|
11 |
Adaptation au domaine pour l'analyse morpho-syntaxique [Domain adaptation for morpho-syntactic analysis]
In: Actes de TALN 2017 ; Conférence sur le Traitement Automatique des Langues Naturelles ; https://hal.archives-ouvertes.fr/hal-01620900 ; Conférence sur le Traitement Automatique des Langues Naturelles, Jun 2017, Orléans, France (2017)
|
12 |
LIMSI@WMT'17
In: Proceedings of the Conference on Machine Translation (WMT) ; Conference on Machine Translation ; https://hal.archives-ouvertes.fr/hal-01619897 ; Conference on Machine Translation, Association for Computational Linguistics, Jan 2017, Copenhagen, Denmark. pp.257-264 (2017)
|
13 |
An experimental analysis of Noise-Contrastive Estimation: the noise distribution matters
In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers ; 15th Conference of the European Chapter of the Association for Computational Linguistics ; https://hal.archives-ouvertes.fr/hal-02912384 ; 15th Conference of the European Chapter of the Association for Computational Linguistics, Apr 2017, Valencia, Spain. pp.15-20 (2017)
|
14 |
Character and Subword-Based Word Representation for Neural Language Modeling Prediction
In: Proceedings of the First Workshop on Subword and Character Level Models in NLP ; https://hal.archives-ouvertes.fr/hal-02912377 ; Proceedings of the First Workshop on Subword and Character Level Models in NLP, Sep 2017, Copenhagen, Denmark. pp.1-13, ⟨10.18653/v1/W17-4101⟩ (2017)
|
16 |
Non-lexical neural architecture for fine-grained POS Tagging
In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-02912379 ; Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Sep 2015, Lisbon, Portugal. pp.232-237, ⟨10.18653/v1/D15-1025⟩ (2015)
|
17 |
LIMSI@WMT'15: Translation Task
In: Proceedings of the Tenth Workshop on Statistical Machine Translation ; https://hal.archives-ouvertes.fr/hal-02912383 ; Proceedings of the Tenth Workshop on Statistical Machine Translation, Sep 2015, Lisbon, Portugal. pp.145-151, ⟨10.18653/v1/W15-3016⟩ (2015)