1 |
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
|
|
|
|
In: ISSN: 2375-4699 ; EISSN: 2375-4702 ; ACM Transactions on Asian and Low-Resource Language Information Processing ; https://hal.inria.fr/hal-03616853 ; ACM Transactions on Asian and Low-Resource Language Information Processing, ACM, In press, ⟨10.1145/3523179⟩ (2022)
|
|
Abstract:
Automatic spoken language identification (LID) is an important research field in the era of multilingual voice-command-based human-computer interaction (HCI). A front-end LID module helps to improve the performance of many speech-based applications in multilingual scenarios. India is a populous country with diverse cultures and languages. The majority of the Indian population needs to use their respective native languages for verbal interaction with machines. Therefore, the development of efficient Indian spoken language recognition systems is useful for adapting smart technologies to every section of Indian society. The field of Indian LID has gained momentum in the last two decades, mainly due to the development of several standard multilingual speech corpora for the Indian languages. Even though significant research progress has already been made in this field, to the best of our knowledge, there have been few attempts to review it analytically as a whole. In this work, we present one of the first comprehensive reviews of the Indian spoken language recognition research field. An in-depth analysis emphasizes the unique challenges of low resources and mutual influences in developing LID systems in the Indian context. Several essential aspects of Indian LID research are discussed: a detailed description of the available speech corpora; the major research contributions, from the earlier attempts based on statistical modeling to the recent approaches based on different neural network architectures; and future research trends. This review will help active researchers and research enthusiasts from related fields assess the present state of Indian LID research.
|
|
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing; [SCCO.LING]Cognitive science/Linguistics; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]; acoustic phonetics; code-switching; corpora development; discriminative model; Indian language identification; Language resources; language similarity; Machine learning; Signal processing systems; Low-resourced languages
|
|
URL: https://hal.inria.fr/hal-03616853/file/TALLIP_Overview.pdf https://doi.org/10.1145/3523179 https://hal.inria.fr/hal-03616853 https://hal.inria.fr/hal-03616853/document
|
|
BASE
|
|
2 |
Is Old French tougher to parse?
|
|
|
|
In: 20th International Workshop on Treebanks and Linguistic Theories ; https://hal.archives-ouvertes.fr/hal-03506500 ; 20th International Workshop on Treebanks and Linguistic Theories, Mar 2022, Sofia, Bulgaria (2022)
|
|
BASE
|
|
3 |
A Novel Multimodal Approach for Studying the Dynamics of Curiosity in Small Group Learning
|
|
|
|
In: https://hal.inria.fr/hal-03536340 ; 2022 (2022)
|
|
BASE
|
|
4 |
A stochastic model of voice generation and the corresponding solution for the inverse problem using Artificial Neural Network for case with pathology in the vocal folds
|
|
|
|
In: ISSN: 1746-8094 ; Biomedical Signal Processing and Control ; https://hal-upec-upem.archives-ouvertes.fr/hal-03193501 ; Biomedical Signal Processing and Control, Elsevier, 2021, 68, pp.102623 (2021)
|
|
BASE
|
|
5 |
Population modeling with machine learning can enhance measures of mental health
|
|
|
|
In: ISSN: 2047-217X ; GigaScience ; https://hal.inria.fr/hal-03470466 ; GigaScience, BioMed Central, 2021, ⟨10.1101/2020.08.25.266536⟩ (2021)
|
|
BASE
|
|
6 |
Navigation In Urban Environments Amongst Pedestrians Using Multi-Objective Deep Reinforcement Learning
|
|
|
|
In: ITSC 2021 - 24th IEEE International Conference on Intelligent Transportation Systems ; https://hal.inria.fr/hal-03372856 ; ITSC 2021 - 24th IEEE International Conference on Intelligent Transportation Systems, Sep 2021, Indianapolis, United States. pp.1-7 (2021)
|
|
BASE
|
|
7 |
Does infant-directed speech help phonetic learning? A machine learning investigation
|
|
|
|
In: ISSN: 0364-0213 ; EISSN: 1551-6709 ; Cognitive Science ; https://hal.archives-ouvertes.fr/hal-03080098 ; Cognitive Science, Wiley, 2021, 45 (5), ⟨10.1111/cogs.12946⟩ (2021)
|
|
BASE
|
|
8 |
D-Cliques: Compensating for Data Heterogeneity with Topology in Decentralized Federated Learning
|
|
|
|
In: https://hal.inria.fr/hal-03498160 ; 2021 (2021)
|
|
BASE
|
|
9 |
Multimodal Coarticulation Modeling : Towards the animation of an intelligible talking head ; Modélisation de la coarticulation multimodale : vers l'animation d'une tête parlante intelligible
|
|
|
|
In: https://hal.univ-lorraine.fr/tel-03203815 ; Intelligence artificielle [cs.AI]. Université de Lorraine, 2021. French. ⟨NNT : 2021LORR0019⟩ (2021)
|
|
BASE
|
|
10 |
Learning emotions latent representation with CVAE for Text-Driven Expressive AudioVisual Speech Synthesis
|
|
|
|
In: ISSN: 0893-6080 ; Neural Networks ; https://hal.inria.fr/hal-03204193 ; Neural Networks, Elsevier, 2021, 141, pp.315-329. ⟨10.1016/j.neunet.2021.04.021⟩ (2021)
|
|
BASE
|
|
11 |
TREMoLo-Tweets: a Multi-Label Corpus of French Tweets for Language Register Characterization
|
|
|
|
In: RANLP 2021 - Recent Advances in Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-03331738 ; RANLP 2021 - Recent Advances in Natural Language Processing, Sep 2021, Varna, Bulgaria (2021)
|
|
BASE
|
|
12 |
End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings
|
|
|
|
In: ISBN: 978-1-6654-0019-0 ; 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII) ; https://hal.archives-ouvertes.fr/hal-03405970 ; 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), Sep 2021, Nara, Japan ; https://www.acii-conf.net/2021/ (2021)
|
|
BASE
|
|
13 |
Comparison of Deep Learning Approaches for Protective Behaviour Detection Under Class Imbalance from MoCap and EMG data
|
|
|
|
In: ACIIW 2021 - 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos ; https://hal.archives-ouvertes.fr/hal-03523502 ; ACIIW 2021 - 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, Sep 2021, Nara, Japan. pp.01-08, ⟨10.1109/ACIIW52867.2021.9666417⟩ ; http://www.casapaganini.it/entimement/workshops/2021/Workshop2021_Home.php (2021)
|
|
BASE
|
|
14 |
Audio-driven speech animation using recurrent neural network
|
|
|
|
In: https://hal.inria.fr/hal-03167213 ; United States, Patent n° : WO2021023861. 2021 (2021)
|
|
BASE
|
|
15 |
Building An Automated Gesture Imitation Game For Teenagers with ASD
|
|
|
|
In: ISSN: 0973-7006 ; Far East Journal of Electronics and Communications ; https://hal-imt-atlantique.archives-ouvertes.fr/hal-02894314 ; Far East Journal of Electronics and Communications, 2020, 23 (1), pp.1 - 10. ⟨10.17654/EC023010001⟩ (2020)
|
|
BASE
|
|
16 |
Att-HACK: An Expressive Speech Database with Social Attitudes
|
|
|
|
In: Speech Prosody ; https://hal.archives-ouvertes.fr/hal-02508362 ; Speech Prosody, May 2020, Tokyo, Japan (2020)
|
|
BASE
|
|
17 |
On the evaluation of retrofitting for supervised short-text classification
|
|
|
|
In: Proceedings of the Joint Ontology Workshops, CEUR-WS Vol 2708 : http://ceur-ws.org/Vol-2708/donlp2.pdf ; 1st International Workshop DeepOntoNLP: Deep Learning meets Ontologies and Natural Language Processing ; https://hal.mines-ales.fr/hal-02986853 ; 1st International Workshop DeepOntoNLP: Deep Learning meets Ontologies and Natural Language Processing, Sep 2020, Virtual & Bozen-Bolzano, Italy ; https://www.dl-onto-nlp-fois2020.ml/ (2020)
|
|
BASE
|
|
18 |
Décrire les textes politiques par le deep learning : à la recherche de nouveaux observables
|
|
|
|
In: JADT 2020 : 15es Journées internationales d’Analyse statistique des Données Textuelles ; https://hal.archives-ouvertes.fr/hal-03167188 ; JADT 2020 : 15es Journées internationales d’Analyse statistique des Données Textuelles, Jun 2020, Toulouse, France ; http://lexicometrica.univ-paris3.fr/jadt/JADT2020/jadt2020_pdf/GUARESI_JADT2020.pdf (2020)
|
|
BASE
|
|
19 |
Building Collaboration-based Resources In Endowed African Languages: Case Of NTeALan Dictionaries Platform
|
|
|
|
In: Proceedings of the First workshop on Resources for African Indigenous Languages (RAIL) ; https://hal.archives-ouvertes.fr/hal-02701162 ; Proceedings of the First workshop on Resources for African Indigenous Languages (RAIL), 2020 (2020)
|
|
BASE
|
|
20 |
NTeALan Dictionaries Platforms: An Example Of Collaboration-Based Model
|
|
|
|
In: Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP 2020) ; https://hal.archives-ouvertes.fr/hal-02701912 ; Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP 2020), 2020, pp.11 - 16 (2020)
|
|
BASE
|
|