1 |
Between History and Natural Language Processing: Study, Enrichment and Online Publication of French Parliamentary Debates of the Early Third Republic (1881-1899)
In: ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora, Jun 2022, Marseille, France ; https://hal.archives-ouvertes.fr/hal-03623351 ; https://www.clarin.eu/ParlaCLARIN-III (2022)
BASE
2 |
Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision
In: Information; Volume 13; Issue 4; Pages: 175 (2022)
BASE
3 |
Investigating the Efficient Use of Word Embedding with Neural-Topic Models for Interpretable Topics from Short Texts
In: Sensors; Volume 22; Issue 3; Pages: 852 (2022)
BASE
4 |
Analysis of the Effects of Lockdown on Staff and Students at Universities in Spain and Colombia Using Natural Language Processing Techniques
In: International Journal of Environmental Research and Public Health; Volume 19; Issue 9; Pages: 5705 (2022)
BASE
5 |
An Enhanced Neural Word Embedding Model for Transfer Learning
In: Applied Sciences; Volume 12; Issue 6; Pages: 2848 (2022)
Abstract:
As data generation expands, a growing number of natural language processing (NLP) tasks need to be solved, and word representation plays a vital role in them. Computation-based word embeddings are very useful in various high-resource languages. However, low-resource languages such as Bangla have so far had very limited resources in terms of models, toolkits, and datasets. To address this, this paper develops an enhanced BanglaFastText word embedding model using Python, along with two large pre-trained Bangla FastText models (Skip-gram and CBOW). These pre-trained models were trained on a large collected Bangla corpus (around 20 million data points, where every paragraph of text counts as one data point). BanglaFastText outperformed Facebook’s FastText by a significant margin. To evaluate and analyze the pre-trained models, the work performed text classification on three popular Bangla text datasets and built models using various classical machine learning approaches as well as a deep neural network. The evaluations showed superior performance over existing word embedding techniques and the Facebook Bangla FastText pre-trained model for Bangla NLP. In addition, the results compare favorably with the performance reported in the original work on these datasets. A Python toolkit is also proposed that makes it convenient to access the models and use them for word embedding; for obtaining semantic relationships word-by-word or sentence-by-sentence; for sentence embedding for classical machine learning approaches; and for unsupervised fine-tuning on any Bangla linguistic dataset.
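The sentence-embedding use mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common approach of averaging word vectors into a sentence vector and comparing sentences by cosine similarity; the abstract does not state how the toolkit actually builds sentence embeddings. The toy vectors and the function names `sentence_embedding` and `cosine_similarity` are hypothetical and are not part of the BanglaFastText toolkit.

```python
import math

# Hypothetical toy word vectors for illustration only; a real embedding
# model would supply learned, much higher-dimensional vectors.
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.7, 0.3, 0.0],
    "movie": [0.0, 0.5, 0.8],
}


def sentence_embedding(tokens, vectors):
    """Average the vectors of all known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


good_movie = sentence_embedding(["good", "movie"], TOY_VECTORS)
great_movie = sentence_embedding(["great", "movie"], TOY_VECTORS)
bad_movie = sentence_embedding(["bad", "movie"], TOY_VECTORS)

sim_close = cosine_similarity(good_movie, great_movie)
sim_far = cosine_similarity(good_movie, bad_movie)
print(sim_close > sim_far)
```

The averaged vectors can then be passed as fixed-length features to a classical classifier, which is one way word embeddings are typically used for text classification.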
Keywords:
Bangla NLP; BanglaLM; text classification; toolkit; web crawler; word embedding
URL: https://doi.org/10.3390/app12062848
BASE
6 |
Deep Sentiment Analysis Using CNN-LSTM Architecture of English and Roman Urdu Text Shared in Social Media
In: Applied Sciences; Volume 12; Issue 5; Pages: 2694 (2022)
BASE
7 |
Predicting Academic Performance: Analysis of Students’ Mental Health Condition from Social Media Interactions
In: Behavioral Sciences; Volume 12; Issue 4; Pages: 87 (2022)
BASE
8 |
Vec2Dynamics: A Temporal Word Embedding Approach to Exploring the Dynamics of Scientific Keywords—Machine Learning as a Case Study
In: Big Data and Cognitive Computing; Volume 6; Issue 1; Pages: 21 (2022)
BASE
9 |
Methods, Models and Tools for Improving the Quality of Textual Annotations
In: Modelling; Volume 3; Issue 2; Pages: 224-242 (2022)
BASE
10 |
Creating multi-scripts sentiment analysis lexicons for Algerian, Moroccan and Tunisian dialects
In: 7th International Conference on Data Mining (DTMN 2021) Computer Science Conference Proceedings in Computer Science & Information Technology (CS & IT), Sep 2021, Copenhagen, Denmark ; https://hal.archives-ouvertes.fr/hal-03308111 (2021)
BASE
11 |
Bilingual English-German word embedding models for scientific text ...
BASE
12 |
Bilingual English-German word embedding models for scientific text ...
BASE
13 |
以《Cofacts 真的假的》資料庫為基礎建立中文科學假訊息之探勘模型 ; Text Mining Model for Detecting Chinese Fake Scientific Messages based on Cofacts Open Data
BASE
14 |
Automatic Part-of-Speech Tagging for Security Vulnerability Descriptions ...
BASE
15 |
Automatic Part-of-Speech Tagging for Security Vulnerability Descriptions ...
BASE
18 |
Text ranking based on semantic meaning of sentences ; Textrankning baserad på semantisk betydelse hos meningar
BASE
19 |
Efficient Estimate of Low-Frequency Words’ Embeddings Based on the Dictionary: A Case Study on Chinese
In: Applied Sciences; Volume 11; Issue 22 (2021)
BASE
20 |
Acoustic Word Embeddings for End-to-End Speech Synthesis
In: Applied Sciences; Volume 11; Issue 19 (2021)
BASE