1 |
Between History and Natural Language Processing: Study, Enrichment and Online Publication of French Parliamentary Debates of the Early Third Republic (1881-1899)
|
|
|
|
In: ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora, Jun 2022, Marseille, France ; https://hal.archives-ouvertes.fr/hal-03623351 ; https://www.clarin.eu/ParlaCLARIN-III (2022)
|
|
BASE
|
|
2 |
Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision
|
|
|
|
In: Information; Volume 13; Issue 4; Pages: 175 (2022)
|
|
Abstract:
Bilingual lexicon extraction is useful, especially for low-resource languages that can draw on high-resource languages. The Uyghur language is a derivative language, and its language resources are scarce and noisy. Moreover, it is difficult to find a bilingual resource that makes the linguistic knowledge of high-resource languages, such as Chinese or English, available to it. There is little related research on unsupervised extraction for the Chinese-Uyghur language pair, and the existing methods mainly focus on term extraction from translated parallel corpora. Unsupervised knowledge extraction methods are therefore attractive, especially for low-resource languages. This paper proposes a method to extract a Chinese-Uyghur bilingual dictionary by combining the inter-word relationship matrix with neural cross-language word embedding vectors. A seed dictionary is used as a weak supervision signal. A small Chinese-Uyghur parallel data resource is used to map the multilingual word vectors into a unified vector space. Because the word-level units of the two languages do not align well, stems are used as the main linguistic units. The strong inter-word semantic relationships of the word vectors are used to associate Chinese and Uyghur semantic information. Two retrieval criteria, nearest neighbor retrieval and cross-domain similarity local scaling, are used to compute similarity and extract the bilingual dictionary. The experimental results show that the accuracy of the proposed Chinese-Uyghur bilingual dictionary extraction method reaches 65.06%. The method helps to improve Chinese-Uyghur machine translation, automatic knowledge extraction, and multilingual translation.
|
|
Keyword:
bilingual dictionary; cross-language word embedding; seed dictionary
|
|
URL: https://doi.org/10.3390/info13040175
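The two retrieval criteria named in the abstract above, nearest neighbor retrieval and cross-domain similarity local scaling (CSLS), can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes both vocabularies are already mapped into one shared vector space, uses the commonly cited CSLS form 2*cos(x, y) - r_T(x) - r_S(y), and all function names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_matrix(src, tgt):
    # sim[i][j] = cosine similarity of source vector i and target vector j.
    return [[cosine(s, t) for t in tgt] for s in src]

def mean_top_k(values, k):
    # Mean of the k largest values (fewer if the list is shorter than k).
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)

def nn_retrieve(src, tgt):
    # Nearest neighbor retrieval: pick the target with the highest cosine.
    sim = similarity_matrix(src, tgt)
    return [max(range(len(tgt)), key=lambda j: row[j]) for row in sim]

def csls_retrieve(src, tgt, k=2):
    # CSLS: penalise candidates that sit in dense neighborhoods ("hubs").
    sim = similarity_matrix(src, tgt)
    r_src = [mean_top_k(row, k) for row in sim]
    r_tgt = [mean_top_k([row[j] for row in sim], k) for j in range(len(tgt))]
    return [
        max(range(len(tgt)), key=lambda j: 2 * sim[i][j] - r_src[i] - r_tgt[j])
        for i in range(len(src))
    ]

# Toy vectors standing in for mapped source and target embeddings (assumed data).
source_vectors = [[1.0, 0.0], [0.0, 1.0]]
target_vectors = [[0.9, 0.1], [0.1, 0.9]]
print(nn_retrieve(source_vectors, target_vectors))
print(csls_retrieve(source_vectors, target_vectors, k=2))
```

Each function returns, for every source word, the index of the target word it is paired with; a dictionary entry would then be the (source word, target word) pair at those indices.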
|
|
BASE
|
|
3 |
Investigating the Efficient Use of Word Embedding with Neural-Topic Models for Interpretable Topics from Short Texts
|
|
|
|
In: Sensors; Volume 22; Issue 3; Pages: 852 (2022)
|
|
BASE
|
|
4 |
Analysis of the Effects of Lockdown on Staff and Students at Universities in Spain and Colombia Using Natural Language Processing Techniques
|
|
|
|
In: International Journal of Environmental Research and Public Health; Volume 19; Issue 9; Pages: 5705 (2022)
|
|
BASE
|
|
5 |
An Enhanced Neural Word Embedding Model for Transfer Learning
|
|
|
|
In: Applied Sciences; Volume 12; Issue 6; Pages: 2848 (2022)
|
|
BASE
|
|
6 |
Deep Sentiment Analysis Using CNN-LSTM Architecture of English and Roman Urdu Text Shared in Social Media
|
|
|
|
In: Applied Sciences; Volume 12; Issue 5; Pages: 2694 (2022)
|
|
BASE
|
|
7 |
Predicting Academic Performance: Analysis of Students’ Mental Health Condition from Social Media Interactions
|
|
|
|
In: Behavioral Sciences; Volume 12; Issue 4; Pages: 87 (2022)
|
|
BASE
|
|
8 |
Vec2Dynamics: A Temporal Word Embedding Approach to Exploring the Dynamics of Scientific Keywords—Machine Learning as a Case Study
|
|
|
|
In: Big Data and Cognitive Computing; Volume 6; Issue 1; Pages: 21 (2022)
|
|
BASE
|
|
9 |
Methods, Models and Tools for Improving the Quality of Textual Annotations
|
|
|
|
In: Modelling; Volume 3; Issue 2; Pages: 224-242 (2022)
|
|
BASE
|
|
10 |
Creating multi-scripts sentiment analysis lexicons for Algerian, Moroccan and Tunisian dialects
|
|
|
|
In: 7th International Conference on Data Mining (DTMN 2021) Computer Science Conference Proceedings in Computer Science & Information Technology (CS & IT), Sep 2021, Copenhagen, Denmark ; https://hal.archives-ouvertes.fr/hal-03308111 (2021)
|
|
BASE
|
|
11 |
Bilingual English-German word embedding models for scientific text ...
|
|
|
|
BASE
|
|
12 |
Bilingual English-German word embedding models for scientific text ...
|
|
|
|
BASE
|
|
13 |
以《Cofacts 真的假的》資料庫為基礎建立中文科學假訊息之探勘模型 ; Text Mining Model for Detecting Chinese Fake Scientific Messages based on Cofacts Open Data
|
|
|
|
BASE
|
|
14 |
Automatic Part-of-Speech Tagging for Security Vulnerability Descriptions ...
|
|
|
|
BASE
|
|
15 |
Automatic Part-of-Speech Tagging for Security Vulnerability Descriptions ...
|
|
|
|
BASE
|
|
18 |
Text ranking based on semantic meaning of sentences ; Textrankning baserad på semantisk betydelse hos meningar
|
|
|
|
BASE
|
|
19 |
Efficient Estimate of Low-Frequency Words’ Embeddings Based on the Dictionary: A Case Study on Chinese
|
|
|
|
In: Applied Sciences; Volume 11; Issue 22 (2021)
|
|
BASE
|
|
20 |
Acoustic Word Embeddings for End-to-End Speech Synthesis
|
|
|
|
In: Applied Sciences; Volume 11; Issue 19 (2021)
|
|
BASE
|
|
|
|