1 |
Mining an English-Chinese parallel Dataset of Financial News
|
|
|
|
In: Journal of Open Humanities Data; Vol 8 (2022); 9 ; 2059-481X (2022)
|
|
Abstract:
Parallel text datasets are valuable for educational purposes, machine translation, and cross-language information retrieval, but few are domain-oriented. We have created a bilingual Chinese–English parallel dataset of news in the domain of finance technology, using the Financial Times website, from which we collected 60,473 news items published between 2007 and 2021. The dataset is open access in its original state, without transformation. It was built not for machine translation, the usual application of such data, but for intelligent mining, for which we conducted many experiments using up-to-date text mining techniques: clustering (topic modeling, community detection, k-means), topic prediction (naive Bayes, SVM, LSTM, BERT), and pattern discovery (dictionary-based, time series). We present the use of these techniques as a framework for other studies, offering not only an application but also an interpretation.
|
|
Keywords:
classification; clustering; computer science; English-Chinese; patterns; text mining
|
|
URL: https://openhumanitiesdata.metajnl.com/jms/article/view/62 https://doi.org/10.5334/johd.62
|
|
BASE
|
|
|
|
4 |
The rumour spectrum
|
|
|
|
In: ISSN: 1932-6203 ; EISSN: 1932-6203 ; PLoS ONE ; https://hal.archives-ouvertes.fr/hal-01691934 ; PLoS ONE, Public Library of Science, 2018, 13 (1), pp.e0189080.1-27. ⟨10.1371/journal.pone.0189080⟩ (2018)
|
|
BASE
|
|
|
|
5 |
A semi-supervised Learning Approach to find equivalent long-string Organization Names
|
|
|
|
In: Colloque- Forum PEPS EXIA ; https://hal-enpc.archives-ouvertes.fr/hal-02310298 ; Colloque- Forum PEPS EXIA, Oct 2016, Champs sur Marne, France. 2016 (2016)
|
|
BASE
|
|
|
|
6 |
Clustering and Relational Ambiguity: from Text Data to Natural Data
|
|
|
|
In: EISSN: 2416-5999 ; Journal of Data Mining and Digital Humanities ; https://hal.archives-ouvertes.fr/hal-00920423 ; Journal of Data Mining and Digital Humanities, Episciences.org, 2013, 1 (1), pp.1 (2013)
|
|
BASE
|
|
|
|
8 |
Modeling Noun-Phrases Dynamics in Specialized Text Collections
|
|
|
|
In: ISSN: 0929-6174 ; Journal of Quantitative Linguistics ; https://hal.archives-ouvertes.fr/hal-02054488 ; Journal of Quantitative Linguistics, Taylor & Francis (Routledge), 2010, 17 (3), pp.212-228. ⟨10.1080/09296174.2010.485447⟩ (2010)
|
|
BASE
|
|
|
|
9 |
Bayesian Discriminant Analysis for Lexical Semantic Tagging
|
|
|
|
In: European Meeting on Cybernetics and Systems Research (EMCSR) ; https://hal.archives-ouvertes.fr/hal-03373905 ; European Meeting on Cybernetics and Systems Research (EMCSR), Apr 2002, Vienna, Austria (2002)
|
|
BASE
|
|
|
|
|
|