1 |
Creation of a multilingual aligned corpus with Ukrainian as the target language and its exploitation
In: Computational Linguistics and Intelligent Systems ; https://hal.archives-ouvertes.fr/hal-01736363 ; Computational Linguistics and Intelligent Systems, Apr 2017, Kharkiv, Ukraine (2017)
Abstract:
The creation of linguistic resources (such as corpora, lexica, or terminologies) occupies an important place in research areas related to linguistics, Natural Language Processing, computer science, and psycholinguistics. In this paper, we describe a multilingual corpus in which Ukrainian is the target language and the source languages are Polish, French, and English. The corpus contains literary texts and a small subset of texts from the medical domain: in total, 62 literary texts and 129 medical texts. It contains over 1 million words in the target Ukrainian language, and at least as many in the source languages taken together. This is a directional corpus aligned at the sentence level. After describing the corpus, we present some possible uses and first results. We then conclude and indicate some directions for future work. The corpus presented in this work is available for research purposes: http://natalia.grabar.free.fr/resources.php
Keywords:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO]Computer Science [cs]; Natural Language Processing; Parallel corpora; Ukrainian
URL: https://hal.archives-ouvertes.fr/hal-01736363
2 |
Unsupervised acquisition of morphological resources for Ukrainian
In: Computational Linguistics and Intelligent Systems ; https://hal.archives-ouvertes.fr/hal-01736400 ; Computational Linguistics and Intelligent Systems, Apr 2017, Kharkiv, Ukraine (2017)
3 |
Understanding of unknown medical words
In: Biomedical NLP Workshop associated with RANLP 2017 ; https://hal.archives-ouvertes.fr/hal-01736408 ; Biomedical NLP Workshop associated with RANLP 2017, Sep 2017, Varna, Bulgaria (2017)
4 |
Generating and executing complex natural language queries across linked data
In: International Congress on Medical Informatics ; https://hal.archives-ouvertes.fr/hal-01971222 ; International Congress on Medical Informatics, Jan 2015, Sao Paulo, Brazil (2015)
5 |
Tuning HeidelTime for identifying time expressions in clinical texts in English and French
In: International Workshop on Health Text Mining and Information Analysis ; https://hal.archives-ouvertes.fr/hal-01972761 ; International Workshop on Health Text Mining and Information Analysis, Jan 2014, Gothenburg, Sweden (2014)
6 |
Combining an expert-based medical entity recognizer to a machine-learning system: methods and a case-study
In: Biomedical Informatics Insights ; https://hal.archives-ouvertes.fr/hal-01972779 ; Biomedical Informatics Insights, 2013, 13p (2013)