1 |
CAS: corpus of clinical cases in French
|
|
|
|
In: ISSN: 2041-1480 ; Journal of Biomedical Semantics ; https://hal.archives-ouvertes.fr/hal-03021064 ; Journal of Biomedical Semantics, BioMed Central, 2020, ⟨10.1186/s13326-020-00225-x⟩ (2020)
|
|
Abstract:
Background: Textual corpora are extremely important for various NLP applications as they provide information necessary for creating, setting and testing those applications and the corresponding tools. They are also crucial for designing reliable methods and reproducible results. Yet, in some areas, such as the medical area, due to confidentiality or to ethical reasons, it is complicated or even impossible to access representative textual data. We propose the CAS corpus built with clinical cases, such as they are reported in the published scientific literature in French. Results: Currently, the corpus contains 4,900 clinical cases in French, totaling nearly 1.7M word occurrences. Some clinical cases are associated with discussions. A subset of the whole set of cases is enriched with morpho-syntactic (PoS-tagging, lemmatization) and semantic (the UMLS concepts, negation, uncertainty) annotations. The corpus is being continuously enriched with new clinical cases and annotations. The CAS corpus has been compared with similar clinical narratives. When computed on tokenized and lowercase words, the Jaccard index indicates that the similarity between clinical cases and narratives reaches up to 0.9727. Conclusion: We assume that the CAS corpus can be effectively exploited for the development and testing of NLP tools and methods. Besides, the corpus will be used in NLP challenges and distributed to the research community.
|
|
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; Corpus with clinical cases; Medical area; Morpho-syntactic and semantic annotation; Natural language processing; Reproducibility; Sustainability
|
|
URL: https://hal.archives-ouvertes.fr/hal-03021064/document https://doi.org/10.1186/s13326-020-00225-x https://hal.archives-ouvertes.fr/hal-03021064 https://hal.archives-ouvertes.fr/hal-03021064/file/s13326-020-00225-x.pdf
|
|
BASE
|
2 |
CAS: French Corpus with Clinical Cases
|
|
|
|
In: LOUHI 2018 - The Ninth International Workshop on Health Text Mining and Information Analysis ; https://hal.archives-ouvertes.fr/hal-01937096 ; LOUHI 2018 - The Ninth International Workshop on Health Text Mining and Information Analysis, Oct 2018, Brussels, Belgium. pp.1-7 (2018)
|
|
BASE
|
3 |
Strategies to select examples for Active Learning with Conditional Random Fields
|
|
|
|
In: CICLing 2017 - 18th International Conference on Computational Linguistics and Intelligent Text Processing ; https://hal.archives-ouvertes.fr/hal-01621338 ; CICLing 2017 - 18th International Conference on Computational Linguistics and Intelligent Text Processing, Apr 2017, Budapest, Hungary. pp.1-14 (2017)
|
|
BASE
|
4 |
RePaLi participation to CLEF eHealth IR challenge 2014: leveraging term variation
|
|
|
|
In: Proc of Conference and Labs of the Evaluation Forum CLEF ; Conference and Labs of the Evaluation Forum CLEF ; https://hal.archives-ouvertes.fr/hal-01027534 ; Conference and Labs of the Evaluation Forum CLEF, Sep 2014, Sheffield, United Kingdom. 13 p (2014)
|
|
BASE
|
5 |
Generating and using probabilistic morphological resources for the biomedical domain
|
|
|
|
In: Proceedings of the 9th edition of the Language Resources and Evaluation Conference, LREC 2014 ; 9th edition of the Language Resources and Evaluation Conference, LREC 2014 ; https://hal.archives-ouvertes.fr/hal-01027778 ; 9th edition of the Language Resources and Evaluation Conference, LREC 2014, May 2014, Reykjavik, Iceland. 7 p (2014)
|
|
BASE
|
6 |
Improving distributional thesauri by exploring the graph of neighbors
|
|
|
|
In: Proc of 25th International Conference on Computational Linguistics, COLING 2014 ; International Conference on Computational Linguistics, COLING 2014 ; https://hal.archives-ouvertes.fr/hal-01027545 ; International Conference on Computational Linguistics, COLING 2014, Aug 2014, Dublin, Ireland. 12 p (2014)
|
|
BASE
|
7 |
Automatic Acquisition of GL Resources, Using an Explanatory, Symbolic Technique
|
|
|
|
In: Advances in Generative Lexicon Theory ; https://hal.archives-ouvertes.fr/hal-00760258 ; Advances in Generative Lexicon Theory, Springer, pp.ch19, 2013 (2013)
|
|
BASE
|
8 |
Proper Noun Semantic Clustering using Bag-Of-Vectors
|
|
|
|
In: Proceedings of the Applied Natural Language Processing (ANLP) conference. Special track at the 25th International FLAIRS Conference ; ANLP - Applied Natural Language Processing conference. Special track at the 25th International FLAIRS Conference. ; https://hal.archives-ouvertes.fr/hal-00760105 ; ANLP - Applied Natural Language Processing conference. Special track at the 25th International FLAIRS Conference., May 2012, Marco Island, FL, United States (2012)
|
|
BASE
|
9 |
Using shallow linguistic features for relation extraction in bio-medical texts
|
|
|
|
In: Actes de la conférence TALN ; Traitement Automatique des Langues Naturelles, TALN ; https://hal.archives-ouvertes.fr/hal-00644070 ; Traitement Automatique des Langues Naturelles, TALN, 2011, Montpellier, France. 125-130, short paper (2011)
|
|
BASE
|
10 |
Translation of Biomedical Terms by Inferring Rewriting Rules
|
|
|
|
In: Information Retrieval in Biomedicine: Natural Language Processing for Knowledge Integration ; https://hal.inria.fr/hal-00843785 ; Violaine Prince and Mathieu Roche. Information Retrieval in Biomedicine: Natural Language Processing for Knowledge Integration, IGI Global, pp.106-123, 2009, ⟨10.4018/978-1-60566-274-9.ch006⟩ (2009)
|
|
BASE
|
11 |
Letter-to-phoneme conversion by inference of rewriting rules
|
|
|
|
In: Interspeech ; https://hal.inria.fr/hal-00844000 ; Interspeech, ISCA, Sep 2009, Brighton, United Kingdom. pp.1299-1302 ; http://www.isca-speech.org/archive/archive_papers/interspeech_2009/papers/i09_1299.pdf (2009)
|
|
BASE
|
12 |
Language modeling for bag-of-visual words image categorization
|
|
|
|
In: Proceedings of the 2008 International Conference on Image and Video Retrieval (CIVR'08) ; International Conference on Image and Video Retrieval ; https://hal.archives-ouvertes.fr/hal-00811922 ; International Conference on Image and Video Retrieval, Jul 2008, Niagara Falls, Canada. pp.249-258, ⟨10.1145/1386352.1386388⟩ (2008)
|
|
BASE
|
13 |
Automatic acquisition of semantic lexicons for information retrieval ; Acquisition automatique de lexiques sémantiques pour la recherche d'information
|
|
|
|
In: https://tel.archives-ouvertes.fr/tel-00524646 ; Human-Computer Interaction [cs.HC]. Université Rennes 1, 2003. In French (2003)
|
|
BASE