1. A French clinical corpus with comprehensive semantic annotations: development of the Medical Entity and Relation LIMSI annOtated Text corpus (MERLOT) [Journal]
DNB Subject Category: Language

2. A French clinical corpus with comprehensive semantic annotations: development of the Medical Entity and Relation LIMSI annOtated Text corpus (MERLOT)
In: ISSN: 1574-020X ; EISSN: 1574-0218 ; Language Resources and Evaluation ; https://hal.archives-ouvertes.fr/hal-01631743 ; Language Resources and Evaluation, Springer Verlag, 2017, 52 (2), pp.571-601. ⟨10.1007/s10579-017-9382-y⟩ (2017)
Abstract: Quality annotated resources are essential for Natural Language Processing. The objective of this work is to present a corpus of clinical narratives in French annotated for linguistic, semantic and structural information, aimed at clinical information extraction. Six annotators contributed to the corpus annotation, using a comprehensive annotation scheme covering 21 entities, 11 attributes and 37 relations. All annotators trained on a small, common portion of the corpus before proceeding independently. An automatic tool was used to produce entity and attribute pre-annotations. About a tenth of the corpus was doubly annotated, and annotation differences were resolved in consensus meetings. To ensure annotation consistency throughout the corpus, we devised harmonization tools that automatically identify annotation differences to be resolved, improving overall corpus quality. The annotation project spanned 24 months and resulted in a corpus of 500 documents (148,476 tokens) annotated with 44,740 entities and 26,478 relations. The average inter-annotator agreement is 0.793 F-measure for entities and 0.789 for relations. The entity pre-annotation tool reached an F-measure of 0.814 when sufficient training data was available. This performance shows the value of the corpus for building and evaluating information extraction methods. In addition, we introduced harmonization methods that further improved the quality of annotations in the corpus.
Keywords: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Clinical narrative; Inter-annotator agreement; Personal health information; Semantic annotations
URL: https://doi.org/10.1007/s10579-017-9382-y https://hal.archives-ouvertes.fr/hal-01631743/file/lre.pdf https://hal.archives-ouvertes.fr/hal-01631743 https://hal.archives-ouvertes.fr/hal-01631743/document
BASE

3. Annotation Guidelines BIONLP-ST 2016 SeeDev task
In: https://hal.inrae.fr/hal-02795594 ; 2016, pp.47 ; https://sites.google.com/site/bionlpst2016/tasks/seedev/AnnotationGuidelinesBIONLP-ST2016-SeeDevtask%285%29.pdf (2016)
BASE

4. Shared Task SeeDev : Extraction de régulations impliquées dans le développement de la graine d’Arabidopsis thaliana à partir de publications scientifiques [SeeDev shared task: extraction of regulations involved in Arabidopsis thaliana seed development from scientific publications]
In: Les journées Bioinformatique de l'Inra ; https://hal.archives-ouvertes.fr/hal-01455855 ; Les journées Bioinformatique de l'Inra, Mar 2016, Montpellier, France. pp.1 (2016)
BASE

5. Overview of the regulatory network of plant seed development (SeeDev) task at the BioNLP shared task 2016
In: Proceedings of the 4th BioNLP Shared Task Workshop ; BioNLP Shared Task ; https://hal.archives-ouvertes.fr/hal-01455854 ; BioNLP Shared Task, Aug 2016, Berlin, Germany. pp.113 ; https://www.aclweb.org/anthology/W/W16/W16-3001.pdf (2016)
BASE

6. OntoBiotope, technologies sémantiques pour l’étude des habitats microbiens [OntoBiotope: semantic technologies for the study of microbial habitats]
In: Méta-omiques des Ecosystèmes Microbiens ; https://hal.inrae.fr/hal-02889012 ; Méta-omiques des Ecosystèmes Microbiens, Sep 2015, Paris, France (2015)
BASE

7. Information extraction from articles for the elaboration of the regulatory networks involved in Arabidopsis seed development
In: 26th International Conference on Arabidopsis Research ; https://hal.archives-ouvertes.fr/hal-01524850 ; 26th International Conference on Arabidopsis Research, Jul 2015, Paris, France (2015)
BASE

8. Knowledge model of regulatory networks involved in Arabidopsis seed development for information extraction and integration from text
In: BioCreative 5 ; https://hal.archives-ouvertes.fr/hal-01512197 ; BioCreative 5, Sep 2015, Madrid, Spain (2015)
BASE

9. Information extraction from articles for the elaboration of the regulatory networks involved in Arabidopsis seed development ...
BASE

10. Information Extraction Challenge Gene Regulation Network in Arabidopsis thaliana (GRNA) ...
BASE

12. Automatic computation of CHA2DS2-VASc score: information extraction from clinical texts for thromboembolism risk assessment
In: AMIA Annual Symposium Proceedings ; https://hal.archives-ouvertes.fr/hal-00748588 ; AMIA Annual Symposium Proceedings, 2011, vol. 2011, pp.501-510 (2011)
BASE

13. Automatic computation of CHA2DS2-VASc score: Information extraction from clinical texts for thromboembolism risk assessment
BASE

14. Named and specific entity detection in varied data: the Quaero named entity baseline evaluation
In: 7th International Conference on Language Resources and Evaluation ; https://hal.inrae.fr/hal-02754184 ; 7th International Conference on Language Resources and Evaluation, May 2010, Valletta, Malta ; http://www.lrec-conf.org/proceedings/lrec2010/index.html (2010)
BASE

15. Named and specific entity detection in varied data: The Quaero Named Entity baseline evaluation
In: Seventh International Conference on Language Resources and Evaluation (LREC'10) ; https://hal.archives-ouvertes.fr/hal-02496836 ; Seventh International Conference on Language Resources and Evaluation (LREC'10), 2010, Valletta, Malta (2010)
BASE

16. Morphosemantic parsing of medical compound words: Transferring a French analyzer to English
In: ISSN: 1386-5056 ; International Journal of Medical Informatics ; https://hal.archives-ouvertes.fr/hal-00413362 ; International Journal of Medical Informatics, Elsevier, 2009, 78 (1), pp.48-55 (2009)
BASE

17. Paraphrase Acquisition from Comparable Medical Corpora of Specialized and Lay Texts
BASE

18. Analyse morphosémantique des composés savants : transposition du français à l'anglais [Morphosemantic analysis of learned compounds: transposition from French to English]
In: TALN ; https://halshs.archives-ouvertes.fr/halshs-00157869 ; TALN, Jun 2007, Toulouse, France. pp.79-88 (2007)
BASE

19. Defining Medical Words: Transposing Morphosemantic Analysis from French to English
In: Proceedings of the 12th Medinfo Conference ; 12th International Conference MEDINFO ; https://halshs.archives-ouvertes.fr/halshs-00157857 ; 12th International Conference MEDINFO, Aug 2007, Brisbane, Australia. pp.535-539 (2007)
BASE