1 |
Easy-to-use combination of POS and BERT model for domain-specific and misspelled terms
|
|
|
|
In: NL4IA Workshop Proceedings ; https://hal.archives-ouvertes.fr/hal-03474696 ; NL4IA Workshop Proceedings, Nov 2021, Milan, Italy (2021)
|
|
Abstract:
In this paper, we present BERT-POS, a simple method for encoding syntax into BERT embeddings without retraining or fine-tuning data, based on Part-Of-Speech (POS) tagging. Although fine-tuning is the most popular way to apply BERT models to domain datasets, it remains expensive in terms of training time, computing resources, training data selection and retraining frequency. Our alternative works at the preprocessing level and relies on POS-tagging sentences. It gives interesting results for word similarity on out-of-vocabulary words, both domain-specific terms and misspellings. The experiments were done on French, but we believe the results would be similar for other languages.
|
|
Keyword:
Computer Science [cs]/Document and Text Processing; Language Models; Natural Language Processing; Out-of-Vocabulary Words; Part-Of-Speech; Semantic Similarity
|
|
URL: https://hal.archives-ouvertes.fr/hal-03474696 https://hal.archives-ouvertes.fr/hal-03474696/document https://hal.archives-ouvertes.fr/hal-03474696/file/paper132.pdf
|
|
BASE
|
|
2 |
Differential Evaluation: a Qualitative Analysis of Natural Language Processing System Behavior Based Upon Data Resistance to Processing
|
|
|
|
In: Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems ; EVAL4NLP, 2nd Workshop on "Evaluation & Comparison of NLP Systems", EMNLP 2021 ; https://hal.archives-ouvertes.fr/hal-03432331 ; EVAL4NLP, 2nd Workshop on "Evaluation & Comparison of NLP Systems", EMNLP 2021, Nov 2021, Punta Cana, Dominican Republic (2021)
|
|
BASE
|
|
3 |
Inference Annotation of a Chinese Corpus for Opinion Mining
|
|
|
|
In: LREC ; https://hal-inalco.archives-ouvertes.fr/hal-02507170 ; LREC, May 2020, Marseille, France (2020)
|
|
BASE
|
|
4 |
Automatic Removal of Identifying Information in Official EU Languages for Public Administrations: The MAPA Project
|
|
|
|
In: Legal Knowledge and Information Systems ; Frontiers in Artificial Intelligence and Applications ; International Conference on Legal Knowledge and Information Systems ; https://hal.archives-ouvertes.fr/hal-03058311 ; International Conference on Legal Knowledge and Information Systems, Dec 2020, Brno, Prague, Czech Republic. pp.223-226, ⟨10.3233/FAIA200869⟩ ; http://ebooks.iospress.nl/volume/legal-knowledge-and-information-systems-jurix-2020-the-thirty-third-annual-conference-brno-czech-republic-december-911-2020 (2020)
|
|
BASE
|
|
6 |
A Year of Papers Using Biomedical Texts: Findings from the Section on Clinical Natural Language Processing of the International Medical Informatics Association Yearbook
|
|
|
|
In: Yearb Med Inform (2020)
|
|
BASE
|
|
7 |
French Levothyrox® Crisis: Retrospective Analysis of Social Media
|
|
|
|
In: International Society of Pharmacovigilance ; https://hal.archives-ouvertes.fr/hal-02411632 ; International Society of Pharmacovigilance, Springer International Publishing, Oct 2019, Bogota, Colombia (2019)
|
|
BASE
|
|
8 |
Annotations d'entités et de relations sur des résumés d'articles scientifiques pour la détection d'interactions entre aliments et médicaments
|
|
|
|
In: TALMED 2019 ; https://hal.archives-ouvertes.fr/hal-02430510 ; TALMED 2019, Aug 2019, Lyon, France (2019)
|
|
BASE
|
|
9 |
Three Dimensions of Reproducibility in Natural Language Processing
|
|
|
|
BASE
|
|
10 |
A French clinical corpus with comprehensive semantic annotations: development of the Medical Entity and Relation LIMSI annOtated Text corpus (MERLOT) [Journal]
|
|
|
|
DNB Subject Category Language
|
|
11 |
Generating a training corpus for OCR post-correction using encoder-decoder model
|
|
|
|
In: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers) ; International Joint Conference on Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-01831147 ; International Joint Conference on Natural Language Processing, Nov 2017, Taipei, Taiwan ; https://www.aclweb.org/anthology/I17-1101 (2017)
|
|
BASE
|
|
12 |
CLEF eHealth 2017 Multilingual Information Extraction task Overview: ICD10 Coding of Death Certificates in English and French.
|
|
|
|
In: Workshop of the Cross-Language Evaluation Forum ; https://hal.archives-ouvertes.fr/hal-01665374 ; Workshop of the Cross-Language Evaluation Forum, CEUR-WS, Jan 2017, Dublin, Ireland (2017)
|
|
BASE
|
|
13 |
A French clinical corpus with comprehensive semantic annotations: development of the Medical Entity and Relation LIMSI annOtated Text corpus (MERLOT)
|
|
|
|
In: ISSN: 1574-020X ; EISSN: 1574-0218 ; Language Resources and Evaluation ; https://hal.archives-ouvertes.fr/hal-01631743 ; Language Resources and Evaluation, Springer Verlag, 2017, 52 (2), pp.571-601. ⟨10.1007/s10579-017-9382-y⟩ (2017)
|
|
BASE
|
|
14 |
Identification of mentions and relations between bacteria and biotope from PubMed abstracts
|
|
|
|
In: BioNLP Shared-Task Workshop ; https://hal.archives-ouvertes.fr/hal-01831226 ; BioNLP Shared-Task Workshop, ACL, Jan 2016, Berlin, Germany (2016)
|
|
BASE
|
|
15 |
Low-resource OCR error detection and correction in French Clinical Texts
|
|
|
|
In: International Workshop on Health Text Mining and Information Analysis ; https://hal.archives-ouvertes.fr/hal-01831225 ; International Workshop on Health Text Mining and Information Analysis, ACL, Nov 2016, Austin, United States (2016)
|
|
BASE
|
|
16 |
Analyse des émotions, sentiments et opinions exprimés dans les tweets : présentation et résultats de l'édition 2015 du défi fouille de texte (DEFT)
|
|
|
|
In: Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles (TALN 2015) ; https://hal.archives-ouvertes.fr/hal-01617180 ; Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles (TALN 2015), Jun 2015, Caen, France ; http://www.atala.org/taln_archives/ateliers/2015/DEFT/deft-2015-long-001.pdf (2015)
|
|
BASE
|
|
17 |
Morpho-Syntactic Study of Errors from Speech Recognition System
|
|
|
|
In: International Conference on Language Resources and Evaluation ; https://hal.archives-ouvertes.fr/hal-01831243 ; International Conference on Language Resources and Evaluation, Jan 2014, Reykjavik, Iceland (2014)
|
|
BASE
|
|
18 |
Reformatting clinical records based on global layout statistics
|
|
|
|
In: International Symposium on Semantic Mining in Biomedicine ; https://hal.archives-ouvertes.fr/hal-01831245 ; International Symposium on Semantic Mining in Biomedicine, Jan 2014, Aveiro, Portugal (2014)
|
|
BASE
|
|
19 |
Human Annotation of ASR Error Regions: is "gravity" a Sharable Concept for Human Annotators?
|
|
|
|
In: Ninth International Conference on Language Resources and Evaluation (LREC'14) ; https://hal.archives-ouvertes.fr/hal-01134802 ; Ninth International Conference on Language Resources and Evaluation (LREC'14), May 2014, Reykjavik, Iceland. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pp.3050-3056, 2014 ; http://lrec2014.lrec-conf.org/en/ (2014)
|
|
BASE
|
|
20 |
Approches à base de fréquences pour la simplification lexicale
|
|
|
|
In: TALN-RECITAL 2013 ; https://hal.archives-ouvertes.fr/hal-00838354 ; TALN-RECITAL 2013, Jun 2013, Les Sables d'Olonne, France. pp.493-506 (2013)
|
|
BASE
|
|
|
|