4 |
gaBERT -- an Irish Language Model ...
|
|
|
|
Abstract:
The BERT family of neural language models has become highly popular due to its ability to provide sequences of text with rich context-sensitive token encodings that generalise well to many Natural Language Processing tasks. Over 120 monolingual BERT models covering over 50 languages have been released, as well as a multilingual model trained on 104 languages. We introduce gaBERT, a monolingual BERT model for the Irish language. We compare our gaBERT model to multilingual BERT and show that gaBERT provides better representations for a downstream parsing task. We also show how different filtering criteria, vocabulary size and the choice of subword tokenisation model affect downstream performance. We release gaBERT and related code to the community. ...
|
|
Keywords:
Computation and Language (cs.CL); FOS: Computer and information sciences
|
|
URL: https://dx.doi.org/10.48550/arxiv.2107.12930 https://arxiv.org/abs/2107.12930
|
|
BASE
|
|
5 |
Annotating verbal MWEs in Irish for the PARSEME Shared Task 1.2
|
|
|
|
In: Walsh, Abigail, Lynn, Teresa and Foster, Jennifer orcid:0000-0002-7789-4853 (2020) Annotating verbal MWEs in Irish for the PARSEME Shared Task 1.2. In: Joint Workshop on Multiword Expressions and Electronic Lexicons, 13 Dec 2020, Barcelona, Spain (Online). (2020)
|
|
BASE
|
|
6 |
Edition 1.2 of the PARSEME Shared Task on Semi-supervised Identification of Verbal Multiword Expressions
|
|
|
|
In: Joint Workshop on Multiword Expressions and Electronic Lexicons (MWE-LEX 2020), 2020, Barcelona, Spain ; https://hal.archives-ouvertes.fr/hal-03014927 ; https://www.aclweb.org/anthology/volumes/2020.mwe-1/ (2020)
|
|
BASE
|
|
7 |
Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)
|
|
|
|
BASE
|
|
9 |
Annotated corpora and tools of the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)
|
|
|
|
BASE
|
|
12 |
Annotating Verbal MWEs in Irish for the PARSEME Shared Task 1.2 ...
|
|
|
|
BASE
|
|
15 |
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
|
|
|
|
In: Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Aug 2018, Santa Fe, United States, pp. 222-240 ; https://hal.archives-ouvertes.fr/hal-01865575 (2018)
|
|
BASE
|
|
16 |
Edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions
|
|
|
|
In: Parra Escartín, Carla orcid:0000-0002-8412-1525 and Walsh, Abigail (2018) Edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions. In: Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), 25-26 Aug 2018, Santa Fe, NM, USA. (2018)
|
|
BASE
|
|
17 |
Annotated corpora and tools of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions (edition 1.1)
|
|
|
|
BASE
|
|