The Danish Gigaword Project ...
|
|
Strømberg-Derczynski, Leon; Ciosici, Manuel R.; Baglini, Rebekah; Christiansen, Morten H.; Dalsgaard, Jacob Aarup; Fusaroli, Riccardo; Henrichsen, Peter Juel; Hvingelby, Rasmus; Kirkedal, Andreas; Kjeldsen, Alex Speed; Ladefoged, Claus; Nielsen, Finn Årup; Petersen, Malte Lau; Rystrøm, Jonathan Hvithamar; Varab, Daniel. arXiv, 2020
|
|
Abstract:
Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialects. ... : Identical to the NoDaLiDa 2021 version ...
|
|
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences
|
|
URL: https://dx.doi.org/10.48550/arxiv.2005.03521 https://arxiv.org/abs/2005.03521
|
|
BASE