5 |
CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes – the National Corpus of Contemporary Welsh ...
|
|
Knight, Dawn; Morris, Steve; Fitzpatrick, Tess; Rayson, Paul; Spasić, Irena; Thomas, Enlli Môn; Lovell, Alex; Morris, Jonathan; Evas, Jeremy; Stonelake, Mark; Arman, Laura; Davies, Josh; Ezeani, Ignatius; Neale, Steven; Needs, Jennifer; Piao, Scott; Rees, Mair; Watkins, Gareth; Williams, Lowri; Muralidaran, Vignesh; Tovey-Walsh, Bethan; Anthony, Laurence; Cobb, Thomas M; Deuchar, Margaret; Donnelly, Kevin; McCarthy, Michael; Scannell, Kevin. Cardiff University, 2020
|
|
Abstract:
The CorCenCC corpus contains over 11 million words (circa 14.4m tokens) from written, spoken and electronic (online, digital texts) Welsh language sources, taken from a range of genres, language varieties (regional and social) and contexts. The contributors to CorCenCC are representative of the more than half a million Welsh speakers in the country. The creation of CorCenCC was a community-driven project, which offered users of Welsh an opportunity to be proactive in contributing to a Welsh language resource that reflects how Welsh is currently used. To make CorCenCC as representative of contemporary Welsh as possible, the project team designed a bespoke sampling framework. Extracts were collected from sources including journals, emails, sermons, road signs, TV programmes, meetings, magazines and books. Conversations were recorded by the research team, and a specially designed crowdsourcing app (see: https://www.corcencc.org/app/) enabled Welsh speakers in the community to record and upload samples ...
|
|
Keywords:
Computational Linguistics; Computational/Corpus Linguistics; Language Corpora for ICT; Linguistics General
|
|
URL: https://dx.doi.org/10.17035/d.2020.0119878310 https://research.cardiff.ac.uk/converis/portal/detail/Dataset/119878310?auxfun=&lang=en_GB
|
|
BASE
|
|
|
|
7 |
Code-switching in Irish tweets: a preliminary analysis
|
|
|
|
In: Lynn, Teresa and Scannell, Kevin orcid:0000-0003-4075-9524 (2019) Code-switching in Irish tweets: a preliminary analysis. In: Third Celtic Language Technology Workshop 2019, 19 Aug 2019, Dublin, Ireland.
|
|
BASE
|
|
|
|
9 |
Minority language Twitter: part-of-speech tagging and analysis of Irish Tweets
|
|
|
|
In: Lynn, Teresa, Scannell, Kevin and Maguire, Eimear (2015) Minority language Twitter: part-of-speech tagging and analysis of Irish Tweets. In: ACL 2015 Workshop on Noisy User-generated Text (W-NUT), 31 July 2015, Beijing, China.
|
|
BASE
|
|
|
|
12 |
Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes - The National Corpus of Contemporary Welsh) [Online resource]
|
|
|
|
IDS-Repository
|
|