Search in the Catalogues and Directories

Corpus of Croatian news portals ENGRI (2014-2018)
Bogunović, Irena; Kučić, Mario; Ljubešić, Nikola; Erjavec, Tomaž. University of Rijeka, Faculty of Maritime Studies, 2021
Abstract: The corpus consists of texts collected from the most popular news portals in Croatia (according to the Reuters Institute Digital News Report for 2018, retrieved from http://www.digitalnewsreport.org in April 2019) in the period from 2014 to 2018: Direktno, Dnevno, Net Hr, Hrt, Index_Hr, Jutarnji, Novilist, Rtl, SlobodnaDalmacija, Večernji, Tportal, Dnevnik. Web browsing and web crawling were used to select and store the texts together with their useful HTML information (the publication date of each article, its URL, and its title). The corpus was linguistically processed with the CLASSLA package (https://pypi.org/project/classla/) at the levels of tokenization, sentence splitting, morphosyntactic tagging, lemmatization, dependency parsing and named entity recognition. This corpus is a linguistically processed version of the original corpus published at https://repository.pfri.uniri.hr/islandora/object/pfri%3A2156 and is distributed in the CoNLL-U format (https://universaldependencies.org/format.html).
Keyword: contemporary language; news corpus
URL: http://hdl.handle.net/11356/1416
BASE
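
Since the corpus is distributed in the CoNLL-U format, a reader may want to see roughly how such a file can be consumed. The following is a minimal sketch in plain Python, not the authors' tooling: the function names `read_conllu` and `lemmas` and the `SAMPLE` sentence are hypothetical and hand-written for illustration, and the sample is not taken from the corpus. It assumes only the standard CoNLL-U layout of ten tab-separated columns per token, comment lines starting with `#`, and a blank line between sentences.

```python
from typing import Dict, Iterator, List

# The ten tab-separated CoNLL-U columns, in the order the format defines them.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts keyed by FIELDS."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment lines are skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # Handle a final sentence that is not followed by a blank line.
        yield sentence


def lemmas(sentence: List[Dict[str, str]]) -> List[str]:
    """Return the lemma of every token in one sentence."""
    return [token["lemma"] for token in sentence]


# Hand-written illustrative sentence (an assumption, not corpus data).
# Unspecified columns are left as "_", the CoNLL-U placeholder for "no value".
SAMPLE = (
    "# text = Ovo je test.\n"
    "1\tOvo\tovaj\tPRON\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tje\tbiti\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\ttest\ttest\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE):
        print(lemmas(sent))
```

For a real corpus file, the same generator could be fed the file contents read with UTF-8 encoding. Multiword-token and empty-node lines (IDs such as `1-2` or `1.1`) pass through unchanged as ordinary rows in this sketch and may need separate handling.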