
Search in the Catalogues and Directories

Hits 1 – 20 of 32

1
Towards Arabic Sentence Simplification via Classification and Generative Approaches ...
Khallaf, Nouran; Sharoff, Serge. - : arXiv, 2022
BASE
2
Overview of the Fourth BUCC Shared Task: Bilingual Dictionary Induction from Comparable Corpora
In: 13th Workshop on Building and Using Comparable Corpora (BUCC), May 2020, Marseille, France, pp. 6-13 (2020). https://hal.archives-ouvertes.fr/hal-03100822
BASE
3
Know thy corpus! Robust methods for digital curation of Web corpora ...
Sharoff, Serge. - : arXiv, 2020
Abstract: This paper proposes a novel framework for digital curation of Web corpora in order to provide robust estimation of their parameters, such as their composition and the lexicon. In recent years language models pre-trained on large corpora emerged as clear winners in numerous NLP tasks, but no proper analysis of the corpora which led to their success has been conducted. The paper presents a procedure for robust frequency estimation, which helps in establishing the core lexicon for a given corpus, as well as a procedure for estimating the corpus composition via unsupervised topic models and via supervised genre classification of Web pages. The results of the digital curation study applied to several Web-derived corpora demonstrate their considerable differences. First, this concerns different frequency bursts which impact the core lexicon obtained from each corpus. Second, this concerns the kinds of texts they contain. For example, OpenWebText contains considerably more topical news and political argumentation ...
Keyword: Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2003.06389
https://arxiv.org/abs/2003.06389
BASE
4
Recognizing semantic relations by combining transformers and fully connected models
Roussinov, Dmitri; Sharoff, Serge; Puchnina, Nadezhda. - : European Language Resources Association (ELRA), 2020
BASE
5
A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora
In: International Conference on Language Resources and Evaluation, May 2018, Miyazaki, Japan (2018). https://hal.archives-ouvertes.fr/hal-01898362
BASE
6
Functional text dimensions for the annotation of web corpora
In: Corpora. - Edinburgh : Univ. Press 13 (2018) 1, 65-95
BLLDB
7
Crowdsourcing for web genre annotation
Asheghi, Noushin Rezapour [author]; Sharoff, Serge [author]; Markert, Katja [author]. - Hannover : Gottfried Wilhelm Leibniz Universität Hannover, 2016
DNB Subject Category Language
8
Language Adaptation for Extending Post-Editing Estimates for Closely Related Languages
In: Prague Bulletin of Mathematical Linguistics, Vol. 106, Iss. 1, pp. 181-192 (2016)
BASE
9
MULTEXT-East non-commercial lexicons 4.0
Erjavec, Tomaž; Derzhanski, Ivan; Divjak, Dagmar. - : Jožef Stefan Institute, 2015
BASE
10
Document dissimilarity within and across languages: A benchmarking study
In: LLC. - Oxford : Oxford Univ. Press 29 (2014) 1, 6
OLC Linguistik
11
Languages for Specific Purposes in the Digital Era
BLLDB
UB Frankfurt Linguistik
12
Building and using comparable corpora
Sharoff, Serge [editor]; Fung, Pascale [editor]; Rapp, Reinhard [editor]. - 2013
DNB Subject Category Language
13
Corpus-based vocabulary lists for language learners for nine languages [Journal]
Kilgarriff, Adam [author]; Charalabopoulou, Frieda [author]; Gavrilidou, Maria [author].
DNB Subject Category Language
14
Building and Using Comparable Corpora
Sharoff, Serge; Rapp, Reinhard; Zweigenbaum, Pierre. - Berlin, Heidelberg : Springer Berlin Heidelberg, 2013
UB Frankfurt Linguistik
15
Building and using comparable corpora
Sharoff, Serge (ed.). - Berlin [et al.] : Springer, 2013
BLLDB
UB Frankfurt Linguistik
16
Building and using comparable corpora
BASE
17
Terminology Extraction, Translation Tools and Comparable Corpora: TTC concept, midterm progress and achieved results
In: LREC 2012 Workshop on Creating Cross-language Resources for Disconnected Languages and Styles (CREDISLAS), May 2012, Istanbul, Turkey, 4 p. (2012). https://hal.archives-ouvertes.fr/hal-00819909
BASE
18
Genres on the Web : Computational Models and Empirical Studies
Mehler, Alexander; Sharoff, Serge; Santini, Marina. - Dordrecht : Springer Netherlands, 2011
UB Frankfurt Linguistik
19
User-centred Views on Terminology Extraction Tools: Usage Scenarios and Integration into MT and CAT Tools.
In: Proceedings of the Tralogy conference (Actes du colloque Tralogy): Anticiper les technologies pour la traduction [Anticipating technologies for translation]; Tralogy I. Métiers et technologies de la traduction : quelles convergences pour l'avenir ? [Translation professions and technologies: what convergences for the future?], Mar 2011, Paris, France, 10 p. (2011). https://hal.archives-ouvertes.fr/hal-00818657
BASE
20
Balancing form and function in corpus research
In: International journal of corpus linguistics. - Amsterdam [u.a.] : Benjamins 15 (2010) 3, 419-424
BLLDB
OLC Linguistik

© 2013 - 2024 Lin|gu|is|tik