Search in the Catalogues and Directories

Hits 1 – 20 of 690

1
Between History and Natural Language Processing: Study, Enrichment and Online Publication of French Parliamentary Debates of the Early Third Republic (1881-1899)
In: ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora, Jun 2022, Marseille, France ; https://hal.archives-ouvertes.fr/hal-03623351 ; https://www.clarin.eu/ParlaCLARIN-III (2022)
Abstract: We present the AGODA (Analyse sémantique et Graphes relationnels pour l'Ouverture des Débats à l'Assemblée nationale) project, which aims to create a platform for consulting and exploring digitised French parliamentary debates (1881-1940) available in the digital library of the National Library of France. This project brings together historians and NLP specialists: parliamentary debates are indeed an essential source for French history of the contemporary period, but also for linguistics. This project therefore aims to produce a corpus of texts that can be easily exploited with computational methods, and that respect the TEI standard. Ancient parliamentary debates are also an excellent case study for the development and application of tools for publishing and exploring large historical corpora. In this paper, we present the steps necessary to produce such a corpus. We detail the processing and publication chain of these documents, in particular by mentioning the problems linked to the extraction of texts from digitised images. We also introduce the first analyses that we have carried out on this corpus with "bag-of-words" techniques not too sensitive to OCR quality (namely topic modelling and word embedding).
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-CY]Computer Science [cs]/Computers and Society [cs.CY]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.HIST]Humanities and Social Sciences/History; France; OCR; Parliamentary debates; Third Republic; Topic modelling; Word embedding; XML-TEI
URL: https://hal.archives-ouvertes.fr/hal-03623351/document
https://hal.archives-ouvertes.fr/hal-03623351
https://hal.archives-ouvertes.fr/hal-03623351/file/puren_bourgeois_pellet_vernus_agoda2022.pdf
BASE
2
Ensemble of Opinion Dynamics Models to Understand the Role of the Undecided in the Vaccination Debate ...
Lenti, Jacopo; Ruffo, Giancarlo. - : arXiv, 2022
BASE
3
Zum Ungleichgewicht digital vermittelten Sachunterrichts und sprachlich-kommunikativer Anforderungen [On the imbalance between digitally mediated Sachunterricht and linguistic-communicative demands] ...
Kern, Friederike; Schwier, Volker; Stövesand, Björn. - : Verlag Julius Klinkhardt, 2022
BASE
4
Zum Ungleichgewicht digital vermittelten Sachunterrichts und sprachlich-kommunikativer Anforderungen [On the imbalance between digitally mediated Sachunterricht and linguistic-communicative demands]
In: Sachunterricht in der Informationsgesellschaft. Bad Heilbrunn: Verlag Julius Klinkhardt 2022, pp. 114-121. - (Probleme und Perspektiven des Sachunterrichts; 32) (2022)
BASE
5
Considerations for Multilingual Wikipedia Research ...
Johnson, Isaac; Lescak, Emily. - : arXiv, 2022
BASE
6
Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers ...
Vitiugin, Fedor; Castillo, Carlos. - : arXiv, 2022
BASE
7
MMTAfrica: Multilingual Machine Translation for African Languages ...
BASE
8
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers ...
Lees, Alyssa; Tran, Vinh Q.; Tay, Yi. - : arXiv, 2022
BASE
9
MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset ...
BASE
10
Korean Online Hate Speech Dataset for Multilabel Classification: How Can Social Science Improve Dataset on Hate Speech? ...
BASE
11
Quantifying knowledge synchronisation in the 21st century ...
BASE
12
An NLP Solution to Foster the Use of Information in Electronic Health Records for Efficiency in Decision-Making in Hospital Care ...
BASE
13
Networks and Identity Drive Geographic Properties of the Diffusion of Linguistic Innovation ...
BASE
14
Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study ...
BASE
15
Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations ...
BASE
16
Achieving Downstream Fairness with Geometric Repair ...
BASE
17
Towards Responsible Natural Language Annotation for the Varieties of Arabic ...
Bergman, A. Stevie; Diab, Mona T.. - : arXiv, 2022
BASE
18
Polling Latent Opinions: A Method for Computational Sociolinguistics Using Transformer Language Models ...
BASE
19
Who will share Fake-News on Twitter? Psycholinguistic cues in online post histories discriminate Between actors in the misinformation ecosystem ...
BASE
20
A Psycho-linguistic Analysis of BitChute ...
Horne, Benjamin D.. - : arXiv, 2022
BASE
