1 |
Between History and Natural Language Processing: Study, Enrichment and Online Publication of French Parliamentary Debates of the Early Third Republic (1881-1899)
In: ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora, Jun 2022, Marseille, France ; https://hal.archives-ouvertes.fr/hal-03623351 ; https://www.clarin.eu/ParlaCLARIN-III (2022)

Abstract:
We present the AGODA project (Analyse sémantique et Graphes relationnels pour l'Ouverture des Débats à l'Assemblée nationale), which aims to create a platform for consulting and exploring the digitised French parliamentary debates (1881-1940) available in the digital library of the National Library of France. The project brings together historians and NLP specialists: parliamentary debates are an essential source for the history of contemporary France, and also for linguistics. It therefore aims to produce a corpus of texts that complies with the TEI standard and can be easily exploited with computational methods. Historical parliamentary debates are also an excellent case study for developing and applying tools for publishing and exploring large historical corpora. In this paper, we present the steps needed to produce such a corpus. We detail the processing and publication chain for these documents, with particular attention to the problems of extracting text from digitised images. We also introduce the first analyses we have carried out on this corpus, using "bag-of-words" techniques that are relatively insensitive to OCR quality (namely topic modelling and word embedding).

Keywords:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-CY]Computer Science [cs]/Computers and Society [cs.CY]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.HIST]Humanities and Social Sciences/History; France; OCR; Parliamentary debates; Third Republic; Topic modelling; Word embedding; XML-TEI

URL: https://hal.archives-ouvertes.fr/hal-03623351/document https://hal.archives-ouvertes.fr/hal-03623351 https://hal.archives-ouvertes.fr/hal-03623351/file/puren_bourgeois_pellet_vernus_agoda2022.pdf

BASE

3 |
Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers ...
BASE

4 |
MMTAfrica: Multilingual Machine Translation for African Languages ...
BASE

5 |
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers ...
BASE

6 |
MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset ...
BASE

7 |
Korean Online Hate Speech Dataset for Multilabel Classification: How Can Social Science Improve Dataset on Hate Speech? ...
BASE

8 |
An NLP Solution to Foster the Use of Information in Electronic Health Records for Efficiency in Decision-Making in Hospital Care ...
BASE

9 |
Networks and Identity Drive Geographic Properties of the Diffusion of Linguistic Innovation ...
BASE

10 |
Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study ...
BASE

11 |
Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations ...
BASE

13 |
Towards Responsible Natural Language Annotation for the Varieties of Arabic ...
BASE

14 |
Polling Latent Opinions: A Method for Computational Sociolinguistics Using Transformer Language Models ...
BASE

15 |
Who will share Fake-News on Twitter? Psycholinguistic cues in online post histories discriminate Between actors in the misinformation ecosystem ...
BASE

17 |
How Hermeneutic Spirals may reduce Complexity to Narrative Schemata - expanding on "Complexity and the Userly Text"
In: https://hal.archives-ouvertes.fr/hal-03254233 ; 2021 (2021)
BASE

18 |
Digital participation of left-wing activists in Brazil: cultural events as a cement to mobilization and networked protest
In: Brasiliana: Journal for Brazilian Studies ; https://hal.archives-ouvertes.fr/hal-03365831 ; Brasiliana: Journal for Brazilian Studies, 2021, 10 (1), pp.261-284. ⟨10.25160/bjbs.v10i1.125719⟩ (2021)
BASE

19 |
Influencer detection in social media ; Détection des influenceurs dans des médias sociaux
In: https://tel.archives-ouvertes.fr/tel-03640442 ; Computers and Society [cs.CY]. Institut National des Langues et Civilisations Orientales- INALCO PARIS - LANGUES O', 2021. French. ⟨NNT : 2021INAL0034⟩ (2021)
BASE

20 |
L’intelligence artificielle au risque du singulier ; L’intelligence artificielle au risque du singulier: Les limites du calcul des significations dans les technologies de la traduction
In: Qu'est-ce qui échappe à l'intelligence artificielle ? Colloque interdisciplinaire ; https://hal-utt.archives-ouvertes.fr/hal-03358046 ; Qu'est-ce qui échappe à l'intelligence artificielle ? Colloque interdisciplinaire, Laboratoire de sciences humaines de Polytechnique, Sep 2021, Palaiseau, France (2021)
BASE