Hits 1 – 2 of 2
1
Logical Layout Analysis Applied to Historical Newspapers
Gutehrlé, Nicolas; Atanassova, Iana
In: https://hal.archives-ouvertes.fr/hal-03468972 ; 2021 (2021)
Abstract:
In recent years, libraries and archives have led major digitisation campaigns that opened access to vast collections of historical documents. While such documents are often available as XML ALTO documents, they lack information about their logical structure. In this paper, we address the problem of logical layout analysis applied to historical documents. We propose a method based on the study of a dataset in order to identify rules that assign logical labels to both blocks and lines of text from XML ALTO documents. Our dataset contains newspapers in French, published in the first half of the 20th century. The evaluation shows that our methodology performs well for the identification of first lines of paragraphs and text lines, with F1 above 0.9. The identification of titles obtains an F1 of 0.64. This method can be applied to preprocess XML ALTO documents in preparation for downstream tasks, and also to annotate large-scale datasets to train machine learning and deep learning algorithms.
Keyword:
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.HIST]Humanities and Social Sciences/History; [SHS.LANGUE]Humanities and Social Sciences/Linguistics
URL:
https://hal.archives-ouvertes.fr/hal-03468972/file/9_Paper-2.pdf
https://hal.archives-ouvertes.fr/hal-03468972
https://hal.archives-ouvertes.fr/hal-03468972/document
BASE
2
Cartographier les données textuelles des petites annonces du Salinois (1840-1939) [Mapping the textual data of the classified advertisements of the Salinois (1840-1939)]
Gutehrlé, Nicolas; LETHIER, Virginie
In: Journée d'études Humaspatia ; https://hal.archives-ouvertes.fr/hal-02316862 ; Journée d'études Humaspatia, Feb 2019, Dijon, France (2019)
BASE