
Document Domain Randomization for Deep Learning Document Layout Extraction
In: Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR), September 5-10, 2021, Lausanne, Switzerland, pp. 497-513, doi:10.1007/978-3-030-86549-8_32 ; https://hal.inria.fr/hal-03336444
Abstract: We present document domain randomization (DDR), the first successful transfer of CNNs trained only on graphically rendered pseudo-paper pages to real-world document segmentation. DDR renders pseudo-document pages by modeling randomized textual and non-textual contents of interest, with user-defined layout and font styles to support joint learning of fine-grained classes. We demonstrate competitive results using our DDR approach to extract nine document classes from the benchmark CS-150 and from papers published in two domains, namely the annual meetings of the Association for Computational Linguistics (ACL) and IEEE Visualization (VIS). We compare DDR to conditions of style mismatch and of fewer or noisier samples that are more easily obtained in the real world. We show that high-fidelity semantic information is not necessary to label semantic classes, but that style mismatch between train and test can lower model accuracy. Using smaller training samples had a slightly detrimental effect. Finally, network models still achieved high test accuracy when correct labels were diluted towards confusing labels; this behavior holds across several classes.
Keyword: [SCCO.COMP]Cognitive science/Computer science; behavior analysis; Deep neural network; Document domain randomization; Document layout; evaluation
URL: https://hal.inria.fr/hal-03336444/file/docrandomization.pdf
https://hal.inria.fr/hal-03336444
https://hal.inria.fr/hal-03336444/document
https://doi.org/10.1007/978-3-030-86549-8_32