1 |
Arabic Handwritten Documents Segmentation into Text-lines and Words using Deep Learning
In: ASAR ; https://hal.inria.fr/hal-02460880 ; ASAR, Sep 2019, Sydney, Australia (2019)
|
|
Abstract:
One of the most important steps in a handwriting recognition system is text-line and word segmentation. This step is made difficult by differences in handwriting styles, skew, overlapping and touching text, and fluctuating text-lines. It is even more difficult for ancient and calligraphic writing, as in Arabic manuscripts, because of the cursive connections in Arabic text, misplaced diacritic marks, the presence of ascending and descending letters, etc. In this work, we propose an effective segmentation of Arabic handwritten text into text-lines and words using deep learning. For text-line segmentation, we use an RU-net, which performs a pixel-wise classification to separate text-line pixels from background pixels. For word segmentation, we rely on the text-line transcription, since no ground truth is available at the word level. A BLSTM-CTC (Bidirectional Long Short-Term Memory network followed by Connectionist Temporal Classification) then maps the transcription to the text-line image, avoiding the need to segment the input beforehand. A CNN (Convolutional Neural Network) precedes the BLSTM-CTC to extract features and feed the BLSTM with the essential content of the text-line image. Tested on the standard KHATT Arabic database, the approach achieves a segmentation success rate of no less than 96.7% for text-lines and 80.1% for words.
|
|
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO]Computer Science [cs]
|
|
URL: https://hal.inria.fr/hal-02460880/file/LineSeg-ASAR2019.pdf https://hal.inria.fr/hal-02460880/document https://hal.inria.fr/hal-02460880
|
|
2 |
Line and Word Segmentation of Arabic handwritten documents using Neural Networks ; Segmentation en lignes et en mots de documents arabes manuscrits utilisant des modèles neuronnaux
In: https://hal.inria.fr/hal-01910559 ; [Research Report] LORIA - Université de Lorraine; READ. 2018 (2018)
|
|
3 |
Named Entity Recognition by Neural Prediction
In: International Conference on Image Processing, Computer Vision, & Pattern Recognition ; https://hal.inria.fr/hal-01981613 ; International Conference on Image Processing, Computer Vision, & Pattern Recognition, Jul 2018, Las Vegas, United States (2018)
|
|
4 |
Impact of Features and Classifiers Combinations on the Performances of Arabic Recognition Systems
In: International Workshop on Arabic Script Analysis and Recognition ; https://hal.inria.fr/hal-01981528 ; International Workshop on Arabic Script Analysis and Recognition, Apr 2017, Nancy, France (2017)
|
|
5 |
Arabic Handwritten Words Off-line Recognition based on HMMs and DBNs
In: ICDAR 2015 - 13th International Conference on Document Analysis and Recognition ; https://hal.inria.fr/hal-01254724 ; ICDAR 2015 - 13th International Conference on Document Analysis and Recognition, Aug 2015, Nancy, France. pp.51 - 55, ⟨10.1109/ICDAR.2015.7333724⟩ (2015)
|
|
8 |
A Neural-Linguistic Approach for the Recognition of a Wide Arabic Word Lexicon
In: Document Recognition and Retrieval XVII ; https://hal.inria.fr/inria-00579680 ; Document Recognition and Retrieval XVII, Jan 2010, San Jose, United States (2010)
|
|
9 |
Automation of Indian Postal Documents written in Bangla and English
In: ISSN: 0218-0014 ; EISSN: 0218-0014 ; International Journal of Pattern Recognition and Artificial Intelligence ; https://hal.inria.fr/inria-00435501 ; International Journal of Pattern Recognition and Artificial Intelligence, World Scientific Publishing, 2009, 23 (8), pp.1599-1632. ⟨10.1142/S0218001409007776⟩ (2009)
|
|
10 |
Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm
In: SPIE - Electronic Imaging ; https://hal.inria.fr/inria-00347217 ; SPIE - Electronic Imaging, 2009, Los Angeles, United States (2009)
|
|
11 |
A neural perceptive model for the recognition of a large canonical Arabic word vocabulary
In: International Arab Conference on Information Technology ; https://hal.inria.fr/inria-00600294 ; International Arab Conference on Information Technology, Dec 2009, Sana'a, Yemen (2009)
|
|
12 |
HMM and fuzzy logic: A hybrid approach for online Urdu script-based languages' character recognition
In: ISSN: 0950-7051 ; EISSN: 1872-7409 ; Knowledge-Based Systems ; https://hal.inria.fr/inria-00579697 ; Knowledge-Based Systems, Elsevier, 2009, 23 (8), pp.914-923. ⟨10.1016/j.knosys.2010.06.007⟩ (2009)
|
|
13 |
Effect of Ghost Character Theory on Arabic Script Based Languages Character Recognition
In: WASE Global Conference on Image Processing and Analysis - GCIA09 ; https://hal.inria.fr/inria-00579666 ; WASE Global Conference on Image Processing and Analysis - GCIA09, Feb 2009, Taiwan, China (2009)
|
|
14 |
Arabic natural language processing: handwriting recognition
In: International Arab Conference on Information Technology - ACIT'2008 ; https://hal.inria.fr/inria-00347229 ; International Arab Conference on Information Technology - ACIT'2008, Dec 2008, Hammamet, Tunisia (2008)
|
|
15 |
A Novel Approach for the Recognition of a wide Arabic Handwritten Word Lexicon
In: International Conference on Document Analysis and Recognition ; https://hal.archives-ouvertes.fr/hal-00347179 ; International Conference on Document Analysis and Recognition, Dec 2008, Tampa, United States. pp.ThAT6.5 (2008)
|
|
16 |
Cursive Bengali Script Recognition for Indian Postal Automation ; Reconnaissance de l'écriture manuscrite cursive Bengali pour l'automatisation de la Poste Indienne
|
|
17 |
A System for Indian Postal Automation
In: International Conference on Document Analysis and Recognition ; https://hal.inria.fr/inria-00000364 ; International Conference on Document Analysis and Recognition, Sep 2005, Seoul, Korea (2005)
|
|
18 |
A System for Indian Postal Automation
In: International Workshop on Document Analysis ; https://hal.inria.fr/inria-00000103 ; International Workshop on Document Analysis, Umapada Pal, Mar 2005, Kolkata, India (2005)
|
|
20 |
Du manuscrit à l'impression sans saisie
In: Colloque sur la vision artificielle - CVA'00 ; https://hal.inria.fr/inria-00099150 ; Colloque sur la vision artificielle - CVA'00, 2000, Tizi Ouzou, Algeria, 49 p (2000)
|
|