
Search in the Catalogues and Directories

Hits 1 – 1 of 1

1
High Performance Chinese/English Mixed OCR with Character Level Language Identification
2009 10th International Conference on Document Analysis and Recognition
In: http://www.cvc.uab.es/icdar2009/papers/3725a406.pdf
Abstract: Several high-performance OCR products exist for Chinese and for English separately. However, no single OCR technique suits both languages at once, because Chinese and English differ so greatly. Meanwhile, the number of Chinese/English mixed documents is growing rapidly with globalization, so processing such documents is an important research problem. The key problem is how to split a mixed document into a Chinese part and an English part, so that a different OCR technique can be applied to each. To improve on the performance of the previous system, a novel Chinese/English split algorithm based on global information is proposed, and a rule for language identification is derived using Bayes' formula. Experiments show that the system error rate drops from 1.52 % to 0.87 % on magazine samples and from 1.32 % to 0.75 % on book samples; more than 2/5 of the errors are eliminated, which provides experimental support for the approach.
URL: http://www.cvc.uab.es/icdar2009/papers/3725a406.pdf
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.212.486
BASE
