Catalogue search • Linguistik portal • Fachinformationsdienst (FID)

1	When is Wall a Pared and when a Muro?: Extracting Rules Governing Lexical Selection ...
	The 2021 Conference on Empirical Methods in Natural Language Processing 2021; Anastasopoulos, Antonios; Chaudhary, Aditi. - : Underline Science Inc., 2021
	BASE

2	Lexically-Aware Semi-Supervised Learning for OCR Post-Correction ...
	The 2021 Conference on Empirical Methods in Natural Language Processing 2021; Anastasopoulos, Antonios; Neubig, Graham; Rijhwani, Shruti; Rosenblum, Daisy. - : Underline Science Inc., 2021
	Abstract: Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents. Optical character recognition (OCR) can be used to produce digitized text, and previous work has demonstrated the utility of neural post-correction methods that improve the results of general-purpose OCR systems on recognition of less-well-resourced languages. However, these methods rely on manually curated post-correction data, which are relatively scarce compared to the non-annotated raw images that need to be digitized. In this paper, we present a semi-supervised learning method that makes it possible to utilize these raw images to improve performance, specifically through the use of self-training, a technique where a model is iteratively trained on its own outputs. In addition, to enforce consistency in the recognized vocabulary, we introduce a lexically aware decoding method that augments the neural post-correction model with a count-based language model constructed from the ...
	Keyword: Computational Linguistics; Machine Learning; Machine Learning and Data Mining; Natural Language Processing
	URL: https://dx.doi.org/10.48448/fycy-h885 https://underline.io/lecture/38192-lexically-aware-semi-supervised-learning-for-ocr-post-correction
	BASE
