Model-based restoration of document images for OCR
7 March 1996
Mysore Y. Jaisimha, Eve A. Riskin, Richard E. Ladner, Werner Stuetzle
Proceedings Volume 2660, Document Recognition III; (1996) https://doi.org/10.1117/12.234711
Event: Electronic Imaging: Science and Technology, 1996, San Jose, CA, United States
Abstract
This paper presents a methodology for model-based restoration of degraded document images. The methodology adapts to nonuniform page degradations and is based on a model of image defects that is estimated directly from a set of calibrating degraded document images. Further, unlike other global filtering schemes, our methodology filters only those words that the OCR system has, with high probability, misspelled. In the first stage, we extract a training sample of candidate misspelled word subimages from the set of calibration images before and after the degradation that we wish to undo. These word subimages are registered to extract defect pixels. The second stage uses a vector-quantization-based algorithm to construct a summary model of the defect pixels. The final stage uses the summary model to restore degraded document images. We evaluate the performance of the methodology for a variety of parameter settings on a real-world sample of degraded FAX-transmitted documents. In our sample experiment, the methodology eliminates up to 56.4% of the OCR character errors introduced by FAX transmission.
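The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the defect features, the quantizer, or the restoration rule, so the sketch assumes binary word subimages that are already registered, 3x3 neighborhoods around defect pixels, plain k-means (Lloyd's algorithm) as a stand-in vector quantizer, and restoration by exact neighborhood match. All function names (`defect_patches`, `train_codebook`, `restore`) and parameters are hypothetical.

```python
import numpy as np

def defect_patches(clean, degraded, radius=1):
    """Stage 1 (assumed form): collect degraded neighborhoods around pixels
    where the registered clean and degraded subimages disagree."""
    size = 2 * radius + 1
    pad = np.pad(degraded, radius)  # zero padding = background
    rows, cols = np.nonzero(clean != degraded)
    patches = [pad[r:r + size, c:c + size].ravel() for r, c in zip(rows, cols)]
    patches = np.array(patches, dtype=float).reshape(len(rows), size * size)
    targets = clean[rows, cols].astype(float)  # correct value of each defect pixel
    return patches, targets

def nearest(patches, codebook):
    """Index of the closest codeword (squared Euclidean) for each patch."""
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def train_codebook(patches, targets, k=4, iters=10):
    """Stage 2 (assumed form): k-means as a stand-in vector quantizer.
    Requires at least one defect patch."""
    codebook = np.unique(patches, axis=0)[:k].copy()  # deterministic init
    for _ in range(iters):
        labels = nearest(patches, codebook)
        for j in range(len(codebook)):
            if np.any(labels == j):
                codebook[j] = patches[labels == j].mean(axis=0)
    labels = nearest(patches, codebook)
    # Each codeword stores the majority clean value of its members (-1 if empty).
    fix = np.array(
        [int(targets[labels == j].mean() >= 0.5) if np.any(labels == j) else -1
         for j in range(len(codebook))],
        dtype=int,
    )
    return codebook, fix

def restore(degraded, codebook, fix, radius=1, max_dist=0.5):
    """Stage 3 (assumed form): rewrite a pixel when its neighborhood lies
    within max_dist of a codeword; 0.5 means exact match for binary data."""
    size = 2 * radius + 1
    pad = np.pad(degraded, radius)
    out = degraded.copy()
    for r in range(degraded.shape[0]):
        for c in range(degraded.shape[1]):
            patch = pad[r:r + size, c:c + size].ravel().astype(float)
            d = ((codebook - patch) ** 2).sum(axis=1)
            j = int(d.argmin())
            if d[j] <= max_dist and fix[j] >= 0:
                out[r, c] = fix[j]
    return out

# Toy calibration pair: a horizontal stroke with one pixel dropped out.
clean = np.zeros((5, 7), dtype=int)
clean[2, 1:6] = 1
degraded = clean.copy()
degraded[2, 3] = 0

patches, targets = defect_patches(clean, degraded)
codebook, fix = train_codebook(patches, targets)
restored = restore(degraded, codebook, fix)
```

In this toy case the single learned codeword is the "gap in a horizontal stroke" neighborhood, and only the dropped pixel matches it. A practical system would also need the word selection step from the abstract (filtering only words the OCR likely misspelled), which this sketch omits.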
© (1996) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Mysore Y. Jaisimha, Eve A. Riskin, Richard E. Ladner, and Werner Stuetzle "Model-based restoration of document images for OCR", Proc. SPIE 2660, Document Recognition III, (7 March 1996); https://doi.org/10.1117/12.234711
CITATIONS
Cited by 7 scholarly publications.
KEYWORDS
Optical character recognition, Calibration, Distortion, Image registration, Data modeling, Image classification, Image filtering