30 August 2019 Contour-based character segmentation for printed Arabic text with diacritics
Khader Mohammad, Aziz Qaroush, Muna Ayyesh, Mahdi Washha, Ahmad Alsadeh, Sos Agaian

Current developments in sensors open new uses across numerous real-life applications, including optical character recognition (OCR). An OCR system requires text processing tools to be incorporated into the sensor functionality. The most critical stage in an OCR system is segmentation: subdividing a text image into characters that can be processed individually by a classifier. The cursive nature of Arabic script, in which each character takes different shapes according to its position in the word, together with the presence of diacritics, makes Arabic character segmentation a very challenging task. A robust offline character segmentation algorithm for printed Arabic text with diacritics is developed based on contour extraction. The algorithm extracts the up-contour (upper contour) part of a word and then identifies the areas at which the word splits into characters. A postprocessing stage then handles the over-segmentation that appears in the initial segmentation stage. The proposed scheme is benchmarked using the APTI dataset and a manually collected dataset consisting of text images that vary in font size, type, and style and cover more than 38,000 words. The experiments show that the proposed algorithm segments Arabic words with diacritics with an average accuracy of 98.5%.
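The idea of finding split areas from the upper contour can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified illustration that assumes a binarized word image with diacritics already removed, takes the baseline to be the row with the most ink, and treats any run of columns whose topmost ink pixel lies on the baseline as a candidate split. The names `upper_contour`, `candidate_splits`, `tolerance`, and `demo` are hypothetical, and no postprocessing for over-segmentation is attempted.

```python
import numpy as np


def upper_contour(word):
    """Row index of the topmost ink pixel in each column (-1 for empty columns)."""
    word = np.asarray(word, dtype=bool)
    has_ink = word.any(axis=0)
    top = word.argmax(axis=0)  # first True row per column (0 if the column is empty)
    return np.where(has_ink, top, -1)


def candidate_splits(word, tolerance=0):
    """Centers of column runs where the upper contour drops to the baseline.

    Assumption: the baseline is the row with the largest horizontal projection,
    and a column whose topmost ink lies within `tolerance` rows above the
    baseline (or below it) belongs to a connecting stroke between characters.
    """
    word = np.asarray(word, dtype=bool)
    top = upper_contour(word)
    baseline = int(word.sum(axis=1).argmax())
    low = (top >= 0) & (top >= baseline - tolerance)
    cols = np.flatnonzero(low)
    if cols.size == 0:
        return []
    # Group consecutive low columns into runs and keep the middle of each run.
    breaks = np.flatnonzero(np.diff(cols) > 1) + 1
    runs = np.split(cols, breaks)
    return [int((run[0] + run[-1]) // 2) for run in runs]


# Toy "word": three tall strokes joined by a thin baseline stroke on row 4.
rows = [
    "...........",
    ".##......#.",
    ".##..##..#.",
    ".##..##..#.",
    ".#########.",
    "...........",
]
demo = np.array([[ch == "#" for ch in row] for row in rows])
splits = candidate_splits(demo)  # candidate split columns for the toy word
```

On real printed text such a rule would over-segment characters that themselves dip to the baseline, which is the kind of error the abstract's postprocessing stage is said to address.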

© 2019 SPIE and IS&T 1017-9909/2019/$28.00
Khader Mohammad, Aziz Qaroush, Muna Ayyesh, Mahdi Washha, Ahmad Alsadeh, and Sos Agaian "Contour-based character segmentation for printed Arabic text with diacritics," Journal of Electronic Imaging 28(4), 043030 (30 August 2019). https://doi.org/10.1117/1.JEI.28.4.043030
Received: 7 March 2019; Accepted: 30 July 2019; Published: 30 August 2019
Cited by 10 scholarly publications.
Image segmentation

Optical character recognition

Image processing algorithms and systems

Algorithm development

Detection and tracking algorithms

Image processing

Binary data
