Since web born-digital images have low resolution and dense text atoms, text region over-merging and miss detection are still two open issues to be addressed. In this paper a novel iterative algorithm is proposed to locate and segment text regions. In each iteration, the candidate text regions are generated by detecting Maximally Stable Extremal Region (MSER) with diminishing thresholds, and categorized into different groups based on a new similarity graph, and the texted region groups are identified by applying several features and rules. With our proposed overlap checking method the final well-segmented text regions are selected from these groups in all iterations. Experiments have been carried out on the web born-digital image datasets used for robust reading competition in ICDAR 2011 and 2013, and the results demonstrate that our proposed scheme can significantly reduce both the number of over-merge regions and the lost rate of target atoms, and the overall performance outperforms the best compared with the methods shown in the two competitions in term of recall rate and f-score at the cost of slightly higher computational complexity.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
INSTITUTIONAL Select your institution to access the SPIE Digital Library.
PERSONAL Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.