Paper
27 March 2024 Real-time arbitrarily-shaped scene text spotting with weakly supervised points
Longyang Zhao, Shuhao Zhang, Zhi Xu
Proceedings Volume 13105, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2023); 131053Q (2024) https://doi.org/10.1117/12.3026683
Event: 3rd International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2023), 2023, Qingdao, China
Abstract
Arbitrarily shaped scene text spotting has attracted widespread interest in recent years, with notable progress in both speed and accuracy. However, model complexity, slow inference, and deployment difficulty remain common problems. To address them, this paper proposes the Weakly Supervised Point-collection Network (WSPNet) for arbitrarily shaped scene text. WSPNet adopts a single-shot network architecture with deformable convolutions to read text through multiple parallel branches, and combines it with weakly supervised text-line point sampling to sample, recognize, and transcribe arbitrarily shaped text instances. To make multi-scale fusion more robust, a Text Feature Fusion (TFF) module dynamically acquires key features at different scales during fusion, with the aim of reducing discrepancies between scales. Extensive experiments on arbitrarily shaped text benchmarks show that WSPNet achieves competitive accuracy and speed. For example, the proposed method achieves an end-to-end text recognition F-measure of 63.4 on the Total-Text dataset at 35.5 FPS, a speed-accuracy balance that is competitive with the best reported results.
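The abstract does not specify how features are gathered at the sampled text-line points. As a rough illustration only, one common way to read features at sub-pixel point locations is bilinear interpolation over a feature map. The function name `sample_point_features`, the `(C, H, W)` layout, and the toy center-line coordinates below are all assumptions made for this sketch; they are not taken from the paper and do not represent WSPNet's actual implementation.

```python
import numpy as np


def sample_point_features(feature_map, points):
    """Bilinearly sample a (C, H, W) feature map at N (x, y) points.

    Hypothetical helper for illustration; returns an (N, C) array.
    """
    _, h, w = feature_map.shape
    pts = np.asarray(points, dtype=np.float64)

    # Clamp points to the valid pixel range so border points stay in bounds.
    x = np.clip(pts[:, 0], 0, w - 1)
    y = np.clip(pts[:, 1], 0, h - 1)

    # Integer corners surrounding each point.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets act as interpolation weights.
    wx = x - x0
    wy = y - y0

    # Interpolate along x on the two rows, then along y between the rows.
    top = feature_map[:, y0, x0] * (1 - wx) + feature_map[:, y0, x1] * wx
    bottom = feature_map[:, y1, x0] * (1 - wx) + feature_map[:, y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).T


# Toy feature map (2 channels, 4 rows, 5 columns) and an assumed
# three-point center line for one curved text instance.
fmap = np.arange(2 * 4 * 5, dtype=float).reshape(2, 4, 5)
center_line = [(0.0, 0.0), (1.5, 1.0), (3.0, 2.5)]
point_feats = sample_point_features(fmap, center_line)
print(point_feats.shape)
```

In a spotting pipeline of this general kind, the resulting sequence of per-point feature vectors would typically be passed to a recognition head for transcription; how WSPNet does this is described in the full paper, not in the abstract.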
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Longyang Zhao, Shuhao Zhang, and Zhi Xu "Real-time arbitrarily-shaped scene text spotting with weakly supervised points", Proc. SPIE 13105, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2023), 131053Q (27 March 2024); https://doi.org/10.1117/12.3026683
KEYWORDS
Feature fusion
Education and training
Convolution
Deformation
Network architectures
Feature extraction
Image segmentation