21 July 2024
Chinese variant character restoration based on multidimensional features and local attention
Lingjun Meng, Hongchang Chen, Gengrun Wang
Proceedings Volume 13219, Fourth International Conference on Applied Mathematics, Modelling, and Intelligent Computing (CAMMIC 2024); 132191U (2024) https://doi.org/10.1117/12.3036899
Event: 4th International Conference on Applied Mathematics, Modelling and Intelligent Computing (CAMMIC 2024), 2024, Kaifeng, China
Abstract
The restoration of Chinese variant characters refers to the process of converting variant characters in a text to their standard forms. Existing methods mainly treat variant character restoration as a text error correction or machine translation task. The task is challenging because variant forms are diverse and many Chinese characters are visually or phonetically similar but have distinct meanings. Traditional text correction models are designed to handle aligned sequences and therefore cannot be applied directly to variant character restoration, while machine translation models restore text inefficiently and introduce restoration errors. To address these issues, this paper proposes the variant-T model, which is based on multidimensional features and local attention. A multidimensional word embedding based on RoCBERT is used to extract multidimensional features of the text. Further, a model based on local attention is employed, and a local masking mechanism is proposed, which effectively addresses the problem of incorrect restoration. Experimental results show that the word error rates are reduced to 0.51% and 0.62%, and the F1 scores reach 85.2% and 85.7%, on the SIGHAN14 and SIGHAN15 datasets, respectively. Furthermore, the word error rate is reduced to 0.2% on the variant SMS datasets, outperforming other models.
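The abstract does not give the window size or the exact form of the local masking mechanism, so the following is only a generic sketch of how a local (banded) attention mask is commonly built and applied, not the variant-T implementation. The function names `local_attention_mask` and `masked_softmax`, the `window` parameter, and the example scores are all hypothetical and chosen for illustration.

```python
import math

def local_attention_mask(seq_len, window):
    """Hypothetical helper: build a seq_len x seq_len boolean mask.

    mask[i][j] is True when position j lies within `window` positions
    of position i, i.e. when i is allowed to attend to j.
    """
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]

def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, ignoring masked positions.

    Masked-out positions receive a weight of exactly 0.0.
    """
    visible = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(visible)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0
            for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed toy setup: 5 tokens, each attending to itself and one neighbour per side.
mask = local_attention_mask(seq_len=5, window=1)

# Attention weights for the middle token (index 2) over made-up raw scores.
weights = masked_softmax([0.2, 1.0, 0.5, 0.3, 0.9], mask[2])
```

In this sketch the middle token can only place weight on positions 1 to 3, so a high raw score at a distant position (index 4 here) cannot influence its output. Restricting attention to nearby context in this way is one plausible reading of how a local mask could limit spurious rewrites of characters that are already correct.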
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Lingjun Meng, Hongchang Chen, and Gengrun Wang "Chinese variant character restoration based on multidimensional features and local attention", Proc. SPIE 13219, Fourth International Conference on Applied Mathematics, Modelling, and Intelligent Computing (CAMMIC 2024), 132191U (21 July 2024); https://doi.org/10.1117/12.3036899
KEYWORDS
Transformers

Data modeling

Error control coding

Performance modeling

Education and training

Semantics

Feature extraction
