Paper
A silent speech reconstruction method based on Res-Conformer model
Published: 27 March 2024
Haofan Qi, Deli Fu, Hanlin Qi, Weiping Hu
Proceedings Volume 13105, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2023); 131052K (2024) https://doi.org/10.1117/12.3026353
Event: 3rd International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2023), 2023, Qingdao, China
Abstract
To address the low recognition accuracy of speech reconstruction from silent facial electromyographic (EMG) signals, this paper combines a residual network (ResNet) with the Transformer model to design the Res-Conformer model. The model consists of three connected ResNet structures followed by several Transformer modules. The ResNet structures extract features, and the Transformer structure converts the EMG signal into a Mel spectrogram with 80 frequency bands. The Mel spectrogram is then fed into the HiFiGAN network for speech reconstruction, ultimately yielding an audible speech signal from silent facial movements. In addition, our work integrates acoustic speech signals, extracting the Mel spectrogram of the synchronous acoustic speech signal as features so as to make full use of the time-frequency features of both the EMG and speech signals. The experimental results show that the proposed Res-Conformer model achieves a phoneme recognition rate of 91.86% and a word error rate of 25.6%. Compared with structures such as Transformer and LSTM, the Res-Conformer model effectively improves recognition accuracy.
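The abstract reports a word error rate but does not say how it was computed. As a minimal sketch, assuming the conventional definition (word-level edit distance divided by the number of reference words), the metric can be calculated as follows. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate under the conventional definition (an assumption here):
    (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution ("sat" -> "sit") and one
    # deletion ("the"), i.e. 2 errors over 6 reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {wer:.3f}")
```

Under this definition the value can exceed 1.0 when the hypothesis contains many inserted words, so a word error rate is not simply the complement of an accuracy figure.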
Haofan Qi, Deli Fu, Hanlin Qi, and Weiping Hu "A silent speech reconstruction method based on Res-Conformer model", Proc. SPIE 13105, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2023), 131052K (27 March 2024); https://doi.org/10.1117/12.3026353
Keywords
Electromyography
Signal processing
Feature extraction
Performance modeling
Signal attenuation
Speech recognition