20 September 2024 Controllable image subtitle generation in electric power construction scenes based on codec
Bing Zhu, Bin Zhu, Juan Li
Proceedings Volume 13269, Fourth International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2024); 132690C (2024) https://doi.org/10.1117/12.3045647
Event: Fourth International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2024), 2024, Kuala Lumpur, Malaysia
Abstract
In power construction settings, image caption generation uses deep learning-based encoding and decoding to interpret image content and convert it into textual descriptions. This extends traditional image analysis by supporting early safety warnings and by diversifying the output formats. Conventional caption generation methods offer little controllability and produce descriptions that lack detail, and little research has addressed image description in power construction scenarios. To address this, this paper proposes a controllable image caption generation method based on an encoder-decoder architecture. A novel feature extraction model is introduced that uses FVCR-CNN (faster and visual commonsense region-convolutional neural network) as the encoder to extract salient features and visual commonsense features from the images. For decoding, the activation function is modified to obtain an M-tanh-based long short-term memory (MT-LSTM) neural network. Finally, a multi-branch decision strategy is employed to optimize the output. The proposed method was trained and tested on a dataset of power scene descriptions, on Ubuntu 16.04 with the PyTorch deep learning framework. Experimental results demonstrate a significant improvement in the accuracy of image caption generation, as well as enhanced controllability of scene descriptions, thereby substantially raising the level of intelligence of safety management at power construction sites.
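As a rough illustration of what changing the activation function in an LSTM decoder involves, the sketch below implements a single standard LSTM step in plain Python with the activation passed in as a parameter. It is not the authors' MT-LSTM: the abstract does not define M-tanh, so the default here is the ordinary tanh, and `hard_tanh` is a hypothetical stand-in used only to show where a replacement activation would be plugged in. The names `lstm_step`, `affine`, `params`, and the weight keys are likewise assumptions made for this example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function used for the LSTM gates."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hard_tanh(x: float) -> float:
    """Hypothetical stand-in activation (NOT the paper's M-tanh): clip to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def affine(W: Sequence[Sequence[float]], v: Sequence[float], b: Sequence[float]) -> Vector:
    """Compute W @ v + b for a row-major weight matrix."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(
    x: Sequence[float],
    h_prev: Sequence[float],
    c_prev: Sequence[float],
    params: Dict[str, list],
    act: Callable[[float], float] = math.tanh,
) -> Tuple[Vector, Vector]:
    """One standard LSTM step; `act` replaces tanh in the candidate and output paths."""
    z = list(x) + list(h_prev)  # concatenated input [x; h_prev]
    i = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]  # input gate
    f = [sigmoid(a) for a in affine(params["Wf"], z, params["bf"])]  # forget gate
    o = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]  # output gate
    g = [act(a) for a in affine(params["Wg"], z, params["bg"])]      # candidate state
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * act(ck) for ok, ck in zip(o, c)]
    return h, c


def zero_params(input_size: int, hidden_size: int) -> Dict[str, list]:
    """All-zero weights and biases, handy for checking the gate arithmetic by hand."""
    width = input_size + hidden_size
    params: Dict[str, list] = {}
    for gate in ("i", "f", "o", "g"):
        params["W" + gate] = [[0.0] * width for _ in range(hidden_size)]
        params["b" + gate] = [0.0] * hidden_size
    return params
```

With all-zero parameters every gate evaluates to 0.5 and the candidate state to `act(0)`, so the effect of swapping the activation is visible only in the output path, for example by comparing `lstm_step([1.0], [0.0], [2.0], zero_params(1, 1))` with the same call using `act=hard_tanh`.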
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Bing Zhu, Bin Zhu, and Juan Li "Controllable image subtitle generation in electric power construction scenes based on codec", Proc. SPIE 13269, Fourth International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2024), 132690C (20 September 2024); https://doi.org/10.1117/12.3045647
KEYWORDS
Education and training

Safety

Visualization

Feature extraction

Image enhancement

Neural networks

Mathematical optimization
