Video understanding with image, audio, and text

Weiwei Wen; Lingzhi Liao

doi:10.1117/12.3049519

Published: 8 November 2024

Proceedings Volume 13416, Fourth International Conference on Advanced Algorithms and Neural Networks (AANN 2024); 134162Q (2024) https://doi.org/10.1117/12.3049519
Event: 2024 4th International Conference on Advanced Algorithms and Neural Networks, 2024, Qingdao, China

Abstract

With the explosive growth of streaming video data, describing and understanding video content has become an active research topic in the international academic community. However, existing methods largely ignore the complementary information carried by the image, audio, and text of a video, which leads to an incomplete understanding of its content. In this paper, we propose a novel video understanding algorithm that incorporates this neglected information. First, the method combines speech recognition with a Large Language Model (LLM) to obtain detailed textual descriptions of the video. Second, the images and the textual descriptions are combined to obtain video keyframes. Finally, the textual descriptions and keyframes are concatenated to produce the video understanding results. Extensive experiments show the superiority of the proposed method.
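The three stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which speech recognizer, LLM, or keyframe criterion is used, so the callables `transcribe`, `describe`, and `score`, the `top_k` parameter, and the choice of picking the highest-scoring frames are all assumptions made for illustration. The toy stand-ins at the bottom use plain strings in place of real audio and image data.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class VideoUnderstanding:
    """Joined output: the text description plus the selected keyframes."""
    description: str
    keyframe_indices: List[int]


def understand_video(
    frames: Sequence[Any],
    audio: Any,
    transcribe: Callable[[Any], str],    # hypothetical speech recognizer
    describe: Callable[[str], str],      # hypothetical LLM call
    score: Callable[[Any, str], float],  # hypothetical frame/text relevance
    top_k: int = 2,                      # assumed number of keyframes
) -> VideoUnderstanding:
    # Stage 1: speech recognition followed by an LLM gives a description.
    transcript = transcribe(audio)
    description = describe(transcript)

    # Stage 2 (assumed criterion): keep the top_k frames that best match
    # the description, then restore their temporal order.
    ranked = sorted(
        range(len(frames)),
        key=lambda i: score(frames[i], description),
        reverse=True,
    )
    keyframe_indices = sorted(ranked[:top_k])

    # Stage 3: return the description and keyframes together.
    return VideoUnderstanding(description, keyframe_indices)


# Toy stand-ins: "frames" are captions and "audio" is already text.
def word_overlap(frame: str, text: str) -> float:
    """Count the words shared by a frame caption and the description."""
    return float(len(set(frame.split()) & set(text.lower().split())))


result = understand_video(
    frames=["cat", "dog", "cat on mat", "tree"],
    audio="a cat sits on a mat",
    transcribe=lambda a: a,
    describe=lambda t: "Summary: " + t,
    score=word_overlap,
)
print(result.keyframe_indices)  # [0, 2]
```

Passing the three models in as callables keeps the sketch independent of any particular speech, language, or vision library; a real system would substitute actual models for the toy functions.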

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Weiwei Wen and Lingzhi Liao "Video understanding with image, audio, and text", Proc. SPIE 13416, Fourth International Conference on Advanced Algorithms and Neural Networks (AANN 2024), 134162Q (8 November 2024); https://doi.org/10.1117/12.3049519
