Detecting prohibit items in x-ray imagery with multimodal large language models
11 September 2024
Proceedings Volume 13253, Fourth International Conference on Signal Image Processing and Communication (ICSIPC 2024); 1325302 (2024) https://doi.org/10.1117/12.3041668
Event: 4th International Conference on Signal Image Processing and Communication, 2024, Xi'an, China
Abstract
Recent Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in vision-language tasks such as image captioning and question answering. However, they lack an essential perception ability: object detection. In this work, we focus on detecting prohibited items and discuss the possibility of integrating MLLMs into the detection process. Our method first generates a caption for the X-ray image containing prohibited items, then constructs instructions that prompt the MLLM to identify those items. This approach leverages the contextual understanding and language processing strengths of MLLMs. While current real-time object detection methods achieve high accuracy, they often require extensive training on large datasets specific to prohibited items. In contrast, MLLMs can understand and generate detailed descriptions, which can be advantageous when prohibited items are under-represented in the training data or vary significantly in appearance. Our results suggest that MLLMs can complement traditional methods by providing a more nuanced understanding of prohibited items through their ability to interpret and respond to complex queries, potentially improving detection rates in challenging environments.
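The two-stage procedure summarized in the abstract (caption the X-ray image first, then prompt the model to identify prohibited items) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `MLLM` callable interface, the prompt wording, the category list, and the answer parser are all hypothetical, since the abstract does not specify them.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: any MLLM wrapped as a function (image, prompt) -> text.
MLLM = Callable[[object, str], str]

# Hypothetical category list, for illustration only (not taken from the paper).
DEFAULT_CATEGORIES = ("gun", "knife", "scissors", "wrench", "pliers")

# Hypothetical stage-1 prompt used to obtain a caption of the X-ray image.
CAPTION_PROMPT = "Describe the contents of this X-ray baggage image in detail."


def build_instruction(caption: str, categories: Sequence[str]) -> str:
    """Stage-2 prompt: combine the generated caption with an explicit question."""
    options = ", ".join(categories)
    return (
        f"Image description: {caption}\n"
        f"Which of the following prohibited items appear in the image: {options}? "
        "Answer with a comma-separated list of item names, or 'none'."
    )


def parse_items(answer: str, categories: Sequence[str]) -> List[str]:
    """Keep only answer tokens that match a known category, case-insensitively,
    with duplicates removed while preserving order."""
    known = {c.lower(): c for c in categories}
    tokens = [t.strip().strip(".").lower() for t in answer.split(",")]
    return list(dict.fromkeys(known[t] for t in tokens if t in known))


def detect_prohibited(image, model: MLLM,
                      categories: Sequence[str] = DEFAULT_CATEGORIES) -> List[str]:
    caption = model(image, CAPTION_PROMPT)               # stage 1: image captioning
    instruction = build_instruction(caption, categories)
    answer = model(image, instruction)                   # stage 2: instructed identification
    return parse_items(answer, categories)


# Usage with a stub standing in for a real MLLM (hypothetical behavior).
def fake_model(image, prompt: str) -> str:
    if prompt == CAPTION_PROMPT:
        return "A bag containing a laptop and a folding knife."
    return "Knife"


print(detect_prohibited(None, fake_model))  # ['knife']
```

Restricting the answer to a fixed list of candidate categories is one assumed way to turn the model's free-text reply into a parseable label set; note that this sketch only identifies items and does not produce bounding boxes.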
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Yanze Ma, Baohong Gao, Langchao Qiao, Wentao Feng, Hanning Zhu, and Jiaji Wu "Detecting prohibit items in x-ray imagery with multimodal large language models", Proc. SPIE 13253, Fourth International Conference on Signal Image Processing and Communication (ICSIPC 2024), 1325302 (11 September 2024); https://doi.org/10.1117/12.3041668
KEYWORDS
Object detection
Visual process modeling
Education and training
X-rays
Visualization
X-ray imaging
Data modeling