Camouflaged targets usually blend visually into their background, which makes them difficult to detect and recognize in the visible wavelength band. However, existing camouflage techniques rarely achieve camouflage across the full spectrum, so camouflaged targets can be recognized by extracting deep features of the target spectrum. Hyperspectral images contain hundreds of contiguous, narrow spectral bands that provide detailed spectral features of the target. Hyperspectral image target detection aims to accurately identify and localize targets of interest against complex backgrounds and is one of the key tasks in hyperspectral image processing. Traditional hyperspectral image target detection methods focus mainly on spectral characterization, but they generally capture only shallow information about the target spectrum and are easily affected by complex environments, noise, and other factors. In recent years, deep learning has been widely applied in many fields. Deep learning methods model the data automatically, extract deep information about the target, and perform well on nonlinear problems. The traditional constrained energy minimization (CEM) method uses a linear filter function with a closed-form solution. However, a linear function may not separate target and background sufficiently in complex environments. To address this deficiency of the CEM algorithm and further improve the detection performance for camouflaged targets in complex environments, a camouflaged target detection method that combines the vision transformer (Vit) with constrained energy minimization is proposed. The proposed method combines the constrained energy minimization method with the vision transformer network architecture and extends the linear filter function in CEM to a nonlinear filter function by building a nonlinear target detector.
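For orientation, the classical linear CEM filter that the proposed method generalizes can be sketched in a few lines. This is only the textbook closed-form baseline, w = R⁻¹d / (dᵀR⁻¹d), and not the Vit-CEM detector itself; the function name `cem_filter`, the small `ridge` regularization term, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def cem_filter(pixels, target, ridge=1e-6):
    """Classical CEM weights w = R^-1 d / (d^T R^-1 d).

    pixels: (n_pixels, n_bands) array of spectra.
    target: (n_bands,) known target signature d.
    ridge:  illustrative regularization (an assumption, not from the paper).
    """
    pixels = np.asarray(pixels, dtype=float)
    target = np.asarray(target, dtype=float)
    n_pixels, n_bands = pixels.shape
    # Sample correlation matrix of the scene.
    R = pixels.T @ pixels / n_pixels
    # Scaled diagonal loading keeps the solve well conditioned.
    R += ridge * np.trace(R) / n_bands * np.eye(n_bands)
    r_inv_d = np.linalg.solve(R, target)
    # Normalizing enforces the constraint w^T d = 1.
    return r_inv_d / (target @ r_inv_d)

# Synthetic scene: 500 background pixels plus one pure target pixel.
rng = np.random.default_rng(0)
n_bands = 20
background = rng.normal(0.5, 0.1, size=(500, n_bands))
target = np.linspace(0.2, 0.9, n_bands)
pixels = np.vstack([background, target[None, :]])

w = cem_filter(pixels, target)
scores = pixels @ w  # linear detector output, one score per pixel
```

The output is a linear function of each pixel spectrum, which is exactly the limitation described above: the decision surface cannot bend to follow a nonlinearly mixed background.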
The proposed method adapts the vision transformer network so that it serves as a powerful nonlinear filter function; combining it with the CEM constraint yields a nonlinear CEM approach. The method is validated by setting up artifacts in urban badlands with different levels of complexity. The results show that the Vit-CEM method successfully detects the artifacts against urban backgrounds dominated by buildings as well as backgrounds dominated by grass, and achieves the best detection results. With the area under the ROC curve (AUC) as the evaluation index of detection performance, the AUC of the Vit-CEM method reaches 99.99.
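The AUC used as the evaluation index can be computed from per-pixel detector scores and ground-truth labels. The sketch below uses the rank-sum (Mann-Whitney U) identity, which is one standard way to obtain the area under the ROC curve; the abstract does not state how the authors computed it, and the function name `roc_auc` and the toy data are illustrative assumptions.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity.

    scores: detector outputs, higher means more target-like.
    labels: truthy for target pixels, falsy for background pixels.
    Tied scores receive their average rank.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one target and one background pixel")
    # Average 1-based rank of each unique score value.
    _, inverse, counts = np.unique(scores, return_inverse=True,
                                   return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    # Mann-Whitney U statistic of the target class, normalized to [0, 1].
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# Toy example: one target pixel is ranked below one background pixel.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUC of 1.0 means every target pixel scores above every background pixel, and 0.5 corresponds to chance-level ranking.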