Anomaly detection based on template image registration is one of the methods used to detect anomalies in objects with similar structures. However, the imaging device on Electric Multiple Unit (EMU) trains is a line-scan camera, and the geometric transformation law of its images differs from that of area-scan images, so conventional registration methods for area-scan images cannot accurately align line-scan images. Moreover, line-scan images of EMU trains collected in uncontrolled environments exhibit significant grayscale changes, so differences cannot be detected by directly comparing the grayscale of registered images. To address these two challenges, this paper proposes a two-stage anomaly detection method based on line-scan image registration and edge comparison. In the registration stage, a coarse-to-fine line-scan image registration method is designed: a feature-based registration method first coarsely positions the EMU train template image in the target image, and a direct registration method based on the line-scan image geometric transformation model then achieves precise geometric alignment. In the anomaly detection stage, edge information is extracted from the template image and the registered target image, and anomaly detection of EMU trains is achieved through edge comparison, edge expansion, and edge connection. Experimental results on line-scan images of EMU trains show that the proposed two-stage method can register line-scan images of EMU trains and, on this basis, detect and segment abnormal areas.
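The edge-comparison idea can be illustrated with a minimal numpy sketch (not the authors' implementation; the gradient-based edge detector, the threshold, and the dilation radius are simplifying assumptions): target edges that fall outside a dilated version of the template edges are flagged as candidate anomalies.

```python
import numpy as np

def edges(img, thresh=0.2):
    """Binary edge map from gradient magnitude (simple stand-in for a real edge detector)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return mag > thresh * (mag.max() + 1e-9)

def dilate(mask, r=1):
    """Binary dilation with a (2r+1)x(2r+1) square structuring element ("edge expansion")."""
    out = np.zeros_like(mask)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out |= np.roll(np.roll(mask, dy, 0), dx, 1)
    return out

def edge_anomaly(template, target, r=1):
    """Target edges with no template edge nearby are candidate anomaly pixels."""
    return edges(target) & ~dilate(edges(template), r)
```

Comparing edges rather than raw grayscale makes the comparison tolerant to the illumination changes the abstract describes.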
In this research, we introduce a saliency detection algorithm comprising three steps. First, we generate an initial saliency map using fully convolutional networks with aggregation interaction modules. Second, we extract hand-crafted and deep features to represent the image, then use a manifold ranking method to construct saliency maps. Finally, we integrate the outcomes of the preceding stages to generate the final saliency map. Experimental findings demonstrate that our method surpasses twelve state-of-the-art saliency detection techniques in terms of precision, recall, F-measure, and MAE.
In the field of remote sensing image interpretation, building extraction with Convolutional Neural Networks (CNNs) is a highly significant task. End-to-end building extraction methods typically consist of two key components: an encoder and a decoder. However, during encoding, down-sampling operations often cause a loss of boundary features in the segmented objects. Many of these lost features correspond to building boundaries, and the reduction of smaller-scale boundary features diminishes the network's attention to building boundaries, resulting in blurred building delineation. In this paper, we propose the Reshape Feature Distribution network (RFD-Net) to alleviate the problem of boundary blurriness. We embed a reshaping feature distribution module within the network, which manipulates the distribution of feature values by compressing the maxima and elevating the minima. This module effectively increases the magnitude of feature values at positions corresponding to building boundaries in the feature maps, enhancing the network's attention to building boundaries and alleviating boundary blurriness. Experiments on the WHU dataset demonstrate the effectiveness of our proposed approach.
Detecting oil leakage faults in traction transformers is critical to Electric Multiple Unit (EMU) operational fault detection. Mineral transformer oil, which exhibits fluorescence characteristics, is commonly employed in EMU traction transformers. The position and intensity of the fluorescence characteristic peaks determine the selection of the excitation light source wavelength and imaging spectral band for mineral transformer oil leakage detection based on fluorescence imaging. However, the fluorescence characteristics change with ageing. To investigate these changes in the mineral transformer oil used in EMUs during the ageing process, accelerated thermal ageing experiments were conducted in the laboratory, and the three-dimensional fluorescence spectra of samples were collected regularly. Furthermore, the hydrocarbon group composition and interfacial tension of the samples were routinely measured to unravel the underlying factors contributing to the observed variations in the fluorescence spectra. The results show that the fluorescence spectrum peak gradually shifts toward longer wavelengths as ageing time is prolonged, and the peak intensity continuously decreases. These results provide an experimental and data foundation for selecting the excitation light source wavelength and imaging spectral band when applying fluorescence imaging detection technology to oil leakage faults in EMU traction transformers.
Geometrically aligning two images is a fundamental image-processing task in machine vision. In general, the geometric transformation model is selected according to the complexity of the geometric transformation between the images, and the type of camera that took them is ignored. However, there are essential differences in the imaging mechanisms of line-scan and area-array cameras, so the geometric transformation model of area-array images does not fit the geometric transformation of line-scan images. Therefore, based on the imaging model of the line-scan camera, we established a geometric transformation model for line-scan images. A line-scan image acquisition system was built, and planar objects were imaged from different line-scan camera poses. Then, taking the proposed model as the geometric transformation model, the line-scan images collected in this paper and line-scan images from a real-world application were registered. As a comparison experiment, the homography of area-array images was adopted as the geometric transformation model to align the same images again. The comparison of registration results verified the correctness of the proposed geometric transformation model of line-scan images and showed improved line-scan image registration accuracy.
Segmenting offshore farm areas in high-resolution SAR images is of great significance for computing farming-area statistics and analyzing the rationality of the farming layout. However, SAR images are noisy and their features are inconspicuous, so precise segmentation is difficult to achieve with non-learning image segmentation methods. Therefore, we propose a precise segmentation scheme for offshore farms in high-resolution SAR images based on an improved UNet++. We first adopt a simulated annealing strategy for updating the learning rate during network training: by re-initializing the learning rate multiple times, we prevent the network from falling into a local optimum. Secondly, for the dataset studied, we verify that segmentation performance is better when images are resized to 256×256 pixels than to 512×512 pixels. Finally, we propose an improved UNet++ that uses SE-Net as the feature extraction network to enhance feature learning ability. Extensive experimental results show that, compared to some state-of-the-art methods, the proposed scheme achieves superior performance with a frequency weighted intersection over union (FWIoU) of 0.9853 on the high-resolution SAR offshore farm dataset.
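The restart-based learning-rate strategy can be sketched as a cosine schedule that is re-initialized every cycle; this is a hypothetical illustration of the "re-initialize the learning rate multiple times" idea, with the cycle length and rate bounds chosen arbitrarily, not the paper's actual schedule.

```python
import math

def lr_with_restarts(step, base_lr=1e-3, min_lr=1e-5, cycle=100):
    """Cosine-anneal the learning rate within each cycle, then restart from base_lr.
    Periodic re-initialization helps the optimizer escape local optima."""
    t = (step % cycle) / cycle  # position within the current cycle, in [0, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

Each restart briefly raises the rate back to `base_lr`, giving the optimizer a chance to jump out of a sharp minimum before annealing again.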
After posterior lumbar surgeries (PLS), changes in the cross-sectional area (CSA) and fatty infiltration (FI) of the paraspinal muscles can deeply affect muscle activity patterns and spinal stability. The objective of this work is automated paraspinal muscle (multifidus and erector spinae) segmentation in magnetic resonance imaging (MRI) images. However, no prior work has achieved semantic segmentation of the multifidus (MF) and erector spinae (ES) due to three unusual challenges: (1) the distribution of the paraspinal muscles overlaps with that of other anatomical structures; (2) the fascia between the MF and ES is unclear; (3) the intra- and inter-patient shape is variable. In this paper, we propose a generative adversarial network called LPM-GAN, containing a generator and a discriminator, to resolve these challenges. The generator handles the high variability and variety of the paraspinal muscles by extracting high-level semantics of the images and preserving the paraspinal muscle anatomy. The discriminator is then trained to optimize the predicted mask to bring it closer to the ground truth. Finally, we obtain the CSA and FI of the paraspinal muscles using Otsu's method. Extensive experiments on MRIs of 69 patients demonstrate that LPM-GAN achieves high Recall of 0.931 and 0.904, and Dice coefficients of 0.920 and 0.903, showing that the method is effective.
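The Otsu step at the end can be sketched in a few lines of numpy: find the intensity threshold that maximizes the between-class variance, then report the fraction of muscle-region pixels above it as an FI estimate. This is a generic sketch of Otsu's method, not the paper's code; treating brighter-than-threshold pixels as fat is an assumption about the MRI contrast.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Threshold maximizing the between-class variance (Otsu's method)."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    omega = np.cumsum(p)                  # probability of the lower class
    mu = np.cumsum(p * np.arange(bins))   # cumulative first moment (bin indices)
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return edges[np.nanargmax(sigma_b) + 1]

def fatty_infiltration(intensities):
    """Fraction of muscle-region pixels brighter than the Otsu threshold."""
    t = otsu_threshold(intensities)
    return float((intensities > t).mean())
```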
In aquaculture, the normal growth of fish is closely related to stocking density. Therefore, it is of great significance to use remote sensing images to accurately segment the cages in a specific sea area at a macro level. This research proposes an accurate segmentation scheme for remote sensing cage images based on U-Net and a voting mechanism. Firstly, a remote sensing cage segmentation (RSCS) dataset is produced, which includes fifty-three high-resolution cage images with inconsistent resolution. Secondly, by applying random cropping and data augmentation to the training samples, three training sets with image block sizes of 256×256, 512×512, and 1024×1024 pixels are created, and a U-Net is trained separately on each, yielding three trained models. Then, after reasonably padding the test image, a sliding-window overlapped cropping method is adopted: each high-resolution test image is cut sequentially into image blocks for segmentation, and the segmented blocks are spliced back into a binary segmentation image by averaging. Finally, for each image, the three binary segmentation images generated by the different trained models vote on each pixel. The experimental results show that, on three remote sensing images of Li'an Port, Xincun Port, and Potou Port, the Mean Intersection over Union (mIoU) is 0.865. Our data and code are available online.
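The splice-by-averaging and per-pixel voting steps can be sketched as follows (an illustrative numpy sketch under the assumption that block positions are given as top-left corners; not the released code):

```python
import numpy as np

def merge_mean(blocks, positions, shape):
    """Average overlapping block predictions into one full-size map."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for (y, x), b in zip(positions, blocks):
        h, w = b.shape
        acc[y:y + h, x:x + w] += b
        cnt[y:y + h, x:x + w] += 1
    return acc / np.maximum(cnt, 1)  # mean where covered, 0 elsewhere

def majority_vote(masks):
    """Per-pixel majority vote over the binary masks from the three models."""
    return (np.mean(masks, axis=0) > 0.5).astype(np.uint8)
```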
It is well known that achieving robust visual tracking is difficult, since tracking is easily disturbed by scale variation, illumination variation, background clutter, occlusion, and so on. Nevertheless, the spatio-temporal context algorithm performs remarkably well because it effectively exploits the spatial context information of the target. However, its ability to discriminate the target and adapt to scale variation needs improvement in complex scenes. Furthermore, lacking an appropriate target model update strategy, its tracking capability also deteriorates. To tackle these problems, a multi-scale spatio-temporal context visual tracking algorithm based on adaptive target model updating is proposed. Firstly, Histogram of Oriented Gradients (HOG) features are adopted to describe the target and its surrounding regions, improving discriminative ability. Secondly, a multi-scale estimation method is applied to predict target scale variation. Then, the peak and the average peak-to-correlation energy (APCE) of the confidence map response are combined to evaluate the tracking status. When the status is stable, the current target is expressed in a low-rank form and a CUR filter is learned; otherwise, the CUR filter is triggered to recapture the target. Finally, experimental results demonstrate that the robustness of the algorithm is clearly improved and its overall performance is better than that of the comparison algorithms.
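The APCE criterion used to judge tracking status has a standard closed form, sketched below (the formula is the commonly used definition of average peak-to-correlation energy; the stability test combining it with the peak value is the paper's idea and is not reproduced here):

```python
import numpy as np

def apce(response):
    """Average Peak-to-Correlation Energy of a confidence-map response:
    (Fmax - Fmin)^2 / mean((F - Fmin)^2). A sharp single peak gives a high
    APCE, indicating a reliable detection; multimodal or flat responses
    give a low APCE, indicating occlusion or drift."""
    fmax, fmin = response.max(), response.min()
    return (fmax - fmin) ** 2 / np.mean((response - fmin) ** 2)
```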
Aiming at the problems of target scale change, color similarity, and occlusion during target tracking, this paper proposes a single-target tracking algorithm based on fused Color Names (CN) and Histogram of Oriented Gradients (HOG) features. Within the correlation filtering tracking framework, the original RGB color space is mapped to the color attribute space to reduce the effect of environmental changes on the target's color during tracking. Principal component analysis (PCA) adaptively reduces the number of CN feature channels from 10 to 2, and a smoothing constraint penalizes the cost of crossing different feature subspaces. At the same time, the HOG features are extracted, the correlation response map is computed from the feature map by kernel correlation filtering, and the maximum response value in the map determines the target position. Thirty-six color video sequences from the OTB benchmark dataset were selected for experiments, and popular correlation filter tracking algorithms were compared. The experimental results show that the algorithm has high recognition accuracy and can stably track targets in complex environments with illumination changes, occlusion, and deformation.
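The 10-to-2 channel reduction can be sketched with a plain SVD-based PCA over the channel dimension (a generic sketch: the adaptive component selection and smoothing constraint of the paper are not reproduced):

```python
import numpy as np

def pca_reduce(features, n_out=2):
    """Project per-pixel feature channels onto the top principal components.
    features: (H, W, C) feature map, e.g. C=10 Color Names channels;
    returns an (H, W, n_out) map."""
    h, w, c = features.shape
    X = features.reshape(-1, c)
    X = X - X.mean(axis=0)
    # Right singular vectors give the principal directions of the channel space.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return (X @ Vt[:n_out].T).reshape(h, w, n_out)
```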
Aerial imagery target detection has been widely used in the military and economic fields. However, it still faces a variety of challenges. In this paper, we propose several improvements to the YOLO v3 framework to achieve better small-target detection precision. Firstly, a dual self-attention (DAN) block is embedded in Darknet-53's ResNet units to refine the feature map adaptively. Furthermore, deep semantic features are cascaded with shallow outline features in a feedforward deconvolutional module to obtain context details of small targets. Finally, online hard example mining is introduced and combined with focal loss to enhance the ability to discriminate between classes. Experimental results on the VEDAI aerial dataset show that the proposed algorithm significantly improves accuracy compared to the original network and outperforms two-stage algorithms.
Selecting a reliable matching area as the template is one of the key issues in vision navigation. This paper proposes a metric for matching area selection based on line feature extraction and connection. Firstly, a new line feature, called the saliency line feature, is introduced to approximate the reliability of a matching area, and an extraction method for these features is put forward based on the monogenic phase congruency model. Secondly, a convex shape descriptor built by connecting the line features is proposed to represent their spatial distribution characteristics. Finally, a measure is defined by merging the quantity and spatial distribution characteristics of the saliency line features, which guides the selection of better matching areas. The experimental results show that the proposed metric is valid and effective.
To solve the matching failure of the BBS (Best-Buddies Similarity) algorithm when the target image has partial occlusion, a cluttered background, unbalanced illumination, or nonrigid deformation, a multi-feature template matching algorithm based on BBS is proposed in this paper. On top of the location and appearance features, we add HOG (Histogram of Oriented Gradients) features to make full use of the color, position, and structural contour of the target image for matching. In addition, we perform mean filtering on the confidence map. The experimental results show that the AUC (Area Under Curve) score of the proposed algorithm is 0.6119, which is 6.38% higher than that of the BBS algorithm. Moreover, our algorithm has stronger robustness and higher matching accuracy.
The correlation filter, previously used in object detection and recognition tasks within a single image, has become a popular approach to visual tracking due to its high efficiency and robustness. Many trackers based on the correlation filter, including Minimum Output Sum of Squared Error (MOSSE), the Circulant Structure tracker with Kernels (CSK), and the Kernel Correlation Filter (KCF), simply estimate the translation of a target and provide no insight into its scale variation. But in visual tracking, scale variation is one of the most common challenges, and it affects both the stability and accuracy of tracking, so it is necessary to handle it. In this paper, we present an accurate two-step scale estimation solution based on the KCF framework to tackle target scale changes. Meanwhile, besides the original pixel grayscale feature, we integrate the powerful Histogram of Oriented Gradients (HOG) and Color Names (CN) features to further boost overall tracking performance. Finally, the experimental results demonstrate that the proposed method outperforms other state-of-the-art trackers.
Image sets and videos can be modeled as subspaces, which are points on Grassmann manifolds. Clustering such visual data lying on Grassmann manifolds is a hard problem because most state-of-the-art methods apply only to vector spaces rather than non-Euclidean geometries. Although some clustering methods exist for manifolds, a desirable method for clustering on Grassmann manifolds is lacking. We propose an algorithm termed kernel sparse subspace clustering on the Grassmann manifold, which embeds the Grassmann manifold into a reproducing kernel Hilbert space via an appropriate Gaussian projection kernel. This kernel is applied to obtain kernel sparse representations of data on Grassmann manifolds, utilizing the self-expressive property and exploiting the intrinsic Riemannian geometry of the data. Although the Grassmann manifold is compact, geodesic distances between Grassmann points are well captured by kernel sparse representations based on linear reconstruction. With these kernel sparse representations, clustering results on three prevalent public datasets outperform a number of existing algorithms, and the robustness of our algorithm is demonstrated as well.
This paper presents a new method for wood defect detection that solves the over-segmentation problem of local threshold segmentation methods by effectively combining visual saliency with local threshold segmentation. Firstly, defect areas are coarsely located by using the spectral residual method to compute their global visual saliency. Then, threshold segmentation with the maximum inter-class variance (Otsu) method is adopted to precisely locate and segment the wood surface defects around the coarsely located areas. Lastly, we use mathematical morphology to process the binary images after segmentation, which reduces noise and small false objects. Experiments on test images of insect holes, dead knots, and sound knots show that the proposed method obtains ideal segmentation results and is superior to existing segmentation methods based on edge detection, Otsu, and threshold segmentation.
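The spectral residual saliency step can be sketched directly in numpy: smooth the log-amplitude spectrum, keep the residual plus the original phase, and back-transform, so that statistically "unexpected" regions such as defects pop out. This is a generic sketch of the spectral residual method (the box-filter width and the omitted final Gaussian blur are simplifications), not the paper's implementation.

```python
import numpy as np

def spectral_residual_saliency(img, k=3):
    """Spectral residual saliency: subtract a local average of the
    log-amplitude spectrum, keep the phase, and inverse-transform."""
    F = np.fft.fft2(img)
    log_amp = np.log(np.abs(F) + 1e-9)
    phase = np.angle(F)
    # Box-filter the log-amplitude spectrum (separable moving average).
    kern = np.ones(k) / k
    smooth = np.apply_along_axis(lambda r: np.convolve(r, kern, mode="same"), 0, log_amp)
    smooth = np.apply_along_axis(lambda r: np.convolve(r, kern, mode="same"), 1, smooth)
    residual = log_amp - smooth
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / (sal.max() + 1e-9)
```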
The key to infrared maritime ship target detection is real-time, fast, and efficient detection of true targets. The BING method, which introduces a binarized approximate calculation, can detect targets quickly. But for infrared images, the approximation also brings some shortcomings: approximating the gradient feature decreases the overall gradient amplitude of the image and erases some of the smaller gradient edge details, leading to weak discriminative ability. Based on the BING algorithm, we propose an improved version that can quickly extract candidate regions from infrared ship images. In the normed gradients (NG) feature, we introduce the Laplacian difference operator and use a two-level cascaded SVM to learn the features. Experimental results show that our method extracts the region of interest (ROI) of the target ship effectively and rapidly.
Recently, the Kernel Correlation Filter (KCF) has attracted great attention in the visual tracking field for its excellent tracking performance and high processing speed. However, how to handle scale variation is still an open problem. In this paper, a method based on Gaussian scale space is proposed to address this issue. First, KCF is used to estimate the location of the target, and the context region containing the target and its surrounding background becomes the image to be matched. The Gaussian scale space of this image is obtained by convolving it with Gaussian kernels. Then, the target image at different scales is estimated from the scale space: for each scale parameter, the corresponding scale image is resized by bilinear interpolation to simulate target imaging at that scale. Finally, the template is matched against the images at the different scales using the Mean Absolute Difference (MAD) as the matching criterion. The optimal match yields the best zoom ratio s, from which the target size is estimated. In the experiments, comparisons with CSK, KCF, and other trackers demonstrate that the proposed method achieves a marked improvement in accuracy and is an efficient algorithm.
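The bilinear resizing and MAD matching steps can be sketched as follows (an illustrative sketch that omits the Gaussian blurring of the scale pyramid; the candidate patches are assumed to be pre-extracted, one per simulated scale):

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D image with bilinear interpolation (simulating another imaging scale)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def best_scale(template, patches, scales):
    """Resize each candidate patch to the template size and pick the scale
    with the lowest Mean Absolute Difference (MAD)."""
    h, w = template.shape
    mads = [np.mean(np.abs(bilinear_resize(p, h, w) - template)) for p in patches]
    return scales[int(np.argmin(mads))]
```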
In this paper, we consider the direct image registration problem, which estimates the geometric and photometric transformations between two images. The efficient second-order minimization method (ESM) is based on a second-order Taylor series of image differences, without computing the Hessian, under the brightness constancy assumption. This is possible because the considered geometric transformations form a Lie group and can be parameterized by its Lie algebra. To deal with lighting changes, we extend ESM to the compositional dual efficient second-order minimization method (CDESM). In our approach, the photometric transformations are parameterized by their Lie algebra with a compositional operation, similar to that of the geometric transformations. Our algorithm gives a second-order approximation of the image differences with respect to the geometric and photometric parameters, which are obtained simultaneously by non-linear least-squares optimization. Our algorithm preserves the advantages of the original ESM method, namely its high convergence rate and large capture radius. Experimental results show that our algorithm is more robust to lighting changes and has higher registration accuracy than previous algorithms.
KEYWORDS: 3D displays, Signal to noise ratio, Detection and tracking algorithms, Target detection, Infrared detectors, Infrared radiation, Infrared imaging, Image processing, 3D acquisition, Optical engineering
Dim targets are extremely difficult to detect with single-frame methods, and radiation accumulation is one of the effective ways to improve the signal-to-noise ratio (SNR). A detection approach based on radiation accumulation is proposed. First, a location space and a motion space are established, and a radiation accumulation operation, controlled by vectors from the motion space, is applied to the original image space; a new image space is thus acquired in which some images have an improved SNR. Second, quasi-targets in the new image space are obtained by constant false-alarm rate testing, and the location vectors and motion vectors of the quasi-targets are acquired simultaneously. Third, the location vectors and motion vectors are mapped into the two spaces, respectively, and a volume density function is defined in the motion space; the location extremum of the location space and the volume density extremum of the motion space together confirm the true target. Finally, the actual location of the true target in the original image space is obtained by space inversion. The approach is also applicable to detecting multiple dim targets. Experimental results show the effectiveness of the proposed approach and demonstrate that it is superior to the compared approaches in detection probability and false-alarm probability.
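The core accumulation operation can be illustrated by a shift-and-add sketch: frames are shifted along a hypothesized motion vector and averaged, so that for the correct vector the target energy adds coherently. This is a simplified stand-in for the paper's motion-space-controlled accumulation (the wraparound `np.roll` shifting and integer per-frame velocities are simplifying assumptions).

```python
import numpy as np

def shift_and_add(frames, velocity):
    """Accumulate frames along a hypothesized per-frame motion vector (vy, vx).
    For the correct vector the target aligns and its SNR improves; for wrong
    vectors the target energy is smeared across pixels."""
    vy, vx = velocity
    acc = np.zeros_like(frames[0], dtype=float)
    for k, f in enumerate(frames):
        acc += np.roll(np.roll(f, -k * vy, axis=0), -k * vx, axis=1)
    return acc / len(frames)
```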
This paper presents a new feature matching algorithm for nonrigid multimodal image registration. The proposed algorithm first constructs phase congruency representations (PCR) of the images to be registered. Then the scale-invariant feature transform (SIFT) method is applied to capture significant feature points from the PCR. Subsequently, putative matching is obtained by nearest-neighbour matching in the SIFT descriptor space. The SIFT descriptor is then integrated into the Coherent Point Drift (CPD) method so that the appropriate matching of the two point sets is solved by combining appearance with distance properties between putative match candidates. Finally, the transformation estimated by matching the point sets is applied to register the original images. The results show that the proposed algorithm increases the correct matching rate and is well suited for multimodal image registration.
The traditional Hausdorff measure, which uses the Euclidean distance metric (L2 norm) to define the distance between the coordinates of any two points, performs poorly in the presence of rotation and scale change, although it is robust to noise and occlusion. To address this problem, we define a novel two-part similarity function in this paper. The first part is the Hausdorff distance between shapes, computed using shape context, which is rotation and scale invariant, as the distance metric. The second part is the cost of matching between centroids. Unlike the traditional method, we use the centroid as the reference point to obtain its shape context, which embodies global information about the shape. Experimental results demonstrate that the function value between shapes is rotation and scale invariant and that the matching accuracy of our algorithm is higher than that of previously proposed algorithms on the MPEG-7 database.
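For reference, the traditional L2-based Hausdorff distance that this paper improves upon can be sketched directly in numpy (the paper replaces the coordinate L2 metric below with a shape-context distance, which is not reproduced here):

```python
import numpy as np

def directed_hausdorff(A, B):
    """max over a in A of (min over b in B of ||a - b||_2), for point sets
    given as (N, d) arrays."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

Because the metric acts on raw coordinates, rotating or rescaling one point set changes the value, which is exactly the sensitivity the shape-context replacement removes.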
Conventional approaches to object tracking use area correlation, but they have difficulty handling deformation of the object region during tracking. A novel target tracking method based on Lie algebra is presented. We use Gabor features as the target token, model deformation using the affine Lie group, and optimize parameters directly on the manifold, which can be solved by the exponential mapping between a Lie group and its Lie algebra. We analyze the essence of our method and test the algorithm on real image sequences. The experimental results demonstrate that the Lie algebra method outperforms other traditional algorithms in efficiency, stability, and accuracy.