Unsupervised domain adaptation for object detection leverages a labeled source domain to learn an object detector that generalizes to a different, unannotated target domain. We propose efficient multi-scale attention, confidence mixing, augmentation, and combination (ECAC), an adaptive object detector learning method based on a region-level confidence sample mixing strategy. Unlike current methods, our approach crops high-confidence detection regions from both the source and target domains, augments them, and combines them to generate composite samples; a consistency loss is then used to address the domain adaptation problem. Furthermore, we introduce efficient multi-scale attention (EMA) into the detector. To retain channel information and reduce computational overhead, EMA restructures part of the channels into the batch dimension and groups the channel dimension into multiple sub-features, ensuring that spatial semantic features are evenly distributed within each feature group. EMA employs the shared 1×1 convolution branch of the CA attention module, along with a parallel 3×3 convolution kernel, to aggregate multi-scale spatial structure information. This effectively enhances the model's focus on region-level features by integrating local and global information through multi-scale parallel sub-networks and cross-spatial learning. For pseudo-label filtering, we progressively transition from a loose to a strict confidence threshold. Initially, the loose threshold admits more pseudo-labels, helping the detector learn target-domain representations; as training progresses, stricter thresholds select more reliable pseudo-labels and gradually filter out inaccurate pseudo-detections. Extensive experiments on three datasets demonstrate that ECAC achieves state-of-the-art performance on two of them; on the third, it improves mean average precision by over 2% compared with the latest methods.
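A minimal sketch of the region-level confidence mixing idea is given below, assuming detections arrive as (x1, y1, x2, y2, score) tuples and that source and target images share the same resolution; the function names, the horizontal-flip augmentation, and the linear threshold schedule are illustrative choices rather than the paper's exact implementation.

    import numpy as np

    def progressive_threshold(step, total_steps, loose=0.5, strict=0.9):
        """Linearly tighten the pseudo-label confidence threshold over training."""
        t = min(step / max(total_steps, 1), 1.0)
        return loose + t * (strict - loose)

    def mix_confident_regions(src_img, src_dets, tgt_img, tgt_dets, thresh):
        """Paste high-confidence source regions into the target image (and keep
        high-confidence target pseudo-boxes) to form one composite sample.
        Assumes src_img and tgt_img are same-shape HxWxC numpy arrays."""
        composite = tgt_img.copy()
        kept_boxes = []
        for (x1, y1, x2, y2, score) in src_dets:
            if score < thresh:
                continue
            patch = src_img[int(y1):int(y2), int(x1):int(x2)]
            patch = patch[:, ::-1]                 # simple augmentation: horizontal flip
            h, w = patch.shape[:2]
            composite[int(y1):int(y1) + h, int(x1):int(x1) + w] = patch
            kept_boxes.append((x1, y1, x2, y2))
        for (x1, y1, x2, y2, score) in tgt_dets:
            if score >= thresh:                    # keep only reliable target pseudo-labels
                kept_boxes.append((x1, y1, x2, y2))
        return composite, kept_boxes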
To tackle the problem of human trajectory prediction in complex scenes, we propose a model that uses hypergraph convolutional neural networks for social interaction (HGCNSI). Our model leverages a hypergraph structure to capture both high-order interactions and complex social dynamics among pedestrians, who often influence one another in nonlinear and structured ways. First, we propose a social interaction module that improves the accuracy of interaction modeling by distinguishing between interacting and non-interacting pedestrians. Then, a hypergraph is constructed from the output of the social interaction module to capture the complex, nonlinear relations among multiple pedestrians. Furthermore, we exploit an improved attention mechanism, called scene-coordinates attention, that fuses spatial and temporal features and models the unique historical movement of each person. Finally, we introduce the SIRF module, which filters trajectories within one iteration to reduce computational complexity and improve prediction performance. We evaluate HGCNSI on five publicly available datasets and demonstrate that it achieves state-of-the-art results; in particular, it outperforms existing methods in prediction accuracy as measured by the average displacement error (ADE) and final displacement error (FDE).
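As a rough illustration, the sketch below implements one hypergraph convolution layer in PyTorch following the standard HGNN propagation rule X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta; whether HGCNSI uses exactly this normalization is an assumption here, and the class and argument names are hypothetical.

    import torch
    import torch.nn as nn

    class HypergraphConv(nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.theta = nn.Linear(in_dim, out_dim, bias=False)

        def forward(self, x, H, w=None):
            # x: (N, in_dim) pedestrian features; H: (N, E) float incidence
            # matrix; w: (E,) hyperedge weights (uniform if None).
            if w is None:
                w = torch.ones(H.size(1), device=H.device)
            Dv = (H * w).sum(dim=1).clamp(min=1e-6)   # weighted node degrees
            De = H.sum(dim=0).clamp(min=1e-6)         # hyperedge degrees
            x = self.theta(x)
            m = (H * w / De) @ (H.t() @ (x / Dv.sqrt().unsqueeze(1)))
            return m / Dv.sqrt().unsqueeze(1)

In this reading, each node is a pedestrian in the scene and each hyperedge groups the pedestrians that the social interaction module marks as mutually interacting.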
Betweenness centrality measures node importance in networks, but conventional exact algorithms become prohibitively slow as the network size grows. This paper aims to improve both the efficiency and the accuracy of betweenness centrality computation for network nodes. We propose an algorithm based on shortest-path approximation and adaptive sampling: it first selects high-quality seed nodes according to degree, then approximates shortest paths, and finally chooses appropriate samples to approximate betweenness centrality. Experiments on five different datasets show that our algorithm outperforms the baseline algorithms in terms of sample size and running time, reducing computational cost effectively while preserving accuracy.
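The sketch below shows the general shape of such an estimator in Python: Brandes-style dependency accumulation run from a sampled set of sources on an unweighted graph, with the highest-degree nodes taken as seeds. This mirrors the degree-based seed selection only in spirit; the paper's shortest-path approximation and adaptive sampling are more elaborate.

    from collections import deque

    def approx_betweenness(adj, n_samples):
        """adj: dict mapping node -> iterable of neighbours (undirected)."""
        bc = dict.fromkeys(adj, 0.0)
        # degree-based seed selection: take the n_samples highest-degree nodes
        sources = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:n_samples]
        for s in sources:
            stack, preds = [], {v: [] for v in adj}
            sigma = dict.fromkeys(adj, 0.0); sigma[s] = 1.0
            dist = dict.fromkeys(adj, -1); dist[s] = 0
            queue = deque([s])
            while queue:                      # BFS with shortest-path counting
                v = queue.popleft(); stack.append(v)
                for w in adj[v]:
                    if dist[w] < 0:
                        dist[w] = dist[v] + 1
                        queue.append(w)
                    if dist[w] == dist[v] + 1:
                        sigma[w] += sigma[v]
                        preds[w].append(v)
            delta = dict.fromkeys(adj, 0.0)
            while stack:                      # back-propagate dependencies
                w = stack.pop()
                for v in preds[w]:
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
                if w != s:
                    bc[w] += delta[w]
        scale = len(adj) / max(n_samples, 1)  # rescale to estimate the full sum
        return {v: bc[v] * scale for v in bc}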
Deep clustering algorithms based on graph convolutional networks are widely used because of their strong ability to mine network structure. However, constructing neighborhood graphs may introduce noise that affects the clustering results, and focusing on ordinary topology alone ignores higher-order, attribute-level connections among data points. To address these problems, an unsupervised hypergraph convolutional clustering network (UHCCN) is proposed in this paper. We construct hypergraph structures from attributes and incorporate higher-order information into representation learning through hypergraph convolution. An attribute encoder extracts node features, which are fused into the hypergraph convolution. Finally, representation learning and clustering are optimized jointly. The experiments validate the effectiveness and superiority of UHCCN.
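A common way to build an attribute hypergraph, sketched below, is to form one hyperedge per node from that node and its k nearest neighbours in attribute space; whether UHCCN builds its incidence matrix exactly this way is an assumption, and knn_hypergraph is a hypothetical helper name.

    import numpy as np

    def knn_hypergraph(X, k=5):
        """X: (N, D) attribute matrix -> H: (N, N) incidence matrix, where
        column j groups node j with its k nearest neighbours in attribute space.
        Uses dense O(N^2) distances, which is fine for a sketch."""
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        H = np.zeros((len(X), len(X)))
        for j in range(len(X)):
            nearest = np.argsort(d2[j])[:k + 1]              # node j plus k neighbours
            H[nearest, j] = 1.0
        return H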
Attention mechanisms in image captioning models help the model focus on relevant regions while generating a caption. However, existing attention mechanisms cannot identify which regions and visual features in an image are important, so models sometimes pay excessive attention to unimportant regions and features during caption generation, producing coarse-grained or even incorrect captions. To address this problem, we propose an importance discrimination attention (IDA) module, which discriminates important from unimportant features and reduces the chance of the model being misled by unimportant features while generating captions. We also propose an IDA-based image captioning model, IDANet, built entirely on the transformer framework. The encoder of IDANet consists of two parts: a pretrained Vision Transformer (ViT), which extracts visual features efficiently, and a refining module, added to the encoder to model the positional and semantic relationships among grid features. For the decoder, we propose the IDA-decoder, whose structure is similar to the transformer decoder; guided by IDA, it focuses on crucial regions and features rather than on all regions and features while generating the caption. Compared with other attention mechanisms, IDA captures the semantic relevance between important regions and the remaining regions in a fine-grained and efficient manner. The captions generated by IDANet accurately capture the relationships among different objects and discriminate objects of similar size and shape. On the MSCOCO “Karpathy” offline test split, IDANet achieves a CIDEr-D score of 132.0 and a BLEU-4 score of 40.3.
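One illustrative reading of the importance-discrimination idea, sketched below in PyTorch, learns a per-region importance score and multiplies it into the region features before attention so that unimportant regions are suppressed; this gating is an assumption for illustration, not IDANet's exact formulation, and the class name is hypothetical.

    import torch
    import torch.nn as nn

    class ImportanceGatedAttention(nn.Module):
        def __init__(self, dim, n_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
            self.importance = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

        def forward(self, queries, regions):
            # queries: (B, T, dim) decoder states; regions: (B, R, dim) grid features
            gate = self.importance(regions)          # (B, R, 1), near 0 = unimportant
            gated = regions * gate                   # suppress unimportant regions
            out, weights = self.attn(queries, gated, gated)
            return out, weights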
To address depth-information occlusion in X-ray images and the detection of small-scale contraband, an improved prohibited item detection network based on YOLOX is proposed. First, a material-aware atrous convolution module (MACM) is added to the feature pyramid network to enhance the model's multiscale fusion and its ability to extract material information from X-ray images. Second, a spatial pyramid split attention (SPSA) mechanism is proposed to fuse spatial and channel attention over spatial features at different scales. Finally, the CutMix data augmentation strategy is adopted to improve the robustness of the model. Overall performance experiments were conducted on the publicly available OPIXray dataset: the method achieves a mean average precision (mAP) of 93.10%, an improvement of 3.25% over the YOLOX baseline. The experimental results show that our method achieves state-of-the-art detection accuracy compared with existing methods.
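As a rough sketch of the multiscale extraction that a module like MACM performs, the PyTorch block below runs parallel 3×3 convolutions with different dilation rates and fuses them with a 1×1 convolution; the branch count, dilation rates, and fusion scheme are illustrative assumptions rather than the paper's design.

    import torch
    import torch.nn as nn

    class AtrousBlock(nn.Module):
        def __init__(self, channels, rates=(1, 2, 4)):
            super().__init__()
            # one 3x3 branch per dilation rate; padding=r keeps spatial size
            self.branches = nn.ModuleList(
                [nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
                 for r in rates]
            )
            self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

        def forward(self, x):
            # each branch sees a different receptive field; concatenate, then
            # fuse the multiscale responses with a 1x1 convolution
            return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))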