Currently, the correlation filter is widely used in visual tracking because of its effectiveness and efficiency. To adapt the representation to changing target appearances, linear interpolation is used to update the tracking model according to a manually designed learning rate. However, such manually designed update rules restrict these methods to particular scenes, because the threshold parameters are sensitive to the different response maps that arise in complex scenes. In this paper, to overcome this problem, an adaptive increment correlation filter based tracker is proposed. Different from traditional linear interpolation, which depends on a manual learning rate, the increment is learned by linear regression from the historical tracking model and the current training samples. Experimentally, we show that our algorithm outperforms state-of-the-art keypoint-based trackers.
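To make the contrast concrete, the sketch below compares the classic fixed-rate interpolation update with a regression-learned increment for a MOSSE-style filter in the frequency domain. The MOSSE-style formulation, the function names, and the ridge regularizer `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def linear_interp_update(H, F, G, lr=0.02, lam=1e-4):
    """Classic update: blend the old filter with the current-frame
    closed-form solution using a hand-tuned learning rate `lr`."""
    H_new = (G * np.conj(F)) / (F * np.conj(F) + lam)
    return (1 - lr) * H + lr * H_new

def adaptive_increment_update(H, F, G, lam=1e-4):
    """Assumed increment update: solve for dH by per-bin ridge regression
    so that (H + dH) maps the current sample F to the desired response G,
    instead of blending with a fixed rate."""
    R = G - H * F                      # residual the old filter cannot explain
    dH = (R * np.conj(F)) / (F * np.conj(F) + lam)
    return H + dH
```

Here `F` and `G` are the 2D FFTs of the training patch and its desired Gaussian response; the learned increment automatically shrinks when the old filter already fits the new sample, which is what removes the hand-tuned rate.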
KEYWORDS: Video, RGB color model, Video surveillance, Video acceleration, Feature extraction, 3D video streaming, Video compression, Cameras, Principal component analysis, Motion models
Using different video modalities for human action recognition has become a highly promising trend in video analysis. In this paper, we propose a method for human action recognition that transfers from RGB video to depth video using domain adaptation, where features learned from RGB videos are used to perform action recognition on depth videos. More specifically, we take three steps to solve this problem. First, unlike an image, a video is more complex because it carries both spatial and temporal information; to better encode this information, the dynamic image method is used to represent each RGB or depth video as a single image, so that most image feature extraction methods become applicable to video. Second, since a video can be represented as an image, a standard CNN model can be used for training and testing on videos; moreover, the CNN model can also serve as a feature extractor owing to its powerful representational ability. Third, because RGB videos and depth videos belong to two different domains, domain adaptation is applied, for the first time between RGB and depth video, to make the two feature domains more similar; on this basis, the features learned by the RGB video model can be used directly for depth video classification. We evaluate the proposed method on a complex RGB-D action dataset (NTU RGB-D), and our method achieves more than 2% accuracy improvement by using domain adaptation from RGB to depth action recognition.
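The dynamic image step can be sketched with the approximate rank pooling of Bilen et al., which collapses a clip into one image via a fixed closed-form weighting of its frames; treating this particular weighting as the paper's exact variant is an assumption.

```python
import numpy as np

def dynamic_image(frames):
    """Approximate rank pooling: summarize a (T, H, W, C) clip as a single
    image whose pixels encode the temporal evolution of the video."""
    T = frames.shape[0]
    harm = np.cumsum(1.0 / np.arange(1, T + 1))      # harmonic numbers H_t
    H_prev = np.concatenate(([0.0], harm[:-1]))      # H_{t-1}, with H_0 = 0
    t = np.arange(1, T + 1)
    alpha = 2.0 * (T - t + 1) - (T + 1.0) * (harm[-1] - H_prev)
    di = np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))
    # rescale to [0, 1] so the result can be fed to a standard image CNN
    return (di - di.min()) / (np.ptp(di) + 1e-8)
```

The same function is applied to RGB clips and depth clips alike, which is what lets a single image-based CNN handle both modalities.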
Thanks to the availability of low-cost depth cameras, such as the Microsoft Kinect, 3D hand pose estimation has attracted special research attention in recent years. Due to the large variations in hand viewpoint and the high dimensionality of hand motion, 3D hand pose estimation is still challenging. In this paper we propose a two-stage framework that combines a CNN with a Random Forest to boost the performance of hand pose estimation. First, we use a standard Convolutional Neural Network (CNN) to regress the hand joints' locations. Second, a Random Forest is used to refine the joints from the first stage. In the second stage, we propose a pyramid feature that merges the information flow of the CNN. Specifically, we obtain the rough joint locations from the first stage and then rotate the convolutional feature maps (and the image). After this, for each joint, we first map its location onto each feature map (and the image), then crop features around that location on each feature map (and the image), and finally feed the extracted features to the Random Forest for refinement. Experimentally, we evaluate the proposed method on the ICVL dataset and obtain a mean error of about 11 mm; our method also runs in real time on a desktop.
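A minimal sketch of the cropping step is given below: a joint's image location is rescaled onto each CNN feature map, a small window is cropped on every map, and the crops are concatenated into one descriptor for the Random Forest. The window size `patch`, the zero-padding at borders, and the function name are assumptions.

```python
import numpy as np

def crop_pyramid_features(feature_maps, joint_xy, img_size, patch=8):
    """Hypothetical pyramid-feature extraction for one joint.
    feature_maps: list of (H_i, W_i, C_i) arrays from different CNN layers.
    joint_xy: (x, y) in image coordinates. img_size: (W, H) of the image."""
    feats = []
    for fm in feature_maps:
        h, w = fm.shape[:2]
        # rescale the joint location to this map's resolution
        x = int(joint_xy[0] * w / img_size[0])
        y = int(joint_xy[1] * h / img_size[1])
        r = patch // 2
        # clamp the crop window to the map borders, zero-pad the remainder
        x0, x1 = max(0, x - r), min(w, x + r)
        y0, y1 = max(0, y - r), min(h, y + r)
        crop = np.zeros((patch, patch, fm.shape[2]), fm.dtype)
        crop[:y1 - y0, :x1 - x0] = fm[y0:y1, x0:x1]
        feats.append(crop.ravel())
    return np.concatenate(feats)
```

Including the raw image as one more "feature map" in the list reproduces the "(and image)" branch of the description.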
We are motivated by the need for a generic object detection algorithm that achieves high recall for small targets in complex scenes with acceptable computational efficiency. We propose a novel object detection algorithm with high localization quality at acceptable computational cost. First, we obtain the objectness map as in BING [1] and use NMS to keep the top N points. Then, the k-means algorithm is used to cluster them into K classes according to their locations, and the center points of the K classes are taken as seed points. For each seed point, an object potential region is extracted. Finally, a fast salient object detection algorithm [2] is applied to the object potential regions to highlight object-like pixels, and a series of efficient post-processing operations is proposed to locate the targets. Our method runs at 5 FPS on 1000×1000 images and significantly outperforms previous methods on small targets in cluttered backgrounds.
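The seeding step maps directly onto off-the-shelf k-means, sketched below; the parameter names (`n_top`, `k`) and the use of scikit-learn are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def seed_points_from_objectness(points, scores, n_top=200, k=8):
    """Keep the top-N objectness points (assumed already NMS-filtered),
    cluster their (x, y) locations with k-means, and return the K cluster
    centers as seed points for object-potential-region extraction.
    points: (M, 2) array of locations; scores: (M,) objectness values."""
    top = points[np.argsort(scores)[::-1][:n_top]]
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(top)
    return km.cluster_centers_  # (K, 2) seed points
```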
Object detection and tracking are critical parts of unmanned surface vehicles (USVs) for achieving automatic obstacle avoidance. Off-the-shelf object detection methods have achieved impressive accuracy on public datasets, yet they still meet bottlenecks in practice, such as high time consumption and low detection quality. In this paper, we propose a novel system for USVs that locates objects more accurately while being fast and stable at the same time. First, we employ Faster R-CNN to acquire several initial raw bounding boxes. Second, the image is segmented into a few superpixels; for each initial box, the superpixels inside it are grouped into a whole according to a combination strategy, and a new box is then generated as the circumscribed bounding box of the resulting superpixel group. Third, we utilize KCF to track these objects over several frames, and Faster R-CNN is again used to re-detect objects inside the tracked boxes to prevent tracking failures as well as to remove empty boxes. Finally, we utilize Faster R-CNN to detect objects in the next image and refine the object boxes by repeating the second module of our system. The experimental results demonstrate that our system is fast, robust and accurate, and can be applied to USVs in practice.
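The box-refinement module can be sketched as follows: segment with SLIC, keep superpixels whose area mostly falls inside the detector's raw box, and return the circumscribed bounding box of their union. The SLIC parameters and the overlap-ratio combination strategy are assumptions, not the paper's exact rule.

```python
import numpy as np
from skimage.segmentation import slic

def refine_box(image, box, overlap_thresh=0.5):
    """Refine one raw detector box (x0, y0, x1, y1, integer pixels) using
    superpixels; `overlap_thresh` is an assumed grouping criterion."""
    labels = slic(image, n_segments=400, compactness=10)
    x0, y0, x1, y1 = box
    n = labels.max() + 1
    total = np.bincount(labels.ravel(), minlength=n)        # superpixel areas
    inside = np.zeros(n)
    sub = labels[y0:y1, x0:x1]                               # area inside box
    for lab, cnt in zip(*np.unique(sub, return_counts=True)):
        inside[lab] = cnt
    keep = inside / np.maximum(total, 1) >= overlap_thresh   # mostly-inside SPs
    mask = keep[labels]
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return box
    # circumscribed bounding box of the grouped superpixels
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1
```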
Current keypoint-based trackers are widely used in object tracking systems because of their robustness against scale, rotation and other variations. However, when these methods are applied to tracking a 3D target in forward-looking image sequences, the tracked point usually drifts away from the correct position over time. In this paper, to overcome this drift, structured output tracking is used to track the target point together with its surrounding information based on Haar-like features. First, a local patch is cropped around the tracked point in the last frame to extract Haar-like features. Second, using a structured output SVM framework, a prediction function is learned within a larger radius to directly estimate the patch transformation between frames. Finally, during tracking, the prediction function is applied to search for the best location in the new frame. To achieve robust tracking in real time, keypoint matching is adopted to coarsely locate the search region in the whole image before the structured output tracking is applied. Experimentally, we show that our algorithm is able to outperform state-of-the-art keypoint-based trackers.
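Haar-like features are what make the patch evaluation cheap enough for real time: with an integral image, any rectangle sum costs four lookups. Below is a minimal sketch of one two-rectangle feature; the specific feature layout used by the tracker is not specified in the abstract, so this configuration is illustrative.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero first row/column for easy lookups."""
    return np.pad(img.astype(np.float64), ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def box_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle [x, x+w) x [y, y+h), in O(1)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_vertical(ii, x, y, w, h):
    """One simple Haar-like feature: top half minus bottom half of a
    2h-tall window, responding to horizontal edge structure."""
    return box_sum(ii, x, y, w, h) - box_sum(ii, x, y + h, w, h)
```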
Object detection is one of the most important research topics in computer vision. Recently, category-independent objectness in RGB images has become an active field thanks to its generalization ability and its efficiency as a pre-filtering step for object detection. Many traditional applications have been transferred from RGB images to depth images since economical depth sensors, such as the Kinect, became popular. Depth data represent distance information; because of this special characteristic, objectness evaluation methods designed for RGB images are often invalid on depth images. In this study, we propose mEdgeboxes to evaluate objectness in depth images. Aside from detecting edges from the raw depth information, we extract another edge map from the orientation information based on the normal vectors. The two kinds of edge map are integrated and fed to Edgeboxes in order to produce object proposals. The experimental results on two challenging datasets demonstrate that the detection rate of the proposed objectness estimation method can reach over 90% with 1000 windows. It is worth noting that our approach generally outperforms state-of-the-art methods on detection rate.
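The orientation-based edge cue can be sketched as follows: estimate per-pixel surface normals from depth gradients, then mark edges where the normal direction turns sharply between neighbors. The gradient-based normal estimate and the angular-difference edge measure are assumed simplifications of the paper's method.

```python
import numpy as np

def orientation_edges(depth):
    """Edge map from normal-vector orientation changes in a depth image."""
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    # unnormalized surface normals (-dz/dx, -dz/dy, 1), then normalize
    n = np.dstack([-dzdx, -dzdy, np.ones_like(depth)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    # angle between normals of horizontal / vertical neighbors
    dot_x = np.clip((n[:, 1:] * n[:, :-1]).sum(2), -1.0, 1.0)
    dot_y = np.clip((n[1:] * n[:-1]).sum(2), -1.0, 1.0)
    edge = np.zeros_like(depth, dtype=np.float64)
    edge[:, 1:] = np.maximum(edge[:, 1:], np.arccos(dot_x))
    edge[1:] = np.maximum(edge[1:], np.arccos(dot_y))
    return edge / (edge.max() + 1e-8)
```

This map, combined with edges from the raw depth values, is what would be handed to Edgeboxes for proposal scoring.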
The detection of shadow is the first step to reduce the imaging effects caused by the interaction of the light source with surfaces; shadow removal can then recover the vein information from the dark region. In this paper, we present a new method to detect the shadow in a single natural image with a saliency map and then to remove it. First, the RGB image is transferred to a 2D color model in order to enhance the blue component. Second, the saliency map of the blue component is extracted via graph-based manifold ranking. The edge of the shadow is then detected in order to recover the transitional region between the shadow and non-shadow regions. Finally, the shadow is compensated by enhancing the image in RGB space. Experimental results show the effectiveness of the proposed method.
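The final compensation step could look like the sketch below: brighten the detected shadow region per RGB channel until it matches the mean intensity of the lit region. The per-channel gain strategy is an assumption; the paper only states that the image is enhanced in RGB space.

```python
import numpy as np

def compensate_shadow(img, shadow_mask):
    """Compensate a detected shadow region. img: (H, W, 3) uint8;
    shadow_mask: boolean map from the detection stage."""
    out = img.astype(np.float64)
    lit = ~shadow_mask
    for c in range(3):
        ch = out[..., c]
        # assumed rule: match the shadow's mean to the lit region's mean
        gain = ch[lit].mean() / (ch[shadow_mask].mean() + 1e-8)
        ch[shadow_mask] *= gain
    return np.clip(out, 0, 255).astype(np.uint8)
```

The recovered transitional (penumbra) region would in practice get a blended gain to avoid a visible seam at the shadow boundary.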
In planetary or lunar landing missions, hazard avoidance is critical for landing safety. Therefore, it is very important to correctly detect hazards and effectively find a safe landing area during the last stage of descent.
In this paper, we propose a passive-sensing-based HDA (hazard detection and avoidance) approach that uses descent images to lower the landing risk. In the hazard detection stage, a statistical probability model based on hazard similarity is adopted to evaluate the image and detect hazardous areas, so that a binary hazard image can be generated. Afterwards, a safety coefficient, which jointly utilizes the proportion of hazards in the local region and the hazard distribution inside it, is proposed to find potential regions with fewer hazards in the binary hazard image. By using the safety coefficient in a coarse-to-fine procedure and combining it with a local ISD (intensity standard deviation) measure, the safe landing area is determined. The algorithm is evaluated and verified on many simulated downward-looking descent images rendered from lunar orbital satellite images.
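One plausible reading of the safety coefficient is sketched below: combine the hazard proportion in a circular candidate region with a distance-weighted term that penalizes hazards near the region center. Both the weighting and the multiplicative combination are assumptions for illustration, not the paper's formula.

```python
import numpy as np

def safety_coefficient(hazard, cx, cy, radius):
    """Score a candidate landing site on a binary hazard image.
    hazard: (H, W) array of 0/1 hazard labels; (cx, cy): site center."""
    h, w = hazard.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(xs - cx, ys - cy)
    local = dist <= radius
    ratio = hazard[local].mean()          # proportion of hazardous pixels
    if hazard[local].sum() == 0:
        return 1.0                        # hazard-free region is fully safe
    # assumed distribution term: hazards near the center are worse
    closeness = 1.0 - dist[local & (hazard > 0)] / radius
    distribution = closeness.mean()
    return (1.0 - ratio) * (1.0 - distribution)
```

Evaluating this coefficient on a coarse grid first and refining only around the best cells gives the coarse-to-fine procedure the abstract describes.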
This paper presents an effective method for ship detection from optical satellite images using optical flow and saliency, which is able to identify multiple ship targets against a complex dynamic sea background and reduces the false positive rate compared with traditional methods. Moving targets in the image are highlighted by the classical optical flow method, and the dynamic waves are suppressed by combining it with a state-of-the-art saliency method. We make full use of both low-level features (size, color, etc.) and high-level features (information from adjacent frames, etc.) of the image, which allows the method to adapt to different dynamic backgrounds. Experimental results demonstrate the robustness and high performance of the proposed method compared with existing methods.
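A minimal sketch of the fusion idea, assuming Farneback dense optical flow and a precomputed saliency map normalized to [0, 1]; the multiplicative gating is an assumed combination rule, not necessarily the paper's.

```python
import cv2
import numpy as np

def motion_saliency(prev_gray, cur_gray, saliency):
    """Highlight moving ships while suppressing waves: flow magnitude is
    gated by per-pixel saliency, so waves (high flow, low saliency) fade."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    mag = mag / (mag.max() + 1e-8)
    return mag * saliency
```

Thresholding the returned map and filtering connected components by size would yield the final ship candidates.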
In this paper, a novel difficulty prediction scheme for infrared building target recognition is developed. Our scheme can predict the difficulty of recognizing a designated target in advance, which is desirable in infrared building recognition. The experimental results show that our scheme fulfills the prediction task efficiently and that the predictions are consistent with the real recognition results.