Pose estimation relating two-dimensional (2D) images to a three-dimensional (3D) rigid object requires some known features to track. In practice, many algorithms perform this task with high accuracy, but all of them degrade when features are lost. This paper investigates pose estimation when some, or even all, of the known features are invisible. First, the known features are tracked to compute the pose in the current image and the next. Second, unknown but good-to-track features are automatically detected in both images. Third, the unknown features that lie on the rigid object and can be matched between the two images are retained. Owing to the motion characteristics of the rigid object, the 3D positions of these unknown features can be solved from the object's poses at the two moments and the features' 2D positions in the two images, except in two cases: first, when the camera and the object have no relative motion and the camera parameters (focal length, principal point, etc.) do not change between the two moments; second, when the two images share no scene or contain no matched features. Finally, because the features that were initially unknown have now become known, pose estimation can continue in the following images by repeating the process above, despite the loss of the original known features. The robustness of pose estimation with different feature-detection algorithms, namely Kanade-Lucas-Tomasi (KLT) features, the Scale-Invariant Feature Transform (SIFT), and Speeded Up Robust Features (SURF), is compared, and the impact of different relative motions between the camera and the rigid object is discussed. Graphics Processing Unit (GPU) parallel computing is also used to extract and match hundreds of features for real-time pose estimation, which is hard to achieve on a Central Processing Unit (CPU). Compared with other pose-estimation methods, the proposed method can estimate the pose between the camera and the object even when some or all known features are lost, and it benefits from the quick response time of GPU parallel computing. The method presented here can be widely applied in vision-guided techniques to strengthen their intelligence and generality, and it can also play an important role in autonomous navigation, positioning, and robotics in unknown environments. Simulation and experimental results demonstrate that the proposed method suppresses noise effectively, extracts features robustly, and meets real-time requirements. Theoretical analysis and experiments show that the method is reasonable and efficient.
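The key step above, recovering the 3D position of a newly detected feature from the object's poses at two moments and its 2D positions in the two images, is in essence two-view triangulation. A minimal sketch of the standard linear (DLT) method is given below; the intrinsics, poses, and test point are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Assumed camera intrinsics (focal length 800 px, principal point (320, 240)).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def projection(R, t):
    """Build the 3x4 projection matrix P = K [R | t] for one moment."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one feature seen in two images.

    Each observed pixel (u, v) with projection P contributes the rows
    u*P[2]-P[0] and v*P[2]-P[1]; the homogeneous 3D point is the null
    vector of the stacked 4x4 system, found via SVD.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # right singular vector of smallest sigma
    return X[:3] / X[3]             # dehomogenize

# Two moments with relative motion (a pure-translation baseline here);
# with no relative motion the system is degenerate, as the abstract notes.
P1 = projection(np.eye(3), np.zeros(3))
P2 = projection(np.eye(3), np.array([-0.5, 0.0, 0.0]))

X_true = np.array([0.3, -0.2, 4.0])   # an "unknown" feature on the object
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]

X_est = triangulate(P1, P2, x1, x2)   # recovers X_true from 2D data alone
```

Once `X_est` is known, the feature joins the set of known 3D-2D correspondences, so pose estimation can continue in later frames even after the original features disappear.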