KEYWORDS: RGB color model, Pose estimation, Feature extraction, Visual process modeling, Data modeling, Human computer interaction, Virtual reality, Deep convolutional neural networks, Visualization, Performance modeling
Hand Pose Estimation (HPE) is essential in Computer Vision (CV) applications such as Human-Computer Interaction (HCI) and plays a critical role in fields such as Virtual Reality (VR), robotics, and medicine. HPE is nevertheless challenged by variations in hand size and the agility of hand movements. Although 3D HPE has improved considerably, its dependency on 2D keypoints has led to a greater focus on enhancing 2D HPE methods. Deep learning has significantly advanced these methods, especially through models such as Deep Convolutional Neural Networks (DCNN) and Convolutional Pose Machines (CPM). In this study, we propose a lightweight CPM for accurate 2D HPE that minimizes model complexity. The approach uses a modified ConvNeXt that incorporates a Global Context Block (GCB) as a central component; this integration is key to extracting enhanced features effectively. The proposed method achieves an average accuracy increase of 2.62% over the Optimized Convolutional Pose Machine (OCPM), a state-of-the-art (SOTA) lightweight model in 2D HPE.
KEYWORDS: Pose estimation, Education and training, Mathematical optimization, RGB color model, 3D modeling, Visual process modeling, Data modeling, Convolutional neural networks, Artificial intelligence, Machine learning
Human hands play a vital role in interacting with and sensing real-world objects and are a reliable medium for developing human-computer interaction (HCI) in modern technology. Human Hand Pose Estimation (HPE) is challenging for numerous Artificial Intelligence (AI) applications because of strong self-occlusion of the hands, depth ambiguity, and agile movement. Vision-based hand pose estimation algorithms can help these AI applications overcome these challenges. We propose a framework called Cascaded Deep Graphical Convolutional Neural Network (DCGCN), in which a Deep Convolutional Neural Network (DCnet) computes unary and pairwise potential functions and a graphical model inference module cascades the unary and pairwise potentials. Evaluated through subjective and objective analysis, DCGCN outperforms state-of-the-art models in terms of accuracy and computational cost.