Action recognition and scoring have important applications in health monitoring, sports analysis, and physical education. With the development of wearable devices and sensor technology, action recognition and scoring based on sensor sequences have become active research topics. However, the complexity of action sequences and the differences between individuals require recognition models to generalize well. In addition, the key features in action sensor sequence data are often sparse, which makes it challenging to combine global features and local details for collaborative analysis. To address these challenges, this paper proposes CA-TimeMixer, a novel action recognition and scoring method based on contrastive learning and TimeMixer. In terms of model architecture, the Past Decomposable Mixing (PDM) of TimeMixer is employed for feature extraction. Cross Attention (CA) is introduced to fuse features of the sensor sequences at different scales, thereby enhancing the model’s ability to extract local-global features. For action recognition, a supervised contrastive loss is adopted to perform contrastive learning on sequence samples of different actions. During inference, the feature similarity between the test sample and the benchmark action sequence is computed to determine whether the test sample belongs to the same category as the benchmark action. For action scoring, the test sequence and the benchmark sequence are both input into CA-TimeMixer, and a linear layer predicts a score from the difference tensor of the two resulting features. Comparative experiments and ablation studies were conducted on the collected Taichi action sequence dataset. In comparison with the benchmark models, the proposed CA-TimeMixer achieves an accuracy of 94.7% in action recognition and a Mean Absolute Error (MAE) of 0.36 in action scoring. Comprehensive experimental analysis demonstrates the superiority of the proposed method.
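The inference steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors stand in for CA-TimeMixer outputs, cosine similarity and the threshold value are assumed choices for the "feature similarity" test, the difference is assumed to be taken as test minus benchmark, and the names `same_action`, `predict_score`, `weights`, and `bias` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as dissimilar to everything
    return dot / (norm_a * norm_b)


def same_action(test_feat, bench_feat, threshold=0.8):
    """Recognition: compare the test feature with the benchmark feature.

    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    return cosine_similarity(test_feat, bench_feat) >= threshold


def predict_score(test_feat, bench_feat, weights, bias):
    """Scoring: apply a linear layer to the feature difference.

    `weights` and `bias` stand for learned parameters of the linear layer.
    """
    diff = [t - b for t, b in zip(test_feat, bench_feat)]
    return sum(w * d for w, d in zip(weights, diff)) + bias
```

In practice the features would come from the trained encoder and the linear layer would be learned jointly with it; the sketch only shows how the two outputs are derived from a pair of features.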