In video understanding and analysis, relying solely on the appearance features of individuals in video frames is insufficient for accurate group activity recognition; fully exploiting the other feature information present is crucial to understanding group activities. We therefore propose a three-stream feature learning architecture. Beyond the human appearance features and available scene-level context information, the model explicitly attends to individual motion, uncovering valuable motion cues. Integrating appearance, motion, and scene-level context information yields a more comprehensive and richer representation of each individual, and these fused features are then used in relation analysis to better predict the group activity. The effectiveness of the proposed method is validated on two benchmark datasets, Volleyball and Collective Activity, demonstrating its efficacy for the task.
Keywords: RGB color model, Motion models, Data modeling, Feature extraction, Video, Action recognition, Visualization
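As a rough illustration of how such a three-stream fusion could be wired up, the PyTorch sketch below fuses per-person appearance and motion features with broadcast scene-level context, applies self-attention as a stand-in for the relation analysis, and pools to group-level activity logits. All module choices, feature dimensions, and the use of self-attention are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of a three-stream fusion model for group activity
# recognition. Stream backbones, feature sizes, and the attention-based
# relation module are illustrative assumptions, not the published method.
import torch
import torch.nn as nn

class ThreeStreamFusion(nn.Module):
    def __init__(self, feat_dim=256, num_activities=8):
        super().__init__()
        # Each stream is stubbed as a linear layer over precomputed
        # features; in practice these would be CNN backbones (assumption).
        self.appearance = nn.Linear(1024, feat_dim)  # per-person RGB crop features
        self.motion = nn.Linear(1024, feat_dim)      # per-person motion features
        self.scene = nn.Linear(1024, feat_dim)       # frame-level context features
        # Relation analysis over fused per-person features (self-attention assumed).
        self.relation = nn.MultiheadAttention(embed_dim=3 * feat_dim,
                                              num_heads=4, batch_first=True)
        self.classifier = nn.Linear(3 * feat_dim, num_activities)

    def forward(self, app, mot, scn):
        # app, mot: (B, N, 1024) features for N tracked people per clip
        # scn: (B, 1024) scene context, broadcast to every person
        n = app.size(1)
        scn = self.scene(scn).unsqueeze(1).expand(-1, n, -1)
        fused = torch.cat([self.appearance(app), self.motion(mot), scn], dim=-1)
        rel, _ = self.relation(fused, fused, fused)  # person-person relations
        return self.classifier(rel.mean(dim=1))      # pool to group activity logits

# Usage with random tensors: batch of 2 clips, 12 tracked people each.
model = ThreeStreamFusion()
logits = model(torch.randn(2, 12, 1024), torch.randn(2, 12, 1024),
               torch.randn(2, 1024))
print(logits.shape)  # torch.Size([2, 8])
```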