1 July 2002 Unsupervised real-time speaker identification for daily movies
Proceedings Volume 4862, Internet Multimedia Management Systems III; (2002) https://doi.org/10.1117/12.473031
Event: ITCom 2002: The Convergence of Information Technologies and Communications, 2002, Boston, MA, United States
Abstract
This paper addresses the problem of identifying speakers for movie content analysis. While most previous work on speaker identification was carried out in a supervised mode using pure audio data, more robust results can be obtained in real time by integrating knowledge from multiple media sources in an unsupervised mode. In this work, both audio and visual cues are employed and subsequently combined in a probabilistic framework to identify speakers. In particular, audio information is used to identify speakers with a maximum likelihood (ML)-based approach, while visual information is used to distinguish speakers by detecting and recognizing their talking faces based on face detection/recognition and mouth tracking techniques. Moreover, to accommodate speakers' acoustic variations over time, we update their models on the fly by adapting them to newly contributed speech data. Extensive experiments have yielded encouraging results, which indicate a promising future for the proposed audiovisual unsupervised speaker identification system.
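The abstract does not give the model details, so the following is only a minimal illustrative sketch of the three ideas it names: ML scoring of audio against per-speaker models, probabilistic combination with a visual cue, and on-the-fly model adaptation. Everything specific here is an assumption made for illustration and is not taken from the paper: the single diagonal-covariance Gaussian per speaker, the weighted log-linear fusion, the running-mean update, and the names `SpeakerModel`, `identify`, `visual_prob`, and `audio_weight`.

```python
import math


class SpeakerModel:
    """Hypothetical speaker model: one diagonal-covariance Gaussian
    over acoustic feature vectors (an assumption, not the paper's model)."""

    def __init__(self, mean, var, count=1):
        self.mean = list(mean)
        self.var = list(var)
        self.count = count  # number of frames folded into the mean so far

    def log_likelihood(self, frames):
        """Sum of per-frame Gaussian log densities for a speech segment."""
        total = 0.0
        for x in frames:
            for xi, m, v in zip(x, self.mean, self.var):
                total += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        return total

    def adapt(self, frames):
        """Running-average update of the mean with newly attributed frames.
        The variance is left fixed to keep the sketch short."""
        for x in frames:
            self.count += 1
            self.mean = [m + (xi - m) / self.count
                         for m, xi in zip(self.mean, x)]


def identify(models, frames, visual_prob, audio_weight=0.5):
    """Combine the audio log-likelihood with a visual talking-face
    probability per speaker and return the best-scoring speaker."""
    scores = {}
    for name, model in models.items():
        # Average per frame so segment length does not dominate the fusion.
        audio = model.log_likelihood(frames) / max(len(frames), 1)
        # Floor the visual probability to avoid log(0).
        visual = math.log(max(visual_prob.get(name, 0.0), 1e-9))
        scores[name] = audio_weight * audio + (1 - audio_weight) * visual
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    models = {
        "A": SpeakerModel([0.0, 0.0], [1.0, 1.0]),
        "B": SpeakerModel([3.0, 3.0], [1.0, 1.0]),
    }
    segment = [[0.1, -0.2], [0.2, 0.1]]
    speaker, _ = identify(models, segment, {"A": 0.6, "B": 0.4})
    models[speaker].adapt(segment)  # update the winning model on the fly
    print(speaker)
```

In this sketch the visual cue acts as a tie-breaker when the audio evidence is ambiguous, and the identified speaker's model is updated only after a decision is made, which mirrors the unsupervised setting described above in spirit only.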
© (2002) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ying Li and C.-C. Jay Kuo "Unsupervised real-time speaker identification for daily movies", Proc. SPIE 4862, Internet Multimedia Management Systems III, (1 July 2002); https://doi.org/10.1117/12.473031
CITATIONS
Cited by 2 scholarly publications.
KEYWORDS
Mouth
Facial recognition systems
Data modeling
Video
Visualization
System identification
Expectation maximization algorithms