Classifying the people who appear in broadcast news video as anchors, reporters, or news subjects is an important problem in high-level video analysis that existing research has left largely unaddressed. Because different types of people look alike on screen, this work explores multi-modal features derived from a variety of evidence, including speech identity, transcript clues, temporal video structure, named entities, and face information. A Support Vector Machine (SVM) model is trained on manually classified people to combine these features and predict the type of each person giving a monologue-style speech in news video. Experiments on ABC World News Tonight video show that this approach achieves over 93% accuracy in classifying person types. A comparison of the contributions of the different feature categories shows that relatively understudied features, such as speech identity and temporal video structure, are very effective for this task.
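The feature-combination step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the feature layout, the SVM kernel, or the training procedure, so the modality names, the one-number-per-modality toy data, the linear one-vs-rest setup, and the subgradient-descent trainer below are all assumptions made for illustration.

```python
import random

# Hypothetical modality names and label strings (not taken from the paper).
MODALITIES = ("speech", "transcript", "structure", "entities", "face")
LABELS = ("anchor", "reporter", "news_subject")


def fuse(features):
    """Early fusion: concatenate per-modality vectors in a fixed order."""
    return [value for name in MODALITIES for value in features[name]]


def train_binary_svm(xs, ys, epochs=200, eta=0.1, lam=0.01, seed=0):
    """Linear SVM fitted by stochastic subgradient descent on the
    L2-regularized hinge loss. Labels in ys must be +1 or -1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(xs[0]), 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = ys[i] * (sum(wj * xj for wj, xj in zip(w, xs[i])) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularization shrink
            if margin < 1.0:  # sample violates the margin: hinge-loss step
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
                b += eta * ys[i]
    return w, b


def train_one_vs_rest(xs, labels):
    """Train one binary SVM per person type (that type vs. the rest)."""
    return {
        label: train_binary_svm(xs, [1 if y == label else -1 for y in labels])
        for label in LABELS
    }


def predict(models, x):
    """Return the person type whose SVM gives the highest score."""
    def score(label):
        w, b = models[label]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(LABELS, key=score)


def sample(speech, transcript, structure, entities, face):
    """Build one fused vector from a single made-up number per modality."""
    return fuse({"speech": [speech], "transcript": [transcript],
                 "structure": [structure], "entities": [entities],
                 "face": [face]})


# Made-up toy data, purely illustrative; not drawn from any news corpus.
TRAIN_X = [
    sample(0.9, 0.0, 1.0, 0.1, 0.6), sample(1.0, 0.1, 0.9, 0.0, 0.5),
    sample(0.2, 1.0, 0.1, 0.1, 0.5), sample(0.1, 0.9, 0.0, 0.2, 0.6),
    sample(0.0, 0.0, 0.1, 0.9, 0.5), sample(0.1, 0.1, 0.0, 1.0, 0.6),
]
TRAIN_Y = ["anchor", "anchor", "reporter", "reporter",
           "news_subject", "news_subject"]

models = train_one_vs_rest(TRAIN_X, TRAIN_Y)
predictions = [predict(models, x) for x in TRAIN_X]
```

Concatenating the modalities before training lets a single classifier weigh, for example, speech evidence against face evidence. A per-modality ablation of the kind the abstract reports could be approximated by dropping one name from `MODALITIES` and retraining.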
In multimedia retrieval, the goal of accuracy is to some extent at odds with that of efficiency: the former relies on sophisticated features, whereas the latter favors simple features with reduced dimensionality. To strike a balance between these two goals, this paper presents a self-adaptive semantic schema mechanism (SSM) for multimedia databases. The SSM is built on an object-oriented data model, with classes organized into a semantic hierarchy. Its most distinctive feature is adaptive schema evolution: when the conditions of certain ECA (event-condition-action) rules are satisfied, the schema expands with new classes and/or is compacted by removing inefficient ones. This self-adaptive evolution strategy allows a schema to be optimized for the requirements of each specific application, thereby achieving a dynamic, application-specific balance between accuracy and efficiency. A prototype multimedia retrieval system, 2M2Net, has been built on this mechanism and used to validate its feasibility.
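A rough sketch of rule-driven schema evolution, as outlined in this abstract, is given below. It is an illustration under assumptions, not the SSM or 2M2Net design: the abstract does not specify the rule language or what counts as an inefficient class, so the `Schema`, `SchemaClass`, and `EcaRule` names, the `hits` usage counter, and the two example rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SchemaClass:
    name: str
    parent: Optional[str]
    hits: int = 0  # hypothetical usage counter used to judge "inefficient"


@dataclass
class EcaRule:
    event: str                                  # E: event name that triggers the rule
    condition: Callable[["Schema", dict], bool]  # C: checked when the event fires
    action: Callable[["Schema", dict], None]     # A: run if the condition holds


class Schema:
    def __init__(self, root: str = "Media") -> None:
        self.classes: Dict[str, SchemaClass] = {root: SchemaClass(root, None)}
        self.rules: List[EcaRule] = []

    def add_class(self, name: str, parent: str) -> None:
        """Expansion: attach a new class under an existing parent."""
        if parent not in self.classes:
            raise KeyError(parent)
        self.classes.setdefault(name, SchemaClass(name, parent))

    def remove_class(self, name: str) -> None:
        """Compaction: drop a class and re-attach its children to its parent."""
        removed = self.classes.pop(name)
        for cls in self.classes.values():
            if cls.parent == name:
                cls.parent = removed.parent

    def fire(self, event: str, **ctx) -> None:
        """Run every rule registered for this event whose condition holds."""
        for rule in self.rules:
            if rule.event == event and rule.condition(self, ctx):
                rule.action(self, ctx)


def compact(schema: Schema, ctx: dict) -> None:
    """Remove every non-root class whose usage counter is below the threshold."""
    stale = [n for n, c in schema.classes.items()
             if c.parent is not None and c.hits < ctx["min_hits"]]
    for name in stale:
        schema.remove_class(name)


schema = Schema()
schema.rules = [
    # Expand when an unknown concept has been requested often enough.
    EcaRule("query",
            lambda s, c: c["concept"] not in s.classes and c["misses"] >= 3,
            lambda s, c: s.add_class(c["concept"], c["parent"])),
    # Compact when the schema has grown past a size budget.
    EcaRule("maintenance",
            lambda s, c: len(s.classes) > c["max_classes"],
            compact),
]

schema.add_class("Image", "Media")
schema.classes["Image"].hits = 10
schema.fire("query", concept="Sunset", parent="Image", misses=1)
after_first_query = sorted(schema.classes)
schema.fire("query", concept="Sunset", parent="Image", misses=3)
after_second_query = sorted(schema.classes)
schema.fire("maintenance", max_classes=2, min_hits=5)
after_maintenance = sorted(schema.classes)
```

In this sketch the thresholds (`misses`, `max_classes`, `min_hits`) are passed in with each event, which is one simple way to let each application tune its own trade-off between a richer schema and a smaller one.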
Conference Committee Involvement (5)
Mobile Devices and Multimedia: Enabling Technologies, Algorithms, and Applications 2015
10 February 2015 | San Francisco, California, United States
Mobile Devices and Multimedia: Enabling Technologies, Algorithms, and Applications 2014
3 February 2014 | San Francisco, California, United States
Multimedia Content Access: Algorithms and Systems VII
4 February 2013 | Burlingame, California, United States
Multimedia Content Access: Algorithms and Systems VI
23 January 2012 | Burlingame, California, United States
Multimedia Content Access: Algorithms and Systems V
25 January 2011 | San Francisco Airport, California, United States