We present a set of experiments with a video OCR system (VOCR) tailored for video information retrieval
and establish its importance in multimedia search in general and for some specific queries in particular. The
system, inspired by existing work on text detection and recognition in images, has been developed using
techniques that analyze video frames in detail to produce candidate text regions. The text regions are
then binarized and sent to a commercial OCR engine, which produces ASCII text that is used to create search
indexes. The system is evaluated using the TRECVID data. We compare the system's performance from an
information retrieval perspective with another VOCR developed using multi-frame integration and empirically
demonstrate that deep analysis of individual video frames results in better video retrieval. We also evaluate
the effect of various textual sources on multimedia retrieval by combining the VOCR outputs with automatic
speech recognition (ASR) transcripts. For general search queries, the VOCR system coupled with ASR sources
outperforms the other system by a wide margin. For search queries that involve named entities, especially
names of people, the VOCR system even outperforms speech transcripts, demonstrating that source selection for
particular query types is essential.
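
The idea of indexing several textual sources and selecting among them per query type can be illustrated with a minimal sketch. It is not the system described above: the function names (`build_index`, `search`), the shot identifiers, and the toy texts are all hypothetical, and the matching rule (plain token lookup in an inverted index) is an assumption chosen only for brevity.

```python
from collections import defaultdict


def build_index(shot_texts):
    """Map each lowercased token to the set of shot ids whose text contains it."""
    index = defaultdict(set)
    for shot_id, text in shot_texts.items():
        for token in text.lower().split():
            index[token].add(shot_id)
    return index


def search(query, indexes, sources):
    """Return shots in which every query token occurs in at least one selected source."""
    result = None
    for token in query.lower().split():
        hits = set()
        for name in sources:
            # .get() avoids inserting empty entries into the defaultdict
            hits |= indexes[name].get(token, set())
        result = hits if result is None else result & hits
    return result or set()


# Hypothetical toy data: text per video shot from each source.
vocr_text = {
    "shot1": "Kofi Annan UN Secretary General",
    "shot2": "Weather forecast",
}
asr_text = {
    "shot1": "the secretary general spoke today",
    "shot3": "heavy rain expected in the forecast",
}

indexes = {"vocr": build_index(vocr_text), "asr": build_index(asr_text)}

# General query: use both sources together.
general_hits = search("forecast", indexes, sources=["vocr", "asr"])

# Named-entity query: restrict the search to the on-screen text source.
named_hits = search("Kofi Annan", indexes, sources=["vocr"])
```

In this toy setup the person's name appears only in the on-screen text, so a search restricted to the speech transcripts would miss the shot; this mirrors, in a very simplified form, why the choice of source can matter for named-entity queries.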