Defining a useful performance evaluation method for object tracking algorithms is difficult. Algorithms that score well on general-purpose performance metrics may perform poorly in a specific scenario. Additionally, algorithm developers frequently face a question that is hard to answer: will the algorithm satisfy the needs of a system that is still in the design phase? Even when time and resources can be allocated to collect reasonably representative data for the scenarios of interest, the answer usually remains ambiguous. Often, during field tests or operational use, the user experiences insufficient performance and the algorithm must be revised. In this study, we propose an approach to address this problem, based on iterative improvement of the evaluation process. The performance requirements are determined by field experts or system designers. A standard set of questions is put to the user/system developer, and the test dataset is determined in cooperation. Each video segment in the dataset is assigned several tags for scenario type, difficulty, and importance. For any novel failure case, representative videos are added to the dataset. In this way, quantitative results can be organized to be more informative for the user, and improvements to the algorithms can be evaluated more systematically.
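The tagging scheme described above amounts to a small metadata record per video segment plus per-tag aggregation of results. A minimal sketch in Python, assuming illustrative tag values and an importance-weighted average as the aggregation rule (neither detail is specified in the abstract):

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class VideoSegment:
    """One evaluation clip with the tags described in the abstract."""
    segment_id: str
    scenario: str          # e.g. "low-contrast-target" (tag values are illustrative)
    difficulty: str        # e.g. "easy" / "hard"
    importance: float      # weight assigned by the field experts
    score: float = 0.0     # tracker performance measured on this clip

def summarize(dataset):
    """Importance-weighted average score per (scenario, difficulty) bucket,
    so results stay informative for the user instead of collapsing into one
    global number."""
    buckets = defaultdict(lambda: [0.0, 0.0])  # key -> [weighted sum, total weight]
    for seg in dataset:
        acc = buckets[(seg.scenario, seg.difficulty)]
        acc[0] += seg.importance * seg.score
        acc[1] += seg.importance
    return {key: ws / w for key, (ws, w) in buckets.items()}
```

Organizing results by these buckets also makes it easy to see which newly added failure-case videos pull a given scenario's score down.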
Performance metrics used in academia to evaluate video object detection algorithms are usually not informative for assessing whether the algorithms of interest are suitable (mature enough) for real-world deployment. We propose an approach to define performance metrics that are suitable for various operational scenarios. In particular, we define four operational modes: surveillance (alarm), situational awareness, detection, and tracking. We then describe the performance metrics for the needs of each operational scenario and explain the underlying reasoning. The metrics are compatible with common practices for constructing in-house video datasets. We believe that these metrics provide useful insight into the usability of the algorithms. We also demonstrate that an algorithm which, at first glance, seems to have insufficient performance for deployment can be used in a real-world system (with simple post-processing) if its parameters are configured to provide high scores on the scenario-specific metrics. We also show that the same underlying algorithm can serve different operational scenarios if its parameters (and/or post-processing steps) are adjusted to meet criteria based on the relevant scenario-specific performance metrics.
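To make the idea concrete, the sketch below shows one way scenario-specific acceptance criteria could be encoded and checked per operational mode. The metric names, thresholds, and the REQUIREMENTS/meets_requirements names are illustrative assumptions, not values or interfaces taken from the paper:

```python
# Hypothetical acceptance criteria per operational mode; metric names and
# thresholds are placeholders, not values reported in the paper.
REQUIREMENTS = {
    "surveillance_alarm":    {"false_alarms_per_hour": ("<=", 2.0)},
    "situational_awareness": {"recall": (">=", 0.80)},
    "detection":             {"precision": (">=", 0.90), "recall": (">=", 0.85)},
    "tracking":              {"track_continuity": (">=", 0.95)},
}

def meets_requirements(mode: str, measured: dict) -> bool:
    """Check whether measured metrics satisfy the scenario-specific criteria."""
    ops = {"<=": lambda a, b: a <= b, ">=": lambda a, b: a >= b}
    return all(ops[op](measured[name], limit)
               for name, (op, limit) in REQUIREMENTS[mode].items())

# Example: the same detector, tuned differently, checked against two modes.
print(meets_requirements("surveillance_alarm", {"false_alarms_per_hour": 1.2}))
print(meets_requirements("detection", {"precision": 0.93, "recall": 0.81}))
```

Keeping the criteria as data rather than hard-coded logic reflects the paper's point that one underlying algorithm, with different parameter settings or post-processing, can be evaluated against several operational modes.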
Data annotation is a time-consuming, labor-intensive step in supervised learning, particularly for detection and classification. Obtaining an accurately labeled dataset usually requires substantial human effort, which is costly and sometimes infeasible, especially for large datasets. Many recent methods use neural networks to annotate the data; however, they still require a large amount of hand-labeled data. To address this problem, we propose a method that makes the process as human-independent as possible while preserving annotation quality. The proposed method is applicable to datasets in which the majority of the frames/images contain a single object (or a known number, "n", of objects). The method starts with an initial annotation network trained with a small amount of labeled data, 10% of the total training set, and then continues iteratively. We use the annotation network to select the subset of the training set that is to be hand-labeled in the next iteration. In this way, examples that are more likely to improve the annotation network can be selected. The total number of hand-labeled images required depends on the specific problem. We observed that, with the proposed approach, manually annotating approximately 25% of the dataset was sufficient, rather than annotating all the images. This percentage can vary with the complexity and type of the annotation network, as well as the dataset content. Our method can be used with existing (semi-)automatic annotation tools.
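The iterative loop could look like the following sketch. The confidence-based selection rule, the batch size, and the placeholder hooks (train_annotation_network, model_confidence, request_human_labels) are assumptions used for illustration; the abstract only states that the network selects the examples most likely to improve it:

```python
import random

def iterative_annotation(all_items, initial_fraction=0.10, batch_fraction=0.05,
                         target_fraction=0.25):
    """Sketch of the iterative labeling loop: start from a small hand-labeled
    seed set (10% here), then repeatedly let the annotation network choose the
    next batch to hand-label until roughly 25% of the data is labeled."""
    items = list(all_items)
    random.shuffle(items)
    n_seed = int(initial_fraction * len(items))
    labeled, unlabeled = items[:n_seed], items[n_seed:]

    while len(labeled) < target_fraction * len(items) and unlabeled:
        # 1. (Re)train the annotation network on the current labeled pool.
        model = train_annotation_network(labeled)            # placeholder hook
        # 2. Score the unlabeled pool; pick the least confident samples
        #    (an assumed proxy for "most likely to improve the network").
        scored = sorted(unlabeled, key=lambda x: model_confidence(model, x))
        batch = scored[:max(1, int(batch_fraction * len(items)))]
        # 3. Send only that batch to human annotators.
        labeled += request_human_labels(batch)                # placeholder hook
        unlabeled = [x for x in unlabeled if x not in batch]
    return labeled

# Placeholder hooks: replace with the real network, inference code, and
# (semi-)automatic annotation tool.
def train_annotation_network(labeled):  return None
def model_confidence(model, item):      return random.random()
def request_human_labels(batch):        return batch
```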