The general problem of single-view recognition is central to many image understanding and computer vision tasks. In previous work, we have shown how to approach the general problem of recognizing three-dimensional geometric configurations from a single two-dimensional view. Our methods make use of techniques from algebraic geometry, notably the theory of correspondences, and a novel "equivariant" geometric invariant theory. This machinery lets us understand the relationship between the 3D geometry and its "residual" in a 2D image. Exploiting this, one can compute a set of fundamental equations in the 3D and 2D invariants that generate the ideal of the correspondence and completely describe the mutual 3D/2D constraints. We call these equations "object/image equations". They can be used in a number of ways. For example, from a given 2D configuration we can determine a set of non-linear constraints on the geometric invariants of the 3D configurations capable of producing it, and thus arrive at a test for identifying the object being viewed. Conversely, given a 3D geometric configuration (features on an object), we can derive a set of equations that constrain the images of that object.
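As a minimal sketch of how an object/image relation can serve as a recognition test, the Python below treats only the simplest case: an affine (weak-perspective) camera. That camera model is an assumption made for illustration, not the full-perspective setting discussed here. Under it, the rows (u, v, 1) of the 3×n image matrix are linear combinations of the rows (x, y, z, 1) of the 4×n object matrix. A 2D view is therefore consistent with a 3D configuration precisely when stacking the two matrices does not raise the rank. The helper names `rank`, `object_matrix`, `image_matrix` and `consistent_affine_view` are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rank(rows: Sequence[Sequence[float]], rel_tol: float = 1e-9) -> int:
    """Numerical rank by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    # Pivots smaller than rel_tol * (largest entry) are treated as zero.
    tol = rel_tol * max(1.0, max(abs(x) for row in a for x in row))
    rk = 0
    for c in range(n):
        if rk == m:
            break
        p = max(range(rk, m), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) <= tol:
            continue  # no usable pivot in this column
        a[rk], a[p] = a[p], a[rk]
        for r in range(rk + 1, m):
            f = a[r][c] / a[rk][c]
            for k in range(c, n):
                a[r][k] -= f * a[rk][k]
        rk += 1
    return rk


def object_matrix(points3d: Sequence[Tuple[float, float, float]]) -> Matrix:
    """4 x n matrix with rows x, y, z and a row of ones."""
    xs, ys, zs = zip(*points3d)
    return [list(xs), list(ys), list(zs), [1.0] * len(points3d)]


def image_matrix(points2d: Sequence[Tuple[float, float]]) -> Matrix:
    """3 x n matrix with rows u, v and a row of ones."""
    us, vs = zip(*points2d)
    return [list(us), list(vs), [1.0] * len(points2d)]


def consistent_affine_view(points3d, points2d) -> bool:
    """True if some affine camera maps the 3D points to the 2D points, in order."""
    obj = object_matrix(points3d)
    stacked = obj + image_matrix(points2d)
    return rank(stacked) == rank(obj)


if __name__ == "__main__":
    pts3 = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 2, 3)]
    # Image of pts3 under the affine camera u = x + 2z + 1, v = y - z.
    pts2 = [(1, 0), (2, 0), (1, 1), (3, -1), (8, -1)]
    print(consistent_affine_view(pts3, pts2))                  # True
    print(consistent_affine_view(pts3, pts2[:4] + [(8, 0)]))   # False
```

The rank test is used instead of solving for the camera explicitly, so no feature is singled out as a reference point. The absolute tolerance is a simplification; real image data would need a noise-aware threshold.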
One difficulty has been that the usual numerical invariants are expressed as rational functions of the geometric parameters. As such, they are not defined everywhere: they fail wherever a denominator vanishes. Moreover, their definition tends to rely on special-position assumptions that treat the features asymmetrically. This leads to degeneracies and numerical difficulties in algorithms based on these invariants. We show how to replace these invariants by certain toric subvarieties of Grassmannians, for which the object/image equations become "resultant-like" expressions for the existence of a non-trivial intersection of these subvarieties with certain Schubert varieties in the Grassmannian. We also explain how to obtain a shape space from the Chow coordinates of these varieties. We call this the "global invariant" approach. It substantially improves the robustness and numerical stability of the methods.
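To illustrate the idea of trading rational invariants for a point of a Grassmannian, here is a minimal Python sketch for the simpler affine case. This case is an assumption; the toric construction needed for full perspective is not shown. The row space of the 4×n object matrix, whose columns are the homogeneous coordinates (x, y, z, 1), is unchanged by any affine change of coordinates X ↦ AX + t. Its Plücker coordinates, the maximal 4×4 minors, are polynomial in the point coordinates, treat all features alike, and need no special-position normalization. An affine map only rescales every minor by det(A). The functions `det`, `plucker`, `object_matrix` and `apply_affine` are hypothetical helpers.

```python
from itertools import combinations
from typing import List, Sequence


def det(m: Sequence[Sequence[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in m]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result


def plucker(rows: Sequence[Sequence[float]]) -> List[float]:
    """Maximal minors of a k x n matrix, column subsets in lexicographic order."""
    k, n = len(rows), len(rows[0])
    return [det([[row[j] for j in cols] for row in rows])
            for cols in combinations(range(n), k)]


def object_matrix(points3d) -> List[List[float]]:
    """4 x n matrix with rows x, y, z and a row of ones."""
    xs, ys, zs = zip(*points3d)
    return [list(xs), list(ys), list(zs), [1.0] * len(points3d)]


def apply_affine(A, t, points3d):
    """Map each 3D point X to A X + t."""
    return [tuple(sum(A[i][j] * X[j] for j in range(3)) + t[i] for i in range(3))
            for X in points3d]


if __name__ == "__main__":
    pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 2, 3)]
    A = [[2, 1, 0], [0, 1, 0], [0, 0, 3]]  # upper triangular, det(A) = 6
    p = plucker(object_matrix(pts))
    q = plucker(object_matrix(apply_affine(A, (1, -1, 2), pts)))
    # Same projective point: q is p rescaled by det(A).
    print(all(abs(y - 6 * x) < 1e-9 for x, y in zip(p, q)))  # True
```

Because the whole Plücker vector scales uniformly, it is the projective point, not the individual numbers, that plays the role of the invariant here.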
Finally, we show how the recognition problem for point configurations in full perspective can be decoupled into a linear and a non-linear part, and we use this decoupling to give the most general solution possible to this recognition problem. We also consider various natural metrics on our analog of the shape space for this problem and show how they can be derived from natural metrics on the Grassmannians, namely the Fubini-Study metrics.
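One way to make the metric statement concrete is to embed each k-dimensional subspace of R^n by its Plücker vector p and measure the Fubini-Study distance arccos(|⟨p, q⟩| / (‖p‖ ‖q‖)) in the ambient projective space. The Python below is a minimal sketch under that assumption, for real subspaces. It computes the distance in the projective space of the Plücker embedding, which need not coincide with every metric considered here. The function names are hypothetical. Because a change of basis rescales the Plücker vector by a nonzero scalar, the result depends only on the subspaces.

```python
import math
from itertools import combinations
from typing import List, Sequence


def det(m: Sequence[Sequence[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in m]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result


def plucker(rows: Sequence[Sequence[float]]) -> List[float]:
    """Maximal minors of a k x n matrix, column subsets in lexicographic order."""
    k, n = len(rows), len(rows[0])
    return [det([[row[j] for j in cols] for row in rows])
            for cols in combinations(range(n), k)]


def fubini_study_distance(A, B) -> float:
    """Distance between the Plücker points of the row spaces of A and B.

    A and B are full-rank k x n matrices; the value lies in [0, pi/2].
    """
    p, q = plucker(A), plucker(B)
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_q = math.sqrt(sum(y * y for y in q))
    # Clamp to guard against rounding slightly above 1.
    c = min(1.0, abs(dot) / (norm_p * norm_q))
    return math.acos(c)


if __name__ == "__main__":
    A = [[1, 0, 0, 0], [0, 1, 0, 0]]   # span(e1, e2)
    A2 = [[1, 1, 0, 0], [0, 1, 0, 0]]  # same plane, different basis
    B = [[1, 0, 0, 0], [0, 1, 1, 0]]   # span(e1, e2 + e3)
    print(fubini_study_distance(A, A2))                # 0.0
    print(fubini_study_distance(A, B), math.pi / 4)    # both about 0.7854
```

By the Cauchy-Binet formula, the ratio |⟨p, q⟩| / (‖p‖ ‖q‖) equals the product of the cosines of the principal angles between the two subspaces. In the example those angles are 0 and π/4, which gives the distance π/4.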