Mobile eye-tracking provides a unique opportunity to record and elucidate cognition in action. In our research,
we are searching for patterns in, and distinctions between, the visual-search performance of experts and novices in the
geosciences. Traveling to regions shaped by various geological processes as part of an introductory field studies
course in geology, we record the prima facie gaze patterns of experts and novices when they are asked to determine the
modes of geological activity that have formed the scene-view presented to them. Recording eye video and scene video
in natural settings generates complex imagery that requires advanced applications of computer vision research to generate
registrations and mappings between the views of separate observers. By developing such mappings, we can then place
many observers into a single mathematical space in which we can spatio-temporally analyze inter- and intra-subject fixations,
saccades, and head motions. While working toward perfecting these mappings, we developed an updated experimental
setup that allowed us to statistically analyze intra-subject eye-movement events without the need for a common domain.
Through such analyses we are finding statistical differences between novices and experts in these visual-search tasks. In
the course of this research we have developed a unified, open-source software framework for processing, visualizing,
and interacting with mobile eye-tracking data and high-resolution panoramic imagery.
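The abstract does not specify how the registrations between observers' views are computed; the sketch below is only an assumed, minimal illustration of the idea, using OpenCV feature matching and a RANSAC homography to map one observer's fixation coordinates from a scene-camera frame into a shared panoramic frame. The function name and parameters are hypothetical, and the homography approximation is reasonable only when camera motion is roughly rotational or the mapped region is roughly planar.

```python
# Hypothetical sketch: register one observer's scene-camera frame to a shared
# panorama and map that observer's fixation points into panorama coordinates.
# Feature matching + homography is an assumption, not the authors' method.
import cv2
import numpy as np

def map_fixations_to_panorama(scene_frame, panorama, fixations_xy):
    """Map (x, y) fixation points from a scene-camera frame into panorama coordinates."""
    orb = cv2.ORB_create(4000)
    kp1, des1 = orb.detectAndCompute(scene_frame, None)
    kp2, des2 = orb.detectAndCompute(panorama, None)

    # Match descriptors and keep the strongest correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Robustly estimate the frame-to-panorama homography.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Transform the fixation points into the panorama's coordinate frame.
    pts = np.float32(fixations_xy).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```

Once every observer's fixations are expressed in the panorama's coordinates, inter- and intra-subject comparisons reduce to operations within a single common frame.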
The history of eye-movement research extends back at least to 1794, when Erasmus Darwin (Charles' grandfather)
published Zoonomia, including descriptions of eye movements due to self-motion. But research on eye movements was
restricted to the laboratory for 200 years, until Michael Land built the first wearable eyetracker at the University of
Sussex and published the seminal paper "Where we look when we steer" [1]. In the intervening centuries, we learned a
tremendous amount about the mechanics of the oculomotor system and how it responds to isolated stimuli, but virtually
nothing about how we actually use our eyes to explore, gather information, navigate, and communicate in the real world.
Inspired by Land's work, we have been working to extend knowledge in these areas by developing hardware, algorithms,
and software that have allowed researchers to ask questions about how we actually use vision in the real world. Central
to that effort are new methods for analyzing the volumes of data that come from the experiments made possible by the
new systems. We describe a number of recent experiments and SemantiCode, a new program that supports assisted
coding of eye-movement data collected in unrestricted environments.
Smooth pursuit eye movements align the retina with moving targets, ideally stabilizing the retinal image. At steady state,
eye velocity typically reaches an approximately constant value that depends on, and is usually lower than, the
target velocity. Experiment 1 investigated the effect of target size and velocity on smooth pursuit induced by realistic
images (color photographs of an apple and flower subtending 2° and 17°, respectively), in comparison with a small dot
subtending a fraction of a degree. The extended stimuli were found to enhance smooth pursuit gain. Experiment 2
examined the absolute velocity limit of smooth pursuit elicited by the small dot and the effect of the extended targets on
the velocity limit. The eye velocity for tracking the dot was found to be saturated at about 63 deg/sec while the saturation
velocity occurred at higher velocities for the extended images. The difference in gain due to target size was significant
between the dot and the two extended stimuli, while no statistically significant difference was found between the apple
and flower stimuli of wider angular extent. Detailed knowledge of smooth pursuit eye movements is important for several areas of
electronic imaging, in particular for assessing the perceived motion blur of displayed objects.
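For reference, the pursuit gain discussed above is the standard ratio of steady-state eye velocity to target velocity,

\[
G \;=\; \frac{\dot{\theta}_{\mathrm{eye}}}{\dot{\theta}_{\mathrm{target}}},
\]

so a higher gain for the extended stimuli means eye velocity more closely matched target velocity, and the roughly 63 deg/sec saturation for the dot is a ceiling on eye velocity above which gain necessarily falls as target velocity increases.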
Viewing video on mobile devices is becoming increasingly common. The small field-of-view and the vibrations in
common commuting environments present challenges (hardware and software) for the imaging community. By
monitoring the vibration of the display, it could be possible to stabilize an image on the display by shifting a portion of a
large image with the display (a field-of-view expansion approach). However, the image should not be shifted to exactly
counter the display motion, because eye movements have a 'self-adjustment' ability that partially or completely compensates
for external motion, which can make a perfect compensation appear to overshoot. In this work, accelerometers were used to
measure the motion of a range of vehicles, as well as of observers' heads and hands as they rode in those vehicles, to support the
development of display motion compensation algorithms.
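No specific compensation algorithm is given in the abstract; the sketch below assumes one simple scheme consistent with it: estimate display displacement by double-integrating accelerometer samples, high-pass filter to remove integration drift, and counter only a fraction of the motion on screen so the eye's own 'self-adjustment' is not overridden. The gain, cutoff frequency, and pixel scaling are placeholders, not measured values.

```python
# Illustrative sketch only: partial compensation of display motion from
# accelerometer data. All parameter values are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def compensation_shift(accel, fs, gain=0.7, cutoff_hz=0.5):
    """Return per-sample pixel shifts that partially counter display motion.

    accel     : acceleration samples along one display axis (m/s^2)
    fs        : sampling rate (Hz)
    gain      : fraction of motion to compensate (1.0 = full; 0.7 is hypothetical)
    cutoff_hz : high-pass cutoff used to remove integration drift
    """
    dt = 1.0 / fs
    velocity = np.cumsum(accel) * dt          # integrate acceleration -> velocity
    displacement = np.cumsum(velocity) * dt   # integrate velocity -> displacement

    # Remove low-frequency drift introduced by double integration.
    b, a = butter(2, cutoff_hz / (fs / 2.0), btype="highpass")
    displacement = filtfilt(b, a, displacement)

    pixels_per_meter = 4000.0                 # assumed display scale factor
    return -gain * displacement * pixels_per_meter
```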
LCD televisions have LC response times and hold-type data cycles that contribute to the appearance of blur when objects are in motion on the screen. New algorithms based on studies of the human visual system's sensitivity to motion are being developed to compensate for these artifacts. This paper describes a series of experiments that incorporate eye tracking in the psychophysical determination of spatio-velocity contrast sensitivity in order to build on the 2D spatio-velocity contrast sensitivity function (CSF) model first described by Kelly and later refined by Daly. We explore whether the velocity of the eye has an additional effect on sensitivity and whether the model can be used to predict sensitivity to more complex stimuli. A total of five experiments were performed in this research. The first four experiments used Gabor patterns with three different spatial and temporal frequencies and were used to investigate and/or populate the 2D spatio-velocity CSF. The fifth experiment used a disembodied edge and was used to validate the model. All experiments used a two-interval forced-choice (2IFC) method of constant stimuli guided by a QUEST routine to determine thresholds. The results showed that sensitivity to motion was determined by the retinal velocity produced by the Gabor patterns, regardless of the type of eye movement. Based on the results of these experiments, the parameters of the spatio-velocity CSF model were optimized for our experimental conditions.
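For orientation, one commonly cited form of the Kelly/Daly spatio-velocity CSF expresses sensitivity as a function of spatial frequency ρ (cycles/deg) and retinal velocity v_R (deg/sec); the constants are left symbolic here because the values optimized in this work are not reproduced:

\[
\mathrm{CSF}(\rho, v_R) \;=\; k\, c_0\, c_2\, v_R \,(2\pi c_1 \rho)^2 \exp\!\left(-\frac{4\pi c_1 \rho}{\rho_{\max}}\right),
\qquad
\rho_{\max} = \frac{p_1}{c_2 v_R + 2},
\qquad
k = s_1 + s_2 \left|\log\!\frac{c_2 v_R}{3}\right|^{3},
\]

where c_0, c_1, c_2, s_1, s_2, and p_1 are constants fit to data (Kelly's original formulation corresponds to c_0 = c_1 = c_2 = 1). The point exploited by the experiments above is that v_R is retinal velocity, i.e., target velocity minus eye velocity, which is why eye tracking is needed to evaluate the model under free viewing.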
Image evaluation tasks are often conducted using paired comparisons or ranking. To elicit interval scales, both methods rely on Thurstone's Law of Comparative Judgment, in which objects closer in psychological space are more often confused in preference comparisons by a putative discriminal random process. It is often debated whether paired comparisons and ranking yield the same interval scales. An experiment was conducted to assess scale production using paired comparisons and ranking. For this experiment a Pioneer Plasma Display and an Apple Cinema Display were used for stimulus presentation. Observers performed rank-order and paired-comparison tasks on both displays. For each of five scenes, six images were created by manipulating attributes such as lightness, chroma, and hue using six different settings. The intention was to simulate the variability from a set of digital cameras or scanners. Nineteen subjects (5 female, 14 male), ranging from 19 to 51 years of age, participated in this experiment. Using a paired comparison model and a ranking model, scales were estimated for each display and image combination, yielding ten scale pairs, ostensibly measuring the same psychological scale. The Bradley-Terry model was used for the paired comparisons data and the Bradley-Terry-Mallows model was used for the ranking data. Each model was fit using maximum likelihood estimation and assessed using likelihood ratio tests. Approximate 95% confidence intervals were also constructed using likelihood ratios. Model fits for paired comparisons were satisfactory for all scales except those from two image/display pairs; the ranking model fit uniformly well on all data sets. Arguing from overlapping confidence intervals, we conclude that paired comparisons and ranking produce no conflicting decisions regarding the ultimate ordering of treatment preferences, but paired comparisons yield greater precision at the expense of lack-of-fit.
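As a rough illustration of the paired-comparison side of this analysis, the sketch below fits Bradley-Terry log-worth parameters by maximum likelihood. The function name and data layout are hypothetical, and the Bradley-Terry-Mallows ranking model and the likelihood-ratio tests used in the paper are not reproduced.

```python
# Minimal Bradley-Terry maximum-likelihood fit (assumed data layout:
# wins[i, j] = number of times stimulus i was preferred over stimulus j).
# Illustrative only; not the authors' implementation.
import numpy as np
from scipy.optimize import minimize

def fit_bradley_terry(wins):
    n = wins.shape[0]

    def neg_log_likelihood(theta):
        # theta are log-worths; theta[0] is fixed at 0 for identifiability.
        t = np.concatenate(([0.0], theta))
        diff = t[:, None] - t[None, :]        # log(pi_i / pi_j)
        log_p = -np.log1p(np.exp(-diff))      # log P(i preferred over j)
        return -np.sum(wins * log_p)

    res = minimize(neg_log_likelihood, np.zeros(n - 1), method="BFGS")
    log_worths = np.concatenate(([0.0], res.x))
    return log_worths - log_worths.mean()     # center the interval scale
```

In the Bradley-Terry model, P(i preferred over j) = π_i / (π_i + π_j); working with log-worths and fixing one of them at zero handles the scale indeterminacy of the resulting interval scale.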
Eye movements are an external manifestation of selective attention and can play an important role in indicating which attributes of a scene carry the most pertinent information. Models that predict gaze
distribution often define a local conspicuity value that relies on low-level image features to indicate the perceived salience of an image region. While such bottom-up models have some success in predicting fixation densities in simple 2D images, success with natural scenes requires an understanding of the goals
of the observer, including the perceived usefulness of an object in the context of an explicit or implicit task. In the present study, observers viewed natural images while their eye movements were recorded. Eye movement patterns revealed that subjects preferentially fixated objects relevant for potential actions
implied by the gist of the scene, rather than selecting targets based purely on image features. A proto-object map is constructed from highly textured regions of the image that predict the locations of potential objects. This map is used as a mask to inhibit unimportant low-level features and enhance the
important features, constraining the regions of potential interest. The resulting importance map correlates well with subject fixations on natural-task images.
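The abstract does not spell out how the proto-object map is computed, so the sketch below is only an assumed, minimal version of the idea: mark highly textured regions (here via local standard deviation) and use the resulting mask to suppress a low-level conspicuity map outside those regions. The window size, threshold, and suppression factor are illustrative placeholders.

```python
# Rough sketch of a texture-based proto-object mask and masked importance map.
# Filter choice and parameter values are assumptions for illustration.
import numpy as np
from scipy.ndimage import uniform_filter

def proto_object_mask(gray, window=16, threshold=0.04):
    """Return a binary mask of highly textured regions (gray values in [0, 1])."""
    mean = uniform_filter(gray, window)
    mean_sq = uniform_filter(gray ** 2, window)
    local_std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))
    return local_std > threshold

def importance_map(saliency, gray):
    """Inhibit low-level saliency outside proto-objects, keep it inside."""
    mask = proto_object_mask(gray)
    return saliency * np.where(mask, 1.0, 0.1)   # 0.1 suppression factor is illustrative
```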
Eye movement behavior was investigated for image-quality and chromatic adaptation tasks. The first experiment examined the differences between paired comparison, rank order, and graphical rating tasks, and the second experiment examined the strategies adopted when subjects were asked to select or adjust achromatic regions in images. Results indicate that subjects spent about 4 seconds looking at images in the rank order task, 1.8 seconds per image in the paired comparison task, and 3.5 seconds per image in the graphical rating task. Fixation density maps from the three tasks correlated highly in four of the five images. Eye movements gravitated toward faces and semantic features, and introspective report was not always consistent with fixation density peaks. In adjusting a gray square in an image to appear achromatic, observers spent 95% of their time looking only at the patch. When subjects looked around (less than 5% of the time), they did so early. Foveations were directed to semantic features, not achromatic regions, indicating that people do not seek out near-neutral regions to verify that their patch appears achromatic relative to the scene. Observers also do not scan the image in order to adapt to the average chromaticity of the image. In selecting the most achromatic region in an image, viewers spent 60% of the time scanning the scene. Unlike the achromatic adjustment task, foveations were directed to near-neutral regions, showing behavior similar to a visual search task.
A wearable eye tracker was used to record photographers' eye movements while they took digital photographs of person, sculpture, and interior scenes. Eye movement sequences were also recorded as the participants selected and cropped their images on a computer. Preliminary analysis revealed that during image capture people spend approximately the same amount of time looking at the camera regardless of the scene being photographed. The time spent looking at either the primary object or the surround differed significantly across the three scenes. Results from the editing phase support previous reports that observers fixate on semantic-rich regions in the image, which, in this task, were important in the final cropping decision. However, the spread of fixations, edit time, and number of crop windows did not differ significantly across the three image classes. This suggests that, unlike image capture, the cropping task was highly regular and less influenced by image content.
We explore the way in which people look at images of different semantic categories and directly relate those results to computational approaches for automatic image classification. Our hypothesis is that the eye movements of human observers differ for images of different semantic categories, and that this information can be effectively used in automatic content-based classifiers. First, we present eye tracking experiments that show the variation in eye movements across different individuals for images of 5 different categories: handshakes, crowd, landscapes, main object in uncluttered background, and miscellaneous. The eye tracking results suggest that similar viewing patterns occur when different subjects view different images in the same semantic category. Using these results, we examine how empirical data obtained from eye tracking experiments across different semantic categories can be integrated with existing computational frameworks, or used to construct new ones. In particular, we examine the Visual Apprentice, a system in which image classifiers are learned from user input as the user defines a multiple-level object definition hierarchy based on an object and its parts, and labels examples for specific classes. The resulting classifiers are applied to automatically classify new images. Although many eye tracking experiments have been performed, to our knowledge this is the first study that specifically compares eye movements across categories, and that links category-specific eye tracking results to automatic image classification techniques.
Visual perception, operating below conscious awareness, effortlessly provides the experience of a rich representation of the environment, continuous in space and time. Conscious visual perception is made possible by the 'foveal compromise,' the combination of the high-acuity fovea and a sophisticated suite of eye movements. Our illusory visual experience cannot be understood by introspection, but monitoring eye movements lets us probe the processes of visual perception. Four tasks representing a wide range of complexity were used to explore visual perception: image quality judgments, map reading, model building, and hand washing. Very short fixation durations were observed in all tasks, some as short as 33 msec. While some tasks showed little variation in eye movement metrics, differences in eye movement patterns and high-level strategies were observed in the model building and hand washing tasks. Performance in the hand washing task revealed a new type of eye movement: 'planful' eye movements were made to objects well in advance of a subject's interaction with the object. Often occurring in the middle of another task, they provide 'overlapping' temporal information about the environment, offering a mechanism that helps produce our conscious visual experience.
The study of human perception has evolved from examining simple tasks executed in reduced laboratory conditions to the examination of complex, real-world behaviors. Virtual environments represent the next evolutionary step, allowing full stimulus control and repeatability for human subjects while providing a testbed for evaluating models of human behavior.
Morphological granulometries are generated by successively opening a thresholded image by an increasing sequence of structuring elements. The result is a sequence of images, each of which is a subimage of the previous. By counting the number of
pixels at each stage of the granulometry, a size distribution is generated that can be employed as a signature of the image. Normalization of the size distribution produces a probability distribution in
the usual sense. An adaptation of the method that is appropriate to texture-based segmentation is described. Rather than construct a single size distribution based on the entire image, local size distributions are computed over windows within the image. These local size distributions lead to granulometric moments at pixels within the image, and if the image happens to be partitioned into regions of
various textures, the local moments will tend to be homogeneous over any given region. Segmentation then proceeds on images whose gray values are the local moments; the means of the local size distributions are especially useful. Goodness of segmentation
depends on the local probability distributions of the granulometric-moment images. Both exact and asymptotic characterizations of these distributions are developed for the mean image of a basic convexity model.
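A minimal sketch of the basic granulometric size distribution described above, assuming disk-shaped structuring elements and a binary input image; the structuring-element family and radii are illustrative, not those used in the paper.

```python
# Sketch of a binary granulometry / pattern spectrum via successive openings
# with disks of increasing radius. Parameters are illustrative assumptions.
import numpy as np
from skimage.morphology import binary_opening, disk

def granulometry(binary_image, max_radius=20):
    """Return the normalized size distribution and pattern spectrum of a binary image."""
    areas = [binary_image.sum()]
    for r in range(1, max_radius + 1):
        areas.append(binary_opening(binary_image, disk(r)).sum())
    areas = np.asarray(areas, dtype=float)

    size_distribution = 1.0 - areas / areas[0]     # fraction of pixels removed by radius r
    pattern_spectrum = np.diff(size_distribution)  # probability mass at each size
    return size_distribution, pattern_spectrum
```

The local size distributions and granulometric moments used for texture segmentation follow by computing the same quantities over windows centered at each pixel and taking, for example, the mean of each local pattern spectrum as that pixel's gray value.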
Size distributions of particles within a binary image can be generated by morphological filtering processes known as granulometries. Granulometries filter the image by structuring elements of ever-increasing size, the result being a distribution whose statistics carry information regarding the shape and size of particles within the image. A granulometric approach to the analysis of the microstructure of electrophotographic images is discussed. The method is applied to both simulated and real images, the former being generated in a manner consistent with existing magnetic brush development and optical density transform models. Size distribution statistics are analyzed in terms of feedback control and copier quality control.