Evidence is presented that color constancy does not exist as a special phenomenon of human color vision. It is argued that results of experiments, as well as casual observations, which seem to illustrate color constancy can be easily understood from basic facts about chromatic adaptation and simultaneous contrast. The argument is supported by (i) a critique of the famous Mondrian studies, (ii) the ATD model's predictions of the Mondrian data, and (iii) a summary of a demonstration experiment regarding Mondrian patterns. Concerning definitions of color concepts, it is noted that, in any field of science, definitions change according to theoretical advances, but the vocabulary of color vision has not kept pace. In particular, the ATD models of the past ten years or so suggest that some of the universally accepted and seemingly essential color terms require re-examination.
Capturing and rendering an image that fulfills the observer's expectations is a difficult task. This is because the signal reaching the eye is processed by a complex mechanism before forming a percept, whereas a capturing device only retains the physical values of light intensities. It is especially difficult to render complex scenes with highly varying luminances. For example, a picture taken inside a room where objects are visible through the windows will not be rendered correctly by a global technique. Either details in the dim room will be hidden in shadow or the objects viewed through the window will be too bright. The image has to be treated locally to resemble more closely what the observer remembers. The purpose of this work is to develop a technique for rendering images based on human local adaptation. We take inspiration from a model of color vision called Retinex, which determines the perceived color given the spatial relationships of the captured signals. Retinex has been used as a computational model for image rendering. In this article, we propose a new solution inspired by Retinex that is based on a single filter applied to the luminance channel. All parameters are image-dependent, so the process requires no parameter tuning, making the method more flexible than existing ones. The presented results show that our method suitably enhances high dynamic range images.
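As a rough illustration of this class of operator, a minimal single-scale sketch of a surround filter applied to the luminance channel follows; it is not the authors' operator, and the image-dependent filter width shown is an assumed heuristic:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def retinex_like_tonemap(rgb):
        """Sketch: attenuate the luminance channel by a local (Gaussian) surround.
        rgb: float array, shape (H, W, 3), linear HDR values > 0."""
        lum = 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]
        lum = np.maximum(lum, 1e-6)
        # Image-dependent surround width (illustrative heuristic, not the paper's rule).
        sigma = 0.05 * max(lum.shape)
        surround = gaussian_filter(np.log(lum), sigma)
        new_lum = np.exp(np.log(lum) - surround)   # Retinex-style ratio to the surround
        new_lum /= new_lum.max()                   # normalize into display range
        # Reapply color by scaling each channel with the luminance ratio.
        return rgb * (new_lum / lum)[..., None]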
In this paper we present a tone mapping operator (TMO) for High Dynamic Range images, inspired by human visual system adaptive mechanisms. The proposed TMO is able to perform color constancy without a priori information about the scene, a consequence of its HVS inspiration. In our view, color constancy is very useful in a TMO, since we assume that it is preferable to look at an image that reproduces the color sensation rather than an image that follows classic photographic reproduction. Our proposal starts from an analysis of the Retinex and ACE algorithms. We then extend ACE to HDR images, introducing novel features. These are two non-linear controls: the first allows the model to find a good trade-off between visibility and color distribution by modifying the local operator at each pixel-to-pixel comparison, while the second modifies the interaction between pixels when estimating the local contrast. Solutions toward unsupervised parameter tuning are proposed.
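For illustration only, an ACE-style local operator with a saturating slope control at each pixel-to-pixel comparison can be sketched as below; the two controls proposed in the paper are not specified in the abstract, so this is a generic form using a random subsample of comparison pixels:

    import numpy as np

    def ace_sketch(lum, slope=5.0, n_samples=400, rng=None):
        """Sketch of an ACE-style local operator on a luminance image in [0, 1].
        slope controls the non-linear saturation of each pixel-to-pixel comparison."""
        rng = np.random.default_rng(rng)
        h, w = lum.shape
        ys = rng.integers(0, h, n_samples)
        xs = rng.integers(0, w, n_samples)
        yy, xx = np.mgrid[0:h, 0:w]
        out = np.zeros_like(lum)
        for y, x in zip(ys, xs):                      # random subset stands in for all pixels
            d = np.hypot(yy - y, xx - x) + 1.0        # spatial weighting by distance
            r = np.clip(slope * (lum - lum[y, x]), -1.0, 1.0)  # saturated difference
            out += r / d
        out -= out.min()
        return out / (out.max() + 1e-12)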
The spectral integrator at the University of Oslo consists of a lamp whose light is dispersed into a spectrum by means of a prism. Using a transmissive LCD panel controlled by a computer, certain fractions of the light in different parts of the spectrum are masked out. The remaining spectrum is integrated and the resulting colored light projected onto a dispersing plate. Attached to the computer is also a spectroradiometer measuring the projected light, thus making the spectral integrator a closed-loop system. One main challenge is the generation of stimuli of arbitrary spectral power distributions. We have solved this by means of a computational calibration routine: Vertical lines of pixels within the spectral window of the LCD panel are opened successively and the resulting spectral power distribution on the dispersing plate is measured. A similar procedure for the horizontal lines gives, under certain assumptions, the contribution from each opened pixel. In this way, light of any spectral power distribution can be generated by means of a fast iterative heuristic search algorithm. The apparatus is convenient for research within the fields of color vision, color appearance modelling, multispectral color imaging, and spectral characterization of devices ranging from digital cameras to solar cell panels.
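Under the linearity assumption stated above (each opened column contributes a fixed, measured spectrum), generating a target spectral power distribution becomes a constrained inverse problem. A sketch using non-negative least squares, standing in for the authors' iterative heuristic search, is:

    import numpy as np
    from scipy.optimize import nnls

    def solve_mask(A, target):
        """A: (n_wavelengths, n_columns) measured contribution of each fully opened
        column; target: desired spectral power distribution sampled at the same
        wavelengths. Returns per-column opening fractions in [0, 1]."""
        x, _ = nnls(A, target)        # non-negative fit of column weights
        return np.clip(x, 0.0, 1.0)   # crude projection: a column cannot exceed fully open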
In this paper, we propose content adaptation for visual impairments in MPEG-21. The proposed content adaptation aims to give enhanced visual accessibility to users with visual impairment in MPEG-21. We consider two major visual impairments: low vision impairment and color vision deficiency. The proposed method includes a description of the visual impairments and a content adaptation technique based on it. We have developed a symptom-based description of visual impairment characteristics for users with visual impairment in the context of MPEG-21 digital item adaptation (DIA). To verify the usefulness of the proposed method, we performed experiments with content adaptation based on the description in MPEG-21. The experimental results showed that the proposed method provides effective content adaptation for users with visual impairment and gives them enhanced visual accessibility.
Virtual Reality has great potential to become a usable design tool for the planning of light and colour in buildings. Technical development has provided us with better computer graphics and faster rendering techniques. However, reliability and usability are limited by a lack of knowledge about how humans perceive spatial colour phenomena. The setting of parameters for material properties in light calculation software is done arbitrarily. We present a comparison between a real room and a digital model evaluated on a desktop PC and in an Immersive Projection Technology (IPT) system. Data were collected from video-recorded interviews and questionnaires. The participants assessed the appearance of light, colours and space. They also evaluated their involvement in solving this task, and their presence in each environment. Our results highlight the benefits and disadvantages of the real and virtual models. The participants had difficulties in estimating the size of both the desktop room and the room in the IPT system. The comparison of real and virtual rooms revealed unsatisfying differences in shadowing and colour appearance. We determined the magnitude of perceived colour reflections in the real room, and experimented with some of the parameters in Lightscape/3dsmax6.
We propose a methodology for comparing and refining perceptual image quality metrics based on synthetic images that are optimized to best differentiate two candidate quality metrics. We start from an initial distorted image and iteratively search for the best/worst images in terms of one metric while constraining the value of the other to remain fixed. We then repeat this, reversing the roles of the two metrics. Subjective tests on the quality of pairs of these images, generated at different initial distortion levels, provide a strong indication of the relative strengths and weaknesses of the metrics being compared. This methodology also provides an efficient way to further refine the definition of an image quality metric.
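One plausible way to realize the constrained search, assuming both metrics are differentiable functions of the pixels (the abstract does not state the authors' optimizer), is to step along the gradient of one metric after projecting out the component that would change the other:

    import numpy as np

    def constrained_step(img, grad_m1, grad_m2, lr=0.1):
        """Move img to increase metric 1 while holding metric 2 approximately fixed.
        grad_m1, grad_m2: callables returning the gradients of the two metrics
        with respect to the image pixels (same shape as img)."""
        g1 = grad_m1(img).ravel()
        g2 = grad_m2(img).ravel()
        g2 /= np.linalg.norm(g2) + 1e-12
        step = g1 - np.dot(g1, g2) * g2    # remove the component that changes metric 2
        return img + lr * step.reshape(img.shape)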
In this work, we used two types of impairments in a psychophysical experiment to measure the overall annoyance and individual strength of three impairment features (fuzzy, blocky, and blurry). The impairments were generated by compressing the original videos with MPEG-2 at two different bitrates: 1.0 and 7.5 Mbps. The heavily compressed videos presented blurry and blocky impairments, while the lightly compressed videos presented 'fuzzy' impairments, using a word provided by our test subjects. These impairments were then linearly combined in different proportions and strengths, generating videos in which all three impairment features are present. Our goal was to determine how these impairment features combine to produce the overall annoyance. A modified Minkowski metric was used to describe the 'combination rule' which relates the strengths of the impairment features to the overall annoyance. For the data set containing all test sequences, the optimal value found for the Minkowski parameter p was 1.55. From the data obtained, we also estimated the psychometric and annoyance functions. We found that for blocky-blurry and fuzzy artifacts there is no consistent difference in either the thresholds or the mid-annoyance strengths.
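A Minkowski combination of the individual feature strengths S_i into overall annoyance A typically takes the form below; the precise modification used in the paper is not given in the abstract:

    A = \Bigl( \sum_{i} S_i^{\,p} \Bigr)^{1/p}, \qquad p = 1.55

where p = 1.55 is the optimal exponent reported above.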
The degree of visibility of any kind of stimulus is largely determined by the background on which it is shown, a property commonly known as masking. Many psychophysical experiments have been carried out to date to understand masking of sinusoids or Gabor targets by similar maskers and by noise, and a variety of masking
models have been proposed. However, these stimuli are artificial and quite simplistic compared to natural scene content. Masking models based on such experiments may not be accurate for more complex cases of masking. We investigate the visibility of noise itself as a target and use natural images as the masker. Our targets are Gaussian white noise and band-pass filtered noise of varying energy. We conducted psychophysical experiments to determine the detection threshold of these noise targets on many different types of image content and present the results here. Potential applications include image watermarking or quality assessment.
Contone imagery usually has eight bits per pixel for each of the three primaries in typical displays. However, there are often points in the imaging pipeline that constrain this number for cost reasons. Conversely, higher-quality displays seek to achieve 9-10 bits/pixel/color, though there may be system bottlenecks limited to 8. In both cases, a goal is to achieve a higher perceived bit-depth quality than is afforded by the imaging system. The two main artifacts caused by reduced bit-depth are contouring and loss of low-amplitude detail. Prevention of these distortions can be accomplished by applying a dithering process before the bit-depth limitation. A technique for achieving bit-depth extension via spatiotemporal dithering has previously been presented [1]. In applications where it is only possible to affect the image after the bit-depth losses have already occurred, it is impossible to accurately restore the loss of low-amplitude detail. However, it is possible to remove the false contours. Of the several approaches used to remove false contours, we will discuss predictive cancellation and its dependence on the spatial frequency localization and masking properties of the visual system. We discuss the key visual properties that arose while investigating these two applications, which include the optical transfer function (OTF) of the eye, masking by noise, and contour integration.
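A minimal sketch of one plausible predictive-cancellation scheme follows, assuming a low-pass estimate of the original is accurate where false contours appear; the published method may differ in its filter and decision logic:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def cancel_false_contours(quantized, step, sigma=4.0):
        """quantized: image already reduced to a coarse quantization step `step`.
        Estimate the original by low-pass filtering, requantize that estimate to
        predict where the quantizer created contours, then subtract the prediction."""
        estimate = gaussian_filter(quantized.astype(np.float64), sigma)
        predicted_contour = step * np.round(estimate / step) - estimate
        return quantized - predicted_contour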
A visual difference metric was implemented on a commodity graphics card to take advantage of the increased processing power available today in a Graphics Processing Unit (GPU). The specific algorithm employed was the Sarnoff Visual Discrimination Metric (Sarnoff VDM). To begin the implementation, the typical architecture of a contemporary GPU was analyzed and some general strategies were developed for performing image processing tasks on GPUs. The stages of the Sarnoff VDM were then mapped onto the hardware and the implementation was completed. A performance analysis showed that the algorithm's speed had been increased by an order of magnitude over the original version that only ran on a CPU. The same analysis showed that the energy stage was the most expensive in terms of both program size and processing time. An interactive version of the Sarnoff
VDM was developed and some ideas for additional applications of GPU based visual difference metrics were suggested.
We first review theoretical results for the problem of estimating single and multiple transparent motions. For N motions we obtain an M×M generalized structure tensor J_N with M = 3 for one, M = 6 for two, and M = 10 for three motions. The analysis of motion patterns is based on the ranks of J_N and is thus not only conceptual but provides computable
confidence measures for the different types of motions. To resolve the correspondence between the ranks of the tensors
and the motion patterns, we introduce the projective plane as a new way of describing motion patterns. In the projective
plane, intrinsically 2D spatial patterns (e.g. corners and line ends) that move correspond to points that represent the only
admissible velocity, and 1D spatial patterns (e.g. straight edges) that move correspond to lines that represent, as a set
of points, the set of admissible velocities. We then show a few examples of how the projective plane can be used to
generate novel motion patterns and explain the perception of these patterns. We believe that our results will be useful
for designing new stimuli for visual psychophysics and neuroscience and thereby contribute to the understanding of the
dynamical properties of human vision.
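For a single motion (N = 1, M = 3) the generalized structure tensor reduces to the classical spatiotemporal tensor. Writing f_x, f_y, f_t for the partial derivatives of the image sequence f and w for a local window, a sketch of the rank logic (standard for this family of tensors, stated here without the multi-motion generalization) is:

    J_1 = \int_\Omega w \, \nabla f \, \nabla f^{\mathsf{T}} \, d\Omega, \qquad \nabla f = (f_x, f_y, f_t)^{\mathsf{T}}

with \operatorname{rank}(J_1) = 1 for a moving 1D pattern (a line of admissible velocities in the projective plane), \operatorname{rank}(J_1) = 2 for a moving 2D pattern (a single admissible velocity), and \operatorname{rank}(J_1) = 3 when no single coherent motion fits the data.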
Over the past few years there has been an increasing interest in real time video services over packet networks. When
considering quality, it is essential to quantify user perception of the received sequence. Severe motion discontinuities are
one of the most common degradations in video streaming. The end-user perceives jerky motion when the discontinuities are uniformly distributed over time, and an instantaneous break in fluidity when the motion loss is isolated or irregularly distributed. Bit-rate adaptation techniques, transmission errors in packet networks, or the restitution strategy can all be the origin of this perceived jerkiness. In this paper we present a psychovisual experiment
performed to quantify the effect of sporadically dropped pictures on the overall perceived quality. First, the perceptual
detection thresholds of generated temporal discontinuities were measured. Then, the quality function was estimated in
relation to a single frame dropping for different durations. Finally, a set of tests was performed to quantify the effect of
several impairments distributed over time. We have found that the detection thresholds are content, duration and motion
dependent. The assessment results show how quality is impaired by a single burst of dropped frames in a 10 s sequence. The effect of several bursts of discarded frames, irregularly distributed over time, is also discussed.
In video sequences, scene cuts produce a temporal masking effect on several kinds of artifacts. This temporal sensitivity
reduction of the human visual system could be present before (backward masking) and after (forward masking) scene
cuts. Related studies reported significant forward masking in the first 30 to 100 ms following a scene change, depending on the nature of the impairment and the picture content. Backward masking at scene cuts seems to be less
significant. In this paper we present the results of a psychovisual experiment performed to characterize the temporal
masking effect on discontinuities caused by dropped frames in the vicinity of scene cuts. The forward and backward
masking was estimated in relation to a single burst of discarded frames of different durations. The four-alternative forced-choice psychophysical method was employed to evaluate the detection thresholds. The test was carried out using natural video content. Our results from the forward masking test are consistent with those reported in the state of the art, even though the test conditions were quite different. However, the backward masking effect on the perception of frame dropping is more significant than the forward masking effect.
Humans have a general understanding about their environment. We possess a sense of distinction between what is consistent and inconsistent about the environment based on our prior experience. Any aspect of the scene that does not fit into this definition of normalcy tends to be classified as an inconsistent event, also referred to as a novel event. An example of this is a casual observer standing on a bridge over a freeway, tracking vehicle traffic: vehicles traveling at or around the speed limit are generally ignored, and a vehicle traveling at a much higher (or lower)
speed is subject to one's immediate attention. In this paper, we present a computational learning based framework for novelty detection on video sequences. The framework extracts low-level features from scenes, based on the focus of attention theory and combines unsupervised learning with habituation theory for learning these features. The paper presents results from our experiments on natural video streams for identifying novelty in velocity of moving objects and static changes in the scene.
A vernier acuity task was used to compare three electronic displays: a high-resolution CRT, a high-resolution AMLCD, and a very-high-resolution AMLCD. The offset threshold value, approximately 6 seconds of arc, was found to be independent of display resolution. The very-high-resolution display had one octave more positional accuracy than the high-resolution displays. This performance difference was readily apparent. To match the image quality of print or photographic reproduction, a very-high-resolution screen is required.
We have measured the ability of observers to estimate the contrast ratio (maximum white luminance / minimum
black or gray) of various displays and to assess luminous discrimination over the tonescale of the display. This was
done using only the computer itself and easily-distributed devices such as neutral density filters. The ultimate goal
of this work is to see how much of the characterization of a display can be performed by the ordinary user in situ, in
a manner that takes advantage of the unique abilities of the human visual system and measures visually important
aspects of the display. We discuss the relationship among contrast ratio, tone scale, display transfer function and
room lighting. These results may contribute to the development of applications that allow optimization of displays
for the situated viewer / display system without instrumentation and without indirect inferences from laboratory to
workplace.
This paper presents a subjective experiment performed to determine the optimal level of sharpness enhancement for
various image content and two display technologies. For the experiment, a peaking algorithm was used as sharpness
enhancement method and its gain parameter, which controls the amount of sharpness enhancement, was optimized.
Eleven still images were used as image material, and for each original, eight levels of sharpness enhancement were
shown. Subjects were asked to select the image that they prefer most on overall image quality. To study the effect of the
display technology, the experiment was performed on a CRT monitor and LCD panel. The results show an effect of
content: lower gains were preferred for faces and content with flat areas, while higher gains were preferred for highly
textured images. We also found an effect of display: in general for a given image, the averaged gain preferred on the
CRT was equal or higher than on the LCD. The results can be used to optimize sharpness enhancement algorithms to
image content and display type in agreement with the averaged preference of viewers.
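A peaking filter of this kind is essentially a gain-controlled unsharp mask; a minimal sketch (the exact kernel used in the experiment is not specified in the abstract):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def peaking(lum, gain, sigma=1.0):
        """Sharpness enhancement by boosting high-frequency detail.
        gain is the parameter optimized per content and display in the experiment."""
        detail = lum - gaussian_filter(lum, sigma)   # high-pass component
        return np.clip(lum + gain * detail, 0.0, 1.0)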
In digital color imaging, the color information of objects should be reproduced as accurately as possible (unless preferred color reproduction is demanded); on the other hand, intrinsic imaging noise will propagate to the captured image and affect the final image quality. Previous studies have not shown to what extent color accuracy or noise reduction should be emphasized. Both the noise performance and the color accuracy performance should be balanced in order to achieve better total perceived image quality.

In this paper, a new comprehensive error metric that is a flexible trade-off between color accuracy and RMS noise is proposed. The linear matrix that converts the device signals to device-independent color signals is analytically optimized by minimizing this comprehensive error metric. By changing the weights of the color and noise components, one can obtain a reproduced image that achieves better color accuracy but more noise, or an image that has worse color accuracy but less noise, depending on the application and capture conditions. The analytical approach presents a full perspective of the color and noise characteristics of digital color imaging devices.
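One common analytic formulation of such a trade-off (a sketch consistent with the abstract, not necessarily the paper's exact metric): with device responses \mathbf{S}, target color signals \mathbf{T}, noise covariance \boldsymbol{\Sigma} in device space, and weight \lambda, the optimal linear matrix follows in closed form from

    J(\mathbf{M}) = \lVert \mathbf{M}\mathbf{S} - \mathbf{T} \rVert_F^2 + \lambda \operatorname{tr}\!\left(\mathbf{M} \boldsymbol{\Sigma} \mathbf{M}^{\mathsf{T}}\right) \quad\Rightarrow\quad \mathbf{M} = \mathbf{T}\mathbf{S}^{\mathsf{T}} \left(\mathbf{S}\mathbf{S}^{\mathsf{T}} + \lambda \boldsymbol{\Sigma}\right)^{-1}

Increasing \lambda trades color accuracy for lower propagated noise, as described above.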
Reading is a fundamental task and skill in many environments including business, education, and the home. Today, reading often occurs on electronic displays in addition to traditional hard copy media such as books and magazines, raising issues of legibility and other factors that can affect human performance [1]. In fact, the transition to soft copy media for text images is often met with worker complaints about their vision and comfort while reading [2-6]. Careful comparative evaluations of reading performance across hard and soft copy device types are rare, even though they are clearly important given the rapid and substantial improvements in soft copy devices available in the marketplace over the last 5 years. To begin to fill this evaluation gap, we compared reading performance on three different soft copy devices and traditional paper. This study does not investigate comfort factors such as display location, seating comfort, and more general issues of lighting; rather, we focus on a narrow examination of reading performance differences across display types when font sizes are large.
This paper will present a literature survey on the basic aspects of the possibilities for color presentation in the
peripheral visual field and the results from some experiments from two laboratories in Japan and in Sweden. The
method used was a color naming technique that included hue and saturation/chromaticness estimations of color
stimuli of different eccentricity. In one laboratory, the size effect was also examined. Unique hue components of
the stimuli were derived from the results of hue and saturation/chromaticness estimates. The results from the two
laboratories showed similar tendencies despite the differences in the experiments. The results showed that an increase of the retinal temporal eccentricity to 40 deg impaired color appearance, especially for red and green colors. Smaller color stimuli, subtending 2 deg of visual angle, were perceived as less chromatic than larger color stimuli, subtending 6.5 deg of visual angle. The results are in line with earlier studies showing that blue and yellow colors are better perceived than green and red in the periphery.
We apply a biologically-motivated algorithm that selects visually-salient regions of interest in video streams to multiply-foveated video compression. Regions of high encoding priority are selected based on nonlinear integration of low-level visual cues, mimicking processing in primate occipital and posterior parietal cortex. A dynamic foveation filter then blurs (foveates) every frame, increasingly with distance from high-priority regions. Two variants of the model (one with continuously-variable blur proportional to saliency at every pixel, and the other with blur proportional to distance from three independent foveation centers) are validated against eye fixations from 4-6 human observers on 50 video clips (synthetic stimuli, video games, outdoor day and night home video, television newscasts, sports, talk shows, etc.). Significant overlap is found between human and algorithmic foveations on every clip with one variant, and on 48 out of 50 clips with the other. Substantial reductions in compressed file size, by a factor of 0.5 on average, are obtained for foveated compared to unfoveated clips. These results suggest a general-purpose usefulness of the algorithm in improving compression ratios of unconstrained video.
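A sketch of the second variant, blur growing with distance from a set of foveation centers, might look as follows; the number of blur levels and the growth rate are illustrative assumptions, not the paper's values:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def foveate(frame, centers, max_sigma=8.0, n_levels=6):
        """Blur each pixel in proportion to its distance from the nearest center.
        frame: (H, W) luminance; centers: list of (y, x) foveation centers."""
        h, w = frame.shape
        yy, xx = np.mgrid[0:h, 0:w]
        dist = np.min([np.hypot(yy - y, xx - x) for y, x in centers], axis=0)
        dist /= dist.max() + 1e-12
        # Precompute a small stack of progressively blurred frames, then pick
        # per pixel according to the normalized distance.
        sigmas = np.linspace(0.0, max_sigma, n_levels)
        stack = np.stack([gaussian_filter(frame, s) if s > 0 else frame for s in sigmas])
        idx = np.round(dist * (n_levels - 1)).astype(int)
        return np.take_along_axis(stack, idx[None], axis=0)[0]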
The saliency-based or bottom-up model of visual attention presented in this paper deals with still color images. The model we built is based on numerous properties of the human visual system (HVS), thus providing a biologically plausible system. The computation of early visual features such as color and orientation is a key step for any bottom-up model, and the way these features are extracted is what chiefly differentiates one model from another. The novelty of the proposed approach lies in the fact that the computation of early visual features is fully based on an HVS model, consisting of projecting the picture into an opponent-colors space and applying a perceptual decomposition, contrast sensitivity functions, and masking functions. Moreover, a strategy essentially based on a center-surround mechanism and on perceptual grouping phenomena underscores conspicuous locations by combining visual feature maps. A saliency map, defined as a 2D topographic representation of conspicuity, is then deduced. The model is applied to a number of natural images. Our results are then compared with the results of a well-known bottom-up model.
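As a small illustration of the center-surround mechanism on one opponent channel (a sketch only; the full model also includes the perceptual decomposition, contrast sensitivity, and masking stages described above):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def center_surround(opponent, sigma_c=2.0, sigma_s=8.0):
        """Difference-of-Gaussians conspicuity on one opponent-color channel
        (e.g. red-green). Large responses mark locally distinct regions."""
        return np.abs(gaussian_filter(opponent, sigma_c)
                      - gaussian_filter(opponent, sigma_s))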
Seemingly complex tasks like visual search can be analyzed using a cognition-free, bottom-up framework. We sought to reveal strategies used by observers in visual search tasks using accurate eye tracking and image analysis at point of gaze. Observers were instructed to search for simple geometric targets embedded in 1/f noise. By
analyzing the stimulus at the point of gaze using the classification image (CI) paradigm, we discovered CI templates that indeed resembled the target. No such structure emerged for a random-searcher. We demonstrate, qualitatively and quantitatively, that these CI templates are useful in predicting stimulus regions that draw
human fixations in search tasks. Filtering a 1/f noise stimulus with a CI results in a 'fixation prediction map'. A qualitative evaluation of the prediction was obtained by overlaying k-means clusters of observers' fixations on the prediction map. The fixations clustered around the local maxima in the prediction map. To obtain a quantitative comparison, we computed the Kullback-Leibler distance between the recorded fixations and the prediction. Using random-searcher CIs in Monte Carlo simulations, a distribution of this distance was obtained. The z-scores for the human CIs and the original target were -9.70 and -9.37, respectively, indicating that even in noisy stimuli, observers deploy their fixations efficiently toward likely targets rather than casting them randomly in the hope of fortuitously finding the target.
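For reference, the Kullback-Leibler distance used in the quantitative comparison, with p the distribution of recorded fixations and q the normalized prediction map, is

    D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}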
Image Features, Perception, Analysis, and Indexing
When a textured 3-dimensional surface is projected in perspective, the statistics of the texture in the image change with the shape of the surface. Most shape-from-texture models assume that these changes are due solely to the projection of non-fronto-parallel portions of the surface. This is true for developable surfaces, which are formed by bending or curving flat, textured sheets without tearing or stretching. However, for other surfaces such as those carved from solids or formed by stretched materials, the texture on the surface is generally not homogeneous. If the perspective image is parsed into
local Fourier spectra, we find that signature patterns of orientation flows occur at locations corresponding to specific 3-D shapes. These patterns occur generically for developable, carved and stretched surfaces and when they are visible, observers make veridical shape judgments. In contrast, frequency modulations vary systematically for different types of surfaces, and often lead to non-veridical percepts when they are caused by changes in slant (e.g. isotropically textured
developable surfaces). Our results suggest that in the extraction of 3-D shape, the visual system can generically employ a limited number of neural mechanisms to extract the signature orientation flows from the image regardless of homogeneity.
We have been studying the relationship between people's visual impressions and the image features of texture images, in order to clarify the human subjective interpretation mechanism for images [1]. In relating image features to human impressions of the images, we found that impressions of material were a bottleneck. We have therefore studied a new analysis method that derives the impression of material from texture images. In particular, we focused on the properties of visual targets for which people can feel a tactile sense. In this paper, we propose a new texture analysis method based on frequency analysis with 3D texture designed for photorealistic rendering. We found that our new method can estimate not only surface roughness but also surface softness.
We have developed a method for clustering features into objects by taking features, including intensity, orientations, and colors, from the most salient points in an image as determined by our biologically motivated saliency program. We can train a program to cluster these features by supplying as training input only the number of objects that should appear in an image. We do this with a clustering technique that links nodes in a minimum spanning tree not only by distance but also by a density metric. We can then form classes over objects, or perform object segmentation, in a novel validation set by training over a set of seven soft and hard parameters. We also discuss the uses of such a flexible method in landmark-based navigation, since a robot using such a method may have a better ability to generalize over features and objects.
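A sketch of the clustering step, linking nodes in a minimum spanning tree and cutting edges by a crude density criterion (the seven soft and hard parameters of the actual system are not reproduced here):

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
    from scipy.spatial.distance import pdist, squareform

    def mst_cluster(features, cut_factor=2.0):
        """Cluster feature vectors by building an MST and cutting edges that are
        long relative to the mean MST edge length (an illustrative density proxy)."""
        d = squareform(pdist(features))
        mst = minimum_spanning_tree(d).toarray()
        edges = mst[mst > 0]
        mst[mst > cut_factor * edges.mean()] = 0   # cut unusually long edges
        n_clusters, labels = connected_components(mst, directed=False)
        return labels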
Three sets of color tone stimuli were created for three hues (red, green, and blue) by varying just two parameters, saturation and value. Two methods were employed to study how native speakers of Japanese use adjectives to describe differences in their perceptions of color tones. A preliminary elicitation employed the method of selection description, in which Japanese adjectives meaning pale, bright, vivid, strong, dull, and dark constituted a high proportion of responses for 56 Japanese native speakers. These adjectives were employed in a triadic comparison method for the same stimuli, and the adjectives were used in a consistent manner for all three hues. Of particular interest were two pairs of adjective contrasts. The first, vivid vs. dull, described variation along the axis connecting the tone at both highest saturation and highest value with the tone at both lower saturation and lower value. The second contrast, bright vs. strong, was practically orthogonal to the first. To further document the consensual use of these pairs of adjectives in describing variation of color tone, two additional experiments were executed to determine the boundary color tones at which adjective labels switch from bright to strong and from vivid to dull.
We present the concept of intelligent Content-Based Image Retrieval (iCBIR), which incorporates knowledge concerning human cognition in system development. The present research focuses on the utilization of color categories (or focal colors) for CBIR purposes, considered useful in particular for query-by-heart purposes. However, this research explores their potential use for query-by-example purposes. Their use was validated for the field of CBIR by two experiments (26 subjects; stimuli: 4 times the 216 W3C web-safe colors) and one question ("mention ten colors"). Based on the experimental results, a Color LookUp Table (CLUT) was defined. This CLUT was used to segment the HSI color space into the 11 color categories, enabling a new color quantization method with an 11-bin color histogram configuration. This was compared with three other histogram configurations of 64, 166, and 4096 bins. Combined with the intersection and quadratic distance measures, we defined seven color matching systems. An experimentally founded benchmark for CBIR systems was implemented (1680 queries were performed, measuring relevance and satisfaction). The 11-bin histogram configuration had average performance, a promising result since it was a naive implementation and is still a topic of development.
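The matching step for the 11-bin configuration reduces to histogram intersection over the color-category bins; a minimal sketch:

    import numpy as np

    def intersection_similarity(h1, h2):
        """Histogram intersection between two 11-bin category histograms
        (assumed non-empty). Returns 1.0 for identical normalized distributions,
        0.0 for disjoint ones."""
        h1 = h1 / h1.sum()
        h2 = h2 / h2.sum()
        return np.minimum(h1, h2).sum()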
Current image indexing methods are based on measures of visual content. However, this approach provides only a partial solution to the image retrieval problem. For example, an artist might want to retrieve an image (for use in an advertising campaign) that evokes a particular "feeling" in the viewer. One technique for measuring evoked feelings, which originated in Japan, indexes images based on the inner impression (i.e. the kansei) experienced by a person while
viewing an image or object: impressions such as busy, elegant, romantic, or lavish. The aspects of the image that evoke this inner impression in the viewer are called kansei factors. The challenge in kansei research is to enumerate those factors, with the ultimate goal of indexing images with the "inner impression" that viewers experience. Thus, the focus is on the viewer, rather than on the image, and similarity measures derived from kansei indexing represent
similarities in inner experience, rather than visual similarity. This paper presents the results of research that indexes images based on a set of kansei impressions, and then looks for correlations between that indexing and traditional content-based indexing. The goal is to allow the indexing of images based on the inner impressions they evoke, using visual content.
Zoom magnification is an essential element of video-based low vision enhancement systems. However, since optical
zoom systems are bulky and power intensive, digital zoom is an attractive alternative. This paper determines the visual
acuity of 15 subjects when a letter chart is viewed through a video system with various levels of digital zoom. A strategy
in which the 1:1 magnified image is obtained by combining optical magnification with digital minification gives the best
result, provided background scene information is known from the other cameras. A real-time FPGA-based system for
simultaneous zoom and smoothing is also demonstrated for text reading and enhancement.
One of the key variables that have been used for identifying objects in two-dimensional imagery is shape. Humans have the ability to discriminate between shapes and can perceive an imperfect shape as belonging to a particular object class. Perceptual classification boundaries define where human perception switches from classifying a shape as belonging to one object class to another. In this paper, the perceptual difference between several primitive two-dimensional object shapes is examined. Unlike the human, computer recognition algorithms are typically designed to recognize a finite number of classes of objects. This paper focuses on two-class and three-class recognition problems using simple primitive shapes consisting of a single filled, closed-loop contour. To determine the perceptual classification boundary, one primitive shape is morphed into another, and a group of persons is used to quantify where the perceived boundary is located between objects. Various shape measures are then applied to the primitive shapes to determine how well some current measures can quantify the perceived classification boundary. The addition of Gaussian noise to the primitive two-dimensional shapes is also examined, along with quantitative and perceived human results. The results suggest that the tested quantitative measures do not provide results similar to human perception. Some measures are better than others at achieving perceptual classification. The paper demonstrates that an approximate perceptual classification measure can be achieved by using human observer perceptual thresholds along with a quantitative measure.
Intelligent virtual humans are widely required in computer games, ergonomics software, virtual environments, and so on. We present a vision-based behavior modeling method to realize smart navigation in a dynamic environment. The behavior model is divided into three modules: vision, global planning, and local planning. Vision is the only channel through which the smart virtual actor gets information from the outside world. The global and local planning modules then use the A* and D* algorithms to find a way for the virtual human through a dynamic environment. Finally, experiments on our test platform (Smart Human System) verify the feasibility of this behavior model.
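A compact grid A* of the kind used by the global planner (a generic sketch, not the authors' implementation):

    import heapq

    def astar(grid, start, goal):
        """grid: 2D list of 0 (free) / 1 (blocked); start, goal: (row, col) tuples.
        Returns a list of cells from start to goal, or None if unreachable."""
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
        frontier = [(h(start), 0, start, None)]                  # (f, g, node, parent)
        parents, best_g = {}, {start: 0}
        while frontier:
            _, g, cur, par = heapq.heappop(frontier)
            if cur in parents:
                continue                                         # already expanded
            parents[cur] = par
            if cur == goal:                                      # walk back to start
                path = [cur]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (cur[0] + dr, cur[1] + dc)
                if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                        and grid[nxt[0]][nxt[1]] == 0
                        and g + 1 < best_g.get(nxt, float("inf"))):
                    best_g[nxt] = g + 1
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, cur))
        return None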
In recent years, new harmonic analysis tools providing sparse representations in high-dimensional spaces have been proposed. In particular, ridgelet and curvelet bases are similar to the sparse components of naturally occurring image data derived empirically by computational neuroscience researchers. Ridgelets take the form of basis elements which exhibit very high directional sensitivity and are highly anisotropic. The ridgelet transform has been shown to provide a sparse representation for smooth objects with straight edges. Independently, for the purpose of scene description, the shape of the Fourier energy spectrum has been used as an efficient way to provide a "holistic" description of the scene picture and its semantic category. Similarly, we focus on a simple binary semantic classification (artificial vs. natural) based on various ridgelet features. The learning stage is performed on a large image database using different state-of-the-art linear discriminant techniques. Classification results are compared with those resulting from the Gabor representation. Additionally, the ridgelet representation provides us with a way to accurately reconstruct the original signal. Using this synthesis step, we filter the ridgelet coefficients with the discriminant vector. The resulting image identifies the elements within the scene contributing to the different perceptual dimensions.
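The learning stage can be sketched with a standard linear discriminant on the ridgelet feature vectors; scikit-learn's implementation stands in here for the specific discriminant techniques compared in the paper, and the data shown are placeholders:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Placeholder data: X holds one ridgelet feature vector per image,
    # y holds the class labels (0 = artificial, 1 = natural).
    rng = np.random.default_rng(0)
    X = rng.random((200, 64))
    y = rng.integers(0, 2, 200)

    lda = LinearDiscriminantAnalysis().fit(X, y)
    # lda.coef_ plays the role of the discriminant vector that is used above
    # to filter the ridgelet coefficients before reconstruction.
    print(lda.predict(X[:5]))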
Early methods of image indexing relied heavily on color histograms, which characterize the global content of images.
However, global indexing methods proved to be unsatisfactory, and researchers now employ more localized measures of
image content, based on relatively small regions. At the same time, it has also become clear that image indexing should
be based on higher-level visual content. This raises an important question: “Can the higher-level content of images be
reliably indexed using local analysis?” In general, humans are better at indexing mid-level and high-level visual content
than today’s automated indexing algorithms. Therefore, it makes sense to ascertain how well humans can perform mid-level
or high-level indexing, based on small regions. This paper describes research that employs a set of outdoor scenery
images (called the NaturePix image set) to compare how successfully humans can label the visual content of small
regions of natural images when (1) these regions are seen in the context of the larger image, and (2) when these regions
are extracted from (and are seen in isolation from) that larger image. The results of these experiments indicate what
types of higher-level image content can be recognized locally, and how successfully high-level image content can be
indexed on the basis of local feature analysis.
A novel approach for detection of multiple pedestrians from binocular video sequences is reported. First, the foreground is segmented using an efficient algorithm [3]. Then an Ω shape is used to detect the head-shoulder contour. We propose a Simplified Fourier Descriptor (SFD) to represent the transformed 1-D shape in phasor notation. A probabilistic model is introduced to evaluate the distribution of the SFD in the frequency domain. Clues from motion flow and object depth are used to filter out false detections. The proposed system has been tested on several indoor and outdoor sequences. Preliminary experiments have shown that it can robustly detect multiple pedestrians. Experimental results show that the approach is suitable for real-time pedestrian detection.
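For reference, a standard Fourier descriptor of a closed contour, of which the paper's Simplified Fourier Descriptor is a variant (the simplification itself is not detailed in the abstract):

    import numpy as np

    def fourier_descriptor(contour, n_coeffs=16):
        """contour: (N, 2) array of boundary points of a closed shape.
        Returns low-order FFT magnitudes, normalized for translation and scale."""
        z = contour[:, 0] + 1j * contour[:, 1]   # boundary as complex signal
        coeffs = np.fft.fft(z)
        coeffs[0] = 0                            # drop DC: translation invariance
        mags = np.abs(coeffs)
        mags /= mags[1] + 1e-12                  # scale invariance via first harmonic
        return mags[1:n_coeffs + 1]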
This paper proposes a new method that can recognize a sequence of hand motions expressing a sentence in sign language. The recognition procedure is divided into two steps: separation of the sequence of hand motions into sub-sequences, each of which expresses one word, and combination of the words to construct a meaningful sentence. In the first step, sequences of hand motion images are segmented by testing the continuity of the hand motions and by a multiscale image segmentation scheme. The trajectory of the hand motions is estimated by an affine transformation. Each sign in the sentence is represented by the extended chereme analysis model, and each chereme is represented by the status vector for determining the transition in the HMM. In the second step, each sentence is also represented by an HMM. The Viterbi algorithm and context-dependent HMMs are used to find the best state sequence in the HMM. The proposed algorithm has been tested with ten sequences of images, each of which expresses a sentence in Korean sign language. The experimental results have shown that the proposed algorithm can separate the sentence-level image sequence into word-level sub-sequences with a success rate of 75% on average and recognize the sentence with a success rate of 80%.
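The Viterbi step named above is standard; a compact sketch for a discrete-observation HMM (probabilities assumed nonzero):

    import numpy as np

    def viterbi(obs, pi, A, B):
        """Most likely state sequence for the observation indices `obs`.
        pi: (S,) initial probs; A: (S, S) transitions; B: (S, V) emissions."""
        S, T = len(pi), len(obs)
        logd = np.log(pi) + np.log(B[:, obs[0]])
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = logd[:, None] + np.log(A)   # scores[i, j]: from state i to j
            back[t] = scores.argmax(axis=0)      # best predecessor of each state
            logd = scores.max(axis=0) + np.log(B[:, obs[t]])
        path = [int(logd.argmax())]
        for t in range(T - 1, 0, -1):            # trace back through predecessors
            path.append(int(back[t, path[-1]]))
        return path[::-1]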
Special Session: Spatial/Color Vision in Honor of Russell De Valois
The neurophysiology and psychophysics of vision provide the basis for vision channels. Vision channels are the foundation of understanding spatial vision. This understanding has led to the development of a general model of visual perception and tests of functional vision. The channel model is shown to predict the Gestalt of many objects and perceptual distortions in a wide variety of spatial patterns misnamed as "visual illusions". Contrast sensitivity has been shown, more than visual acuity, to relate to functional vision and the visual quality of everyday objects viewed at work and play. The channel model and differences in contrast sensitivity help explain why people such as drivers with similarly good visual acuity can complain of the quality of vision in one eye and not the other due to eye disease, and can detect and identify objects at significantly different distances. The peak of the contrast sensitivity function, at about 3 to 6 cpd, is most sensitive for detecting objects at low contrast and is shown to relate to the visibility of a variety of objects in a night driving simulator. Using the contrast sensitivity function from sine-wave grating contrast sensitivity charts, EyeView software creates images that relate to the quality of vision.
A new evaluation of the local structure of sustained spatial channels with local stimuli in the peripheral retina employs the masking-sensitivity approach to minimize analytic assumptions. The stimuli were designed to address predominantly the sustained response system at 5 deg eccentricity. Under these conditions, the lowest spatial-frequency channel peaked at about 2 cycles/deg, four times higher than previous estimates, with a bandwidth of 1.5-2 octaves. The highest spatial-frequency channel peaked at 5-6 cycles/deg with about a 1-octave bandwidth. The data are consistent with there being only one channel tuned between these extremes, although they do not exclude a more continuous channel structure. Our analysis shows that there are no sustained channels tuned below 2 cycles/deg, but there may be channels above the highest-frequency channel measured if tested with more selective stimuli than those employed in our study. For local sustained stimuli, human peripheral spatial processing therefore appears to be based on a simpler channel structure than is often supposed.
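To make the octave bandwidths concrete: a channel with full bandwidth B octaves spans roughly f_peak · 2^(−B/2) to f_peak · 2^(+B/2). The short sketch below works this out for the two channels reported, treating the quoted peaks and bandwidths as nominal values.

```python
def octave_band(peak_cpd, bandwidth_oct):
    """Frequency range covered by a channel of given full octave bandwidth."""
    half = bandwidth_oct / 2.0
    return peak_cpd * 2.0 ** -half, peak_cpd * 2.0 ** half

# Lowest channel: ~2 cpd peak, 1.5-2 octave bandwidth
print(octave_band(2.0, 1.5))   # ~ (1.19, 3.36) cpd
print(octave_band(2.0, 2.0))   # ~ (1.0, 4.0) cpd

# Highest channel: ~5-6 cpd peak (5.5 taken as nominal), ~1 octave bandwidth
print(octave_band(5.5, 1.0))   # ~ (3.9, 7.8) cpd
```

On this reading, the two measured channels together cover roughly 1-8 cycles/deg with at most one channel in between, which is what motivates the "simpler channel structure" conclusion.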
This study examined whether perceptual learning at early levels of visual processing would facilitate learning at higher levels of processing. This was tested by determining whether training the motion pathways by practicing left-right movement discrimination, as found previously, would improve the reading skills of inefficient readers significantly more than another computer game, a word discrimination game, or the reading program offered by the school. This controlled validation study found that practicing rapid left-right movement discrimination for 5-10 minutes twice a week for 15 weeks doubled reading fluency and significantly improved all reading skills by more than one grade level, whereas inefficient readers in the control groups barely improved on these reading skills. In contrast to previous studies of perceptual learning, these experiments show that perceptual learning of direction discrimination significantly improved reading skills determined at higher levels of cognitive processing, thereby generalizing to a new task. The deficits in reading performance and attentional focus experienced by the person who struggles when reading are suggested to result from an information overload, caused by timing deficits in the direction-selectivity network proposed by Russell De Valois et al. (2000), that disappears following practice on direction discrimination. This study found that practicing direction discrimination rapidly transitions the inefficient 7-year-old reader into an efficient reader.
A critical component of any video transmission system is an objective metric for evaluating the quality of the video
signal as it is seen by the end-user. In packet-based communication systems, such as a wireless channel or the Internet,
the quality of the received signal is affected by both signal compression and packet losses. Due to the probabilistic
nature of the channel, the distortion in the reconstructed signal is a random variable. In addition, the quality of the
reconstructed signal depends on the error concealment strategy. A common approach is to use the expected mean
squared error of the end-to-end distortion as the performance metric. It can be shown that this approach leads to
unpredictable perceptual artifacts. A better approach is to account for both the mean and the variance of the end-to-end
distortion. We explore the perceptual benefits of this approach. By accounting for the variance of the distortion, the
difference between the transmitted and the reconstructed signal can be decreased without a significant increase in the
expected value of the distortion. Our experimental results indicate that for low to moderate probability of loss, the
proposed approach offers significant advantages over strictly minimizing the expected distortion. We demonstrate that
controlling the variance of the distortion limits perceptually annoying artifacts such as persistent errors.
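As a rough sketch of scoring a transmission strategy by both moments of the end-to-end distortion, the snippet below compares candidate strategies by a mean-plus-variance criterion over simulated loss realizations; the weighting parameter lam and the toy simulation are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def score(per_realization_mse, lam=1.0):
    """Mean-plus-variance criterion over channel-loss realizations.

    Penalizes strategies whose distortion fluctuates strongly across
    realizations, not just strategies with high average distortion.
    lam trades average quality against consistency (assumed value).
    """
    d = np.asarray(per_realization_mse, dtype=float)
    return d.mean() + lam * d.var()

# Toy example: two strategies with the same expected distortion but
# different variability across simulated packet-loss realizations.
steady = rng.normal(loc=50.0, scale=2.0, size=1000)    # consistent quality
erratic = rng.normal(loc=50.0, scale=20.0, size=1000)  # occasional bad frames
print(score(steady), score(erratic))  # the erratic strategy scores worse
```

Under a pure expected-MSE criterion the two strategies above would be indistinguishable, even though the erratic one produces exactly the persistent, perceptually annoying errors the abstract describes.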
Binocular disparity is one of the most powerful sources of depth information. Stereomotion is motion-in-depth generated by disparity changes. This study focuses on the hMT+/V5 complex, which is known to support both motion and disparity processing in primates. Does the motion complex process stereomotion as well? BOLD functional magnetic resonance imaging (fMRI) was used. The fMRI contrasts of stereomotion vs. stationary stimuli, as well as of lateral non-stereoscopic motion vs. stationary stimuli, showed strong activation of the motion complex. Direct contrasts of stereomotion vs. different types of lateral motion also revealed differential activity, but in a restricted subregion of the motion complex, suggesting a distinct stereomotion-selective neuronal subpopulation within it. No consistent activation was found for the stimuli viewed non-stereoscopically. The stereomotion-specific locus revealed within the hMT+/V5 complex contributes to the understanding of stereomotion perception, of the interactions between motion and stereo mechanisms, and of the organization of overlapping but functionally distinct neuronal subpopulations.