Person recognition over time is a more challenging task than re-identification in a multi-camera environment: people typically reappear at public places such as airports after a certain time period, carrying accessories and wearing different clothes. In this paper, we propose a new recognition framework that uses two types of images, i.e., whole-body and upper-body silhouettes. A customized version of DeepLabv3 is used for human body semantic segmentation. A Generic Fourier Descriptor (GFD) based feature set is fed to a One-vs-Rest scheme in an ensemble of K-Nearest Neighbor (KNN) and Random Forest (RF) classifiers. The experiments are carried out on the Front-View Gait (FVG) dataset, recorded in 2017 and 2018. An overall recognition accuracy of more than 93% is achieved by both classifiers on whole-body silhouette images, while the upper-half silhouettes yield recognition accuracies of more than 91% and 88% using RF and KNN, respectively. Code is available at https://git.io/JtfMY
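As a minimal sketch of how such a feature-plus-ensemble pipeline could look, assuming scikit-learn and a simple polar-FFT reading of the GFD (grid resolutions, neighbour counts, and tree counts are illustrative assumptions, not the paper's settings):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier

def gfd(silhouette, m=4, n=9, R=64, T=64):
    """Generic Fourier Descriptor of a binary silhouette: resample on a
    centroid-centred polar grid, 2D-FFT, keep m x n normalised magnitudes."""
    ys, xs = np.nonzero(silhouette)
    cy, cx = ys.mean(), xs.mean()
    max_r = np.sqrt(((ys - cy) ** 2 + (xs - cx) ** 2).max())
    polar = np.zeros((R, T))
    for i in range(R):
        for j in range(T):
            r, th = i / R * max_r, 2 * np.pi * j / T
            y = int(round(cy + r * np.sin(th)))
            x = int(round(cx + r * np.cos(th)))
            if 0 <= y < silhouette.shape[0] and 0 <= x < silhouette.shape[1]:
                polar[i, j] = silhouette[y, x]
    F = np.fft.fft2(polar)
    feat = np.abs(F)[:m, :n].flatten()
    feat[0] /= max(silhouette.sum(), 1)   # DC term normalised by shape area
    feat[1:] /= np.abs(F[0, 0]) + 1e-12   # remaining terms by DC magnitude
    return feat

# One-vs-Rest over a soft-voting KNN + RF ensemble: one plausible reading
# of the abstract's "One-vs-Rest scheme in an ensemble of KNN and RF"
clf = OneVsRestClassifier(VotingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=5)),
                ("rf", RandomForestClassifier(n_estimators=100))],
    voting="soft"))
# X = np.stack([gfd(s) for s in silhouettes]); clf.fit(X, labels)
```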
Multiview 3D reconstruction techniques enable the digital reconstruction of 3D objects from the real world by fusing different viewpoints of the same object into a single 3D representation. This process is by no means trivial, and the acquisition of high-quality point cloud representations of dynamic 3D objects is still an open problem. In this paper, an approach for high-fidelity 3D point cloud generation using low-cost 3D sensing hardware is presented. The proposed approach runs on an efficient low-cost hardware setup based on several Kinect v2 scanners connected to a single PC. It performs autocalibration and runs in real time, exploiting an efficient composition of several filtering methods: Radius Outlier Removal (ROR), Weighted Median (WM) filtering, and Weighted Inter-Frame Average (WIFA) filtering. The performance of the proposed method has been demonstrated through efficient acquisition of dense 3D point clouds of moving objects.
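A hedged sketch of the three named filtering stages follows; Open3D is an assumed library choice for ROR, and all neighbour counts, radii, and temporal weights are illustrative rather than the paper's values:

```python
import numpy as np
import open3d as o3d  # assumed choice; the paper's implementation is not specified

def radius_outlier_removal(pcd, nb_points=16, radius=0.05):
    # ROR: discard points with fewer than nb_points neighbours within radius
    filtered, _ = pcd.remove_radius_outlier(nb_points=nb_points, radius=radius)
    return filtered

def weighted_median(values, weights):
    # Weighted median: the value at which the cumulative weight first
    # reaches half of the total weight
    order = np.argsort(values)
    cw = np.cumsum(np.asarray(weights, dtype=float)[order])
    return np.asarray(values)[order][np.searchsorted(cw, cw[-1] / 2.0)]

def weighted_interframe_average(depth_frames, weights=(0.5, 0.3, 0.2)):
    # WIFA: blend the last few depth frames (newest first) to damp
    # temporal sensor noise before point cloud generation
    w = np.asarray(weights, dtype=float)
    stack = np.stack(depth_frames[-len(w):][::-1])
    return np.tensordot(w / w.sum(), stack, axes=1)
```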
In this paper we propose a background estimation and update algorithm for cluttered video surveillance sequences in indoor scenarios. Taking inspiration from the sophisticated framework of the Beamlets, the implementation we propose here relies on the integration of the Radon transform in the processing chain, applied on a block-by-block basis. During the acquisition of the real-time video, the Radon transform is applied at each frame to extract the meaningful edge and texture information present in the block under analysis, with the goal of extracting a signature for each portion of the image plane. The acquired model is updated at each frame, thus achieving a reliable representation of the most relevant details that persist over time for each processed block. The algorithm is validated in typical surveillance contexts and presented in this paper using two video sequences. The first example is an indoor scene with a considerably static background, while the second video belongs to a more complex scenario that is part of the PETS benchmark sequences.
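A minimal sketch of a per-block Radon signature with a running background update, assuming scikit-image's radon and an exponential-moving-average update with a distance threshold (the paper's actual update rule and parameters are not stated in the abstract):

```python
import numpy as np
from skimage.transform import radon

def block_signature(block, angles=np.arange(0, 180, 15)):
    # Radon transform of one image block: each column is the projection
    # along one angle, capturing the block's dominant edges and texture
    return radon(block.astype(float), theta=angles, circle=False)

class BlockBackgroundModel:
    """Hypothetical per-block model: blend stable signatures into the
    background, flag blocks whose signature drifts (assumed rule)."""
    def __init__(self, alpha=0.05, tau=10.0):
        self.alpha, self.tau = alpha, tau
        self.model = {}  # block index -> running Radon signature

    def update(self, idx, sig):
        if idx not in self.model:
            self.model[idx] = sig
            return True  # treat the first observation as background
        dist = np.linalg.norm(sig - self.model[idx])
        if dist < self.tau:  # stable block: fold it into the model
            self.model[idx] = (1 - self.alpha) * self.model[idx] + self.alpha * sig
        return dist < self.tau
```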
An approach for image segmentation is presented. Images are first preprocessed using multiscale simplification by nonlinear diffusion; segmentation is then carried out on the resulting smoothed images. The actual segmentation step is based on the estimation of the eigenvectors and eigenvalues of a matrix derived from both the total dissimilarity and the total similarity between different groups of pixels in the image. This algorithm belongs to the class of spectral methods, specifically the Nyström extension introduced by Fowlkes et al. in [1]. A stability analysis of the approximation of the underlying spectral partitioning is presented. Modifications of Fowlkes' technique are proposed to improve the stability of the algorithm, including a criterion for the selection of the initial sample and numerically stable estimates of the ill-conditioned matrix inverses arising in the solution of the underlying mathematical problem. Results of selected computer experiments are reported to validate the superiority of the proposed approach over the technique proposed in [1].
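As a sketch of the Nyström extension of [1], with a pseudo-inverse-style regularisation standing in for the stabilisations proposed here (sample selection and affinity construction are omitted; parameters are illustrative):

```python
import numpy as np

def _inv_sqrtm(M, eps=1e-10):
    # Numerically stable inverse square root: eigendecompose and truncate
    # near-zero eigenvalues (a pseudo-inverse-style regularisation)
    w, Q = np.linalg.eigh(M)
    inv_sqrt = np.where(w > eps, 1.0 / np.sqrt(np.maximum(w, eps)), 0.0)
    return (Q * inv_sqrt) @ Q.T

def nystrom_embedding(A, B, k=4):
    """Nystrom approximation of the leading eigenvectors of a normalised
    affinity matrix. A: n x n affinities among sampled pixels; B: n x m
    affinities between the samples and the remaining pixels."""
    n = A.shape[0]
    # Approximate row sums of the full affinity matrix for normalisation
    d1 = A.sum(axis=1) + B.sum(axis=1)
    d2 = B.sum(axis=0) + B.T @ (np.linalg.pinv(A) @ B.sum(axis=1))
    dhat = 1.0 / np.sqrt(np.concatenate([d1, d2]))
    A = A * np.outer(dhat[:n], dhat[:n])
    B = B * np.outer(dhat[:n], dhat[n:])
    # One-shot orthogonalised solution as in Fowlkes et al. [1]
    Asi = _inv_sqrtm(A)
    S = A + Asi @ (B @ B.T) @ Asi
    U, L, _ = np.linalg.svd(S)
    V = np.vstack([A, B.T]) @ (Asi @ U[:, :k]) / np.sqrt(L[:k])
    return V  # each row embeds one pixel; cluster the rows to segment
```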
This paper describes algorithms that were developed for a stereoscopic videoconferencing system with viewpoint adaptation. The system identifies foreground and background regions and applies disparity estimation to the foreground object, namely the person sitting in front of a stereoscopic camera system with a rather large baseline. A hierarchical block matching algorithm is employed for this purpose, which takes into account the positions of high-variance feature points and of the object/background borders. Using the disparity estimator's output, it is possible to generate arbitrary intermediate views from the left- and right-view images. We have developed an object-based interpolation algorithm that produces high-quality results. It takes into account the fact that a person's face has a more or less convex surface; interpolation weights are derived both from the position of the intermediate view and from the position of a specific point within the face. The algorithms have been designed for a real-time videoconferencing system with a telepresence illusion. An important constraint during development was therefore hardware feasibility, while sufficient quality of the intermediate-view images still had to be retained.
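A minimal sketch of disparity-compensated intermediate-view synthesis, assuming a per-pixel horizontal disparity map from the left to the right view (the hierarchical block matching and the face-position-dependent weighting are omitted):

```python
import numpy as np

def interpolate_view(left, right, disparity, alpha=0.5):
    """For a left-image pixel x with disparity d, its position in the
    intermediate view at parameter alpha in [0, 1] is x + alpha*d; the
    intermediate intensity blends the left and right samples with
    weights (1 - alpha, alpha). Occlusions are left as holes."""
    h, w = left.shape
    out = np.zeros((h, w), dtype=np.float64)
    hit = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = disparity[y, x]
            xm = int(round(x + alpha * d))  # position in the middle view
            xr = int(round(x + d))          # corresponding right-view pixel
            if 0 <= xm < w and 0 <= xr < w:
                out[y, xm] = (1 - alpha) * left[y, x] + alpha * right[y, xr]
                hit[y, xm] = True
    return out, hit  # 'hit' marks covered pixels; the rest need hole filling
```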