1. Introduction
With advancements in mobile communication devices, technology now allows people to communicate while looking at each other’s faces. This technology, also referred to as videoconferencing, transmits images to a display system so that users can see each other while talking, as shown in Fig. 1(a). Many market analysts predict that the number of subscribers to image communication services will grow exponentially every year because of lower mobile device prices and aggressive marketing by communication companies, as shown in Fig. 1(b). As image communication services come into wide use, consumers want high-quality services. Although image communication services already exist over third-generation (3G) wireless networks, such as high-speed downlink packet access (HSDPA), limited bandwidth still prevents high-quality communication (maximum downloading and uploading speeds are 14.4 and 5.76 Mbps, respectively). Consequently, more research is required to overcome the limited bandwidth of current communication systems and achieve high-quality image reconstruction in mobile devices.
In terms of image processing, the core technologies for high-quality image reconstruction are face hallucination and compression artifact reduction. Face hallucination, which is also referred to as face super-resolution (SR), is very important for image communications because consumers are mainly interested in facial regions, as shown in Fig. 2. A number of face hallucination methods have been proposed in recent years. Among them, learning-based methods have received much attention because they can achieve a high magnification factor and produce good SR results compared with other methods. Baker and Kanade1,2 first introduced a face hallucination method that constructs the high-frequency components from a parent structure with the aid of a training set. 
Wang and Tang3 presented a principal component analysis (PCA)-based face hallucination algorithm to globally infer the high-resolution face image. Liu et al.4 developed a two-step statistical modeling approach that integrates a global model and a local model corresponding to the common and specific face characteristics, respectively. Although Liu et al.’s method4 requires complicated probabilistic models, the idea of the two-step approach has become increasingly popular since then. Recently, a novel face hallucination method based on position-patches has been proposed. The position-patch based method hallucinates the high-resolution (HR) image patch using the image patches at the same position in the training images.5–7 Thus, it saves computational time and produces high-quality SR results compared to manifold learning-based methods.
With respect to compression artifact reduction, several compression artifacts inevitably occur because of the loss of high-frequency components caused by lossy compression techniques such as H.264 or MPEG-4 (the representative artifact is the blocking artifact). They seriously degrade the picture quality and are annoying to viewers of the reconstructed images, as shown in Fig. 3.8,9 Accordingly, compression artifact reduction is also very important for image communications. Blocking artifacts appear as grid noise along the block boundaries because each block is transformed and quantized independently, without considering inter-block correlations. Many studies have been conducted to reduce blocking artifacts in compressed images. Among them, image restoration techniques are commonly used to reduce blocking artifacts and recover the original image;10 projection-onto-convex-sets (POCS)-based methods are representative of such techniques. 
In the POCS-based methods, prior information was represented as convex sets for reconstruction, and blocking artifacts were reduced by iterative procedures.11 POCS-based methods are very effective in reducing blocking artifacts because smoothness constraints are easy to impose around block boundaries. Total variation (TV)-based methods are also actively studied for image deblocking.12,13 TV provides an effective criterion for image restoration, and thus can be successfully used as prior information for image deblocking. Alter et al.13 proposed a constrained TV minimization method to reduce blocking artifacts without removing perceptual features. With TV minimization, edge information was effectively preserved while blocking artifacts were reduced. Moreover, a field of experts (FoE) prior was successfully applied to image deblocking.10 In this method, the image deblocking problem was solved by maximum a posteriori (MAP) estimation based on the FoE prior.
The two technologies are associated with inverse problems in image processing. In this article, we provide an outline of recent studies on face hallucination and compression artifact reduction. The rest of this article is organized as follows. In Sec. 2, we describe the inverse problems in image processing. In Sec. 3, we explain recent research trends and results related to face hallucination, and in Sec. 4, we address those related to compression artifact reduction. In Sec. 5, we discuss practical considerations and possible solutions for implementing the two technologies in mobile applications. Finally, conclusions are given in Sec. 6.
2. Inverse Problems in Image Processing
Inverse problems involve estimating parameters or data from inadequate observations; the observations are often noisy and contain incomplete information about the target parameter or data due to physical limitations of the measurement devices. 
Due to the lack of sufficient information in the indirect observations, solutions to inverse problems are usually nonunique and challenging to obtain. That is, they are ill-posed problems, and thus additional reconstruction techniques are required to solve them, including machine learning, Bayesian inference, convex optimization, and sparse representation.14–16 Indeed, many problems in image processing can be represented as inverse problems. They are modeled by relating the observed image $g(\mathbf{r})$ to the unknown original image $f(\mathbf{r})$. A general form for the relation is as follows:14
$$g(\mathbf{r}) = [\mathcal{H}f](\mathbf{r}) + \epsilon(\mathbf{r}), \quad \mathbf{r} \in \mathcal{R}, \tag{1}$$
where $\mathbf{r}$ represents the pixel position, $\mathcal{R}$ represents the whole surface of the image, $\mathcal{H}$ is an operator representing the forward problem, and $\epsilon(\mathbf{r})$ represents the errors (modeling uncertainty and observation errors). If we assume that the operator $\mathcal{H}$ is linear, we can write the observation model in vector-matrix form as follows:
$$\mathbf{g} = \mathbf{H}\mathbf{f} + \boldsymbol{\epsilon}, \tag{2}$$
where $\mathbf{g}$, $\mathbf{f}$, and $\boldsymbol{\epsilon}$ are vectors containing the observed image pixel values, unknown original image pixel values, and observation errors, respectively; and $\mathbf{H}$ is a very high-dimensional matrix whose elements are defined from $\mathcal{H}$.
Figure 4 shows the observation model in image processing, which can be formulated as an inverse problem. In image processing, there are many inverse problems, such as image denoising, image SR, image deblurring, and image decompression. Above all, we inevitably meet several inverse problems in image communications because transmission bandwidth is strictly limited in a mobile communication environment. Consequently, image sequences are compressed and transmitted using lossy compression techniques such as H.264 and MPEG-4, and thus undesired image distortions also occur because of the resulting compression artifacts. In this article, we deal with two representative inverse problems in image processing: face hallucination and image deblocking. 
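To make the linear observation model concrete, the following is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not part of the reviewed methods: the names `g`, `H`, `f`, and `eps`, the 1-D averaging operator, and the noise level are all chosen here for demonstration. It shows why the problem is ill-posed: two different signals reproduce the same observation.

```python
import numpy as np

def downsample_operator(n_hr: int, factor: int) -> np.ndarray:
    """Hypothetical forward operator: average each run of `factor` samples."""
    n_lr = n_hr // factor
    H = np.zeros((n_lr, n_hr))
    for i in range(n_lr):
        H[i, i * factor:(i + 1) * factor] = 1.0 / factor
    return H

rng = np.random.default_rng(0)
f = rng.uniform(0.0, 255.0, size=16)          # unknown original signal
H = downsample_operator(f.size, factor=4)     # linear forward operator
eps = rng.normal(0.0, 1.0, size=H.shape[0])   # observation error
g = H @ f + eps                               # observed signal

# One solution consistent with the observation: the minimum-norm estimate.
f_hat = np.linalg.pinv(H) @ g

# A second, different solution: add a vector from the null space of H.
null_vec = np.zeros(f.size)
null_vec[0], null_vec[1] = 1.0, -1.0          # averages to zero inside block 0
f_alt = f_hat + null_vec
```

Both `f_hat` and `f_alt` satisfy the observation exactly, so the data alone cannot select between them; this is the role played by prior information in the methods discussed in this article.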
3. Face Hallucination
Since the concept of face hallucination was introduced by Baker and Kanade,1,2 a number of related face hallucination methods have been proposed during the past decade. In general, there are two classes of SR techniques: multiframe SR (using the input images only) and single-frame SR (using other training images). From a methodological viewpoint, SR methods can be broadly divided into interpolation-based,17,18 reconstruction-based,19–24 and learning-based3,6,7,25–30 methods. First, the basic interpolation methods include nearest-neighbor, bilinear, and bicubic interpolation.17,18 Given one low-resolution (LR) image, they use only the information of the original pixel and several pixels around it to estimate the missing pixels. Interpolation is simple and fast and gives acceptable results when the interpolation factor is small. However, when the interpolation factor is large, the performance is poor because the high-frequency information is missing. Second, reconstruction-based methods first build an observation model that connects the original HR image and the observed LR image, simulating the process of obtaining an LR image from an HR image. There are many reconstruction-based methods, such as POCS,19 the MAP method,20 the iterative back-projection method,21,22 the regularization method,23 and mixed methods.24 All of them need some locality prior assumptions and can introduce blur and saw-tooth effects to a certain extent. Since the prior knowledge is limited, the information provided by the LR images may not meet the demands of HR image reconstruction. Third, learning-based methods have received much attention in recent years because they can achieve a high magnification factor and produce good SR results compared with other methods. The basic idea is to find the neighbors of each test-image patch among the patches of the training set, and to construct the optimal coefficients that approximate the HR image using the learned prior knowledge. 
In this article, we focus on learning-based face hallucination methods and introduce some representative works and our research results.
3.1. Example-Based Image SR
In 2001, example-based image SR was proposed by Freeman et al. Its core idea was to learn the fine details from the HR images of training datasets, and to use the learned relationships between LR and HR images to predict the fine details of a test image. Above all, Freeman et al. employed a nonparametric patch-based prior along with the Markov random field (MRF) model to generate the desired HR images. A large dataset of HR and LR patch pairs was generated and used for seeking the nearest neighbors of the LR input patches. The selected HR patch neighbors were treated as the candidates for the target HR patch. The block diagram of the method is shown in Fig. 5. As shown in the figure, the key procedure of this method is to predict the missing high frequencies using the training datasets.
3.2. Neighbor-Embedding Based Image SR
In 2004, Chang et al. proposed a novel method for solving single-image SR problems. In this method, given an LR image as input, a set of training examples was used to recover its HR counterpart. While this formulation resembled other learning-based methods for SR, this method was inspired by manifold learning-based methods, particularly locally linear embedding (LLE). More specifically, small image patches in LR and HR images were assumed to form manifolds with similar local geometry in two distinct feature spaces. Then, multiple nearest neighbors were selected in the feature space, and SR images were reconstructed from the corresponding HR patches of the nearest neighbors. Since then, this method has been extensively applied to image SR problems, including face hallucination.
3.3. PCA-Based Face Hallucination
In 2005, a new face hallucination method using eigen-transformation was proposed by Wang et al. 
In contrast to conventional methods based on probabilistic models, this method viewed face hallucination as a transformation between different image styles. PCA was used to fit the input face image as a linear combination of the LR face images in the training dataset. The HR image was rendered by replacing the LR training images with HR ones, while retaining the same combination coefficients. Since face images were well structured and had similar appearances, they spanned a small subset of the high-dimensional image space. In the work of Penev and Sirovich,31 face images were shown to be well reconstructed by a PCA representation with 300 to 500 dimensions. The system diagram of this method is shown in Fig. 6. As shown in the figure, this method first employed PCA to extract as much useful information as possible from an LR face image, and then rendered an HR face image by eigen-transformation.
3.4. Sparse Coding Based Face Hallucination
In 2008, a new approach to single-image SR based on sparse signal representation was proposed by Yang et al. This method was motivated by the observation from image statistics that image patches can be well represented as a sparse linear combination of elements from an appropriately chosen overcomplete dictionary. They found a sparse representation for each patch of the LR input, and then used the coefficients of this representation to generate the HR output. Theoretical results from compressed sensing suggested that, under mild conditions, the sparse representation could be correctly recovered from the down-sampled signals. By jointly training two dictionaries for the LR and HR image patches, they enforced the similarity of the sparse representations of each LR and HR patch pair with respect to their own dictionaries. Therefore, the sparse representation of an LR patch could be applied to the HR patch dictionary to reconstruct the SR image. 
The learned dictionary pair was a more compact representation of the patch pairs than that of previous approaches, which simply sampled a large number of image patch pairs, and it therefore reduced the computational cost effectively.
3.5. Position-Patch Based Face Hallucination
In 2010, a novel face hallucination approach was proposed by Ma et al. In contrast to most of the conventional methods based on probabilistic models or manifold learning, the position-patch based method hallucinated the HR image patch using the image patches at the same position in each training image. The optimal weights of the training image position-patches were estimated, and the hallucinated patches were reconstructed using the same weights. The final SR face images were formed by integrating the hallucinated patches. The method saved computational time and produced high-quality SR results compared to conventional manifold learning based methods. The position-patch based face hallucination method is briefly described in Algorithm 1.
Algorithm 1 Position-patch based face hallucination.5
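The per-patch step of the position-patch idea can be sketched as follows in Python with NumPy. This is a hedged illustration, not a transcription of Algorithm 1: it assumes the weights are obtained by least squares with a sum-to-one constraint (as in LLE), and the function names, the regularization term `reg`, and the patch dimensions are hypothetical choices made here to keep the example small and the Gram matrix invertible.

```python
import numpy as np

def position_patch_weights(x_lr, Y_lr, reg=1e-6):
    """Least-squares weights that reconstruct x_lr from same-position patches.

    x_lr : (d,) vectorized LR input patch.
    Y_lr : (d, M) LR training patches at the same position, one per column.
    Returns w of shape (M,) with sum(w) == 1 (assumed constraint).
    """
    M = Y_lr.shape[1]
    Z = x_lr[:, None] - Y_lr                 # difference to each training patch
    G = Z.T @ Z                              # local Gram matrix, (M, M)
    G = G + (reg * np.trace(G) + 1e-12) * np.eye(M)  # assumed regularizer
    w = np.linalg.solve(G, np.ones(M))
    return w / w.sum()

def hallucinate_patch(x_lr, Y_lr, Y_hr, reg=1e-6):
    """Apply the LR weights to the HR training patches at the same position."""
    w = position_patch_weights(x_lr, Y_lr, reg)
    return Y_hr @ w

# Toy data: 5 training faces, 8x8 LR patches and 16x16 HR patches (vectorized).
rng = np.random.default_rng(0)
Y_lr = rng.normal(size=(64, 5))
Y_hr = rng.normal(size=(256, 5))
w_true = np.array([0.4, 0.3, 0.2, 0.1, 0.0])
x_lr = Y_lr @ w_true                         # input patch built from known weights

w = position_patch_weights(x_lr, Y_lr)
x_hr = hallucinate_patch(x_lr, Y_lr, Y_hr)
```

In a full pipeline this step would be repeated for every patch position, and overlapping hallucinated patches would be integrated (for example by averaging) to form the final face image.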
3.6. Convex-Optimization-Based Face Hallucination
Inspired by the position-patch based face hallucination method, a new convex optimization based face hallucination method has been proposed. The position-patch based method employs least square estimation to obtain the optimal weights for face hallucination; however, the least square estimation approach can provide biased solutions when the number of training position-patches is much larger than the dimension of the patch. To overcome this problem, we make use of constrained convex optimization instead of least square estimation to obtain the optimal weights for face hallucination. The optimal weights ($\mathbf{w}^{*}$) are computed by solving the following convex optimization problem:
$$\mathbf{w}^{*} = \arg\min_{\mathbf{w}} \|\mathbf{w}\|_{1} \quad \text{subject to} \quad \|\mathbf{x} - \mathbf{Y}\mathbf{w}\|_{2}^{2} \le \epsilon, \tag{3}$$
where $\mathbf{Y}$ is a column matrix of the training patches for the input patch $\mathbf{x}$, and $\epsilon$ is an error tolerance. Consequently, the hallucinated HR patch is obtained by:
$$\hat{\mathbf{x}}_{H} = \mathbf{Y}_{H}\mathbf{w}^{*}, \tag{4}$$
where $\mathbf{Y}_{H}$ contains the corresponding HR training patches. By Eqs. (3) and (4), we can get more stable reconstruction weights for face hallucination because the $\ell_1$-norm is more suitable for this problem than the $\ell_2$-norm: with the $\ell_1$-norm, each patch can be approximated with a smaller subset of patches. In contrast, the $\ell_2$-norm provides nonzero weights for all patches.
Figure 7 shows the face hallucination results by bicubic interpolation, example-based image SR,25 neighbor-embedding based image SR,26 position-patch based face hallucination,5 and convex optimization based face hallucination.7 We performed experiments on the CMU-PIE face database, which contains 41,368 images obtained from 68 subjects. We took the frontal face images with 21 different illumination conditions. Thus, the total number of images was 1,428. Among them, 630 images of 30 subjects were used in the training stage, and the rest were used in the synthesis stage. In the neighbor-embedding method, the HR patch size of was pixels, while the corresponding LR patch size of was pixels. In addition, the number of the neighbor patches for reconstruction was 5. 
The size of the image patches in position-patch and convex optimization methods was pixels. The size of LR images for training and synthesis was pixels, while that of hallucinated results was pixels. That is, the interpolation factor was 4. As shown in the figure, learning-based methods generally produce better face hallucination results than traditional bicubic interpolation. The hallucinated results of Refs. 25 and 26 are somewhat blurred and contain some artifacts, whereas Refs. 5 and 7 produce more natural-looking facial images. Further examination of the results reveals that Ref. 7 is more effective than Ref. 5 in preserving the edges and image details in the nose and mouth areas. For a more quantitative test, average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values of the face hallucination results are provided in Table 1. The SSIM is a complementary measure to the PSNR, which gives an indication of image quality based on known characteristics of the human visual system.32 Here, the unit of PSNR is dB. As shown in the table, our method achieves the best hallucination performance in terms of the PSNR and SSIM. Here, the bold numbers represent the best PSNR and SSIM values.
Table 1 Average PSNR and SSIM values of different methods.
4. Compression Artifact Reduction
Block-based discrete cosine transform (BDCT) has been widely used in image and video compression due to its energy compacting property and relative ease of implementation.33–36 Thus, BDCT has been adopted in most image/video compression standards, including JPEG (joint photographic experts group) and MPEG (motion picture experts group). However, BDCT has a major drawback, which is usually referred to as blocking artifacts. Blocking artifacts appear as grid noise along the block boundaries because each block is transformed and quantized independently. Usually, the lower the bit rate is, the more serious the blocking artifacts are. 
Blocking artifacts occur because of the independent transform and quantization of each block without considering inter-block correlations.
4.1. Main Techniques for Image Deblocking
There are two main techniques to deal with blocking artifacts: in-loop filtering and postprocessing. In-loop filters operate within the coding loop, while postprocessing methods are applied after the decoder and make use of decoded parameters. Table 2 lists the deblocking filters employed by current video coding standards.37 As listed in the table, in-loop filters are either optional or not used because they require changing the encoder structure. Thus, postprocessing methods are promising solutions to this problem, and comparable results have been achieved by researchers.
Table 2 Deblocking filters for video coding standards.37
4.2. Postprocessing Methods for Image Deblocking
Since the early 1980s, postprocessing of low bit-rate BDCT coded images has received considerable research attention. Postprocessing methods are classified into three main groups: filtering-based, denoising-based, and restoration-based methods.10 First, some researchers viewed the distortions around the block boundaries as spatial high-frequency components. Thus, many filtering-based methods have been proposed to reduce them. In 1984, Lim and Reeve38 first applied low-pass filtering to the pixels along the boundary to remove the blocking artifacts. Then, in 1986, Ramamurthi and Gersho39 proposed a nonlinear space-variant filter to perform filtering in parallel with the edges. Since then, many filtering-based methods have been presented; the representative work is the adaptive deblocking filter, which has been used in the H.264/MPEG-4 advanced video coding (AVC) standard to reduce the distortions.40 Second, some researchers viewed deblocking as a denoising problem. They proposed efficient noise models and deblocking methods based on the wavelet technique. In 1997, Xiong et al.41 exploited cross-scale correlation using the overcomplete wavelet transform, and used thresholds to reduce the distortions. In 2004, Liew and Yan34 made a theoretical analysis of the blocking artifacts, and used a three-scale overcomplete wavelet scheme to reduce them. Third, many researchers viewed deblocking as a restoration problem, and proposed restoration-based deblocking methods. The POCS-based method was a representative approach among the restoration-based methods for deblocking.42 In the POCS-based methods, prior information was represented as convex sets for reconstruction, and blocking artifacts were reduced by iterative procedures. The POCS-based methods were very effective for reducing blocking artifacts because smoothness constraints were easy to impose around block boundaries. 
In 2003, Kim et al.11 proposed a new smoothness constraint set (SCS) and an improved quantization constraint set (QCS) to improve the performance of the POCS-based methods. Furthermore, TV-based methods were actively studied for image deblocking. TV provided an effective criterion for image restoration, and thus could be successfully used as prior information for image deblocking.13,43 In 2004, Alter et al. proposed a constrained TV minimization method to reduce blocking artifacts without removing perceptual features. In 2010, a human visual system (HVS)-based TV method using a new weighted regularization parameter was proposed by Do et al.44 In 2007, an FoE prior45,46 was successfully applied to image deblocking by Sun and Cham.10 In this method, the image deblocking problem was solved by MAP estimation based on the FoE prior. In addition, they employed the narrow quantization constraint set (NQCS) for further PSNR gain.47 Consequently, this method achieved a high PSNR gain and produced state-of-the-art results on deblocking.
4.3. Sparse Representation Based Image Deblocking
Recently, sparse representation has been actively studied to solve various restoration problems in image processing.48–52 Some researchers have made significant contributions to image denoising, restoration, and SR using sparse representation. Sparse representation assumes that original signals can be accurately recovered from several elementary signals called atoms.50,53 Thus, it has proven very effective for image restoration tasks. Inspired by recent results on sparse representation, we proposed a novel deblocking method based on sparse representation.48 To remove blocking artifacts, we obtain a general dictionary that can effectively describe the content of an image from a set of training images using the K-singular value decomposition (K-SVD) algorithm. 
Then, an error threshold for orthogonal matching pursuit (OMP) is automatically estimated from the quality of the compressed image so that the dictionary can be used for image deblocking. Our deblocking method comprises two main procedures: generation of a deblocking dictionary using the K-SVD algorithm, and image deblocking with the deblocking dictionary. That is, the deblocking dictionary is generated in the training stage, and blocking artifact reduction is performed in the testing stage.
4.3.1. Deblocking dictionary design using K-SVD algorithm
In the training stage, image patches are selected to generate a dictionary for image deblocking. From the image patches, a deblocking dictionary is trained by the K-SVD algorithm. Here, to solve the optimization problem, the batch-OMP method is used.54 The K-SVD algorithm is an iterative method to generate an overcomplete dictionary that fits training examples well. It is simple and designed to be a direct generalization of the K-means algorithm.52–56 In general, it alternates between sparse coding and dictionary update while training. Let $\mathbf{Y}$ be an $n \times N$ matrix of $N$ training patches of $n$-length pixels, used to train an overcomplete dictionary $\mathbf{D}$ of size $n \times K$ with $N \gg K$ and $K > n$. For generating $\mathbf{D}$, the objective function of the K-SVD algorithm is defined as follows:55,57
$$\min_{\mathbf{D},\mathbf{X}} \|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_{F}^{2} \quad \text{subject to} \quad \forall i,\ \|\mathbf{x}_{i}\|_{0} \le T_{0}, \tag{5}$$
where $T_{0}$ is a given sparsity level, $\mathbf{X} = [\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}]$, and $\mathbf{x}_{i}$ is the sparse vector of coefficients representing the $i$'th patch in terms of the columns of $\mathbf{D}$. The K-SVD algorithm progressively creates the deblocking dictionary from an initial dictionary by solving Eq. (5). The full steps of dictionary generation are described in Algorithm 2.
Algorithm 2 Dictionary generation by the K-SVD algorithm.
4.3.2. Automatic estimation of error threshold
The deblocking dictionary is employed to reduce blocking artifacts. The objective function for image deblocking is as follows:
$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_{0} \quad \text{subject to} \quad \|\mathbf{y} - \mathbf{D}\boldsymbol{\alpha}\|_{2} \le \epsilon, \tag{6}$$
where $\mathbf{y}$ is the image corrupted by blocking artifacts and $\epsilon$ is an error threshold for OMP. Blocking artifacts are reduced by optimizing Eq. 
(6), and we can reconstruct the original image. As can be expected, the error threshold $\epsilon$ of Eq. (6) must be estimated before the deblocking dictionary can be used to reduce blocking artifacts. We estimate $\epsilon$ for OMP automatically using the quality information of JPEG compressed images. The estimation procedure is summarized as follows.
First, the standard deviation of the quantization noise, $\sigma$, is estimated as shown in Fig. 8. Since blocking artifacts mostly occur around the block boundaries, $\sigma$ is computed by Eq. (7) from the intensity difference Diff between the two boundary pixels on either side of a boundary between two blocks, where Diff is the absolute value of one-half the intensity difference between the two pixels. In computing Diff, only horizontal or vertical block discontinuities are considered, as mentioned in Ref. 34. In the figure, the two boundary pixels belong to the two adjacent blocks, respectively. Accordingly, we compute $\sigma$ of the compressed blocky image from Diff.
Then, $\epsilon$ is computed based on $\sigma$. In previous works on image denoising,34,53,57 $\epsilon$ is obtained by the following equation:
$$\epsilon = C\sigma, \tag{8}$$
where the noise gain $C$ is set to 1.15. In the JPEG coding standard, the most important parameter is the quality $Q$, which takes a value between 0 and 100. The higher $Q$ is, the less the image is degraded by compression; however, when $Q$ is high, the resulting file size is large. For image deblocking, we found through various experiments that Eq. (8) fits well only when $Q$ is 10. In other cases, the error threshold $\epsilon$ of Eq. (6) does not follow Eq. (8). Instead, we found that $\epsilon$ follows a nonlinear distribution according to the given quality $Q$, as shown in Fig. 9. Thus, we modify Eq. (8) into Eq. (9), which introduces three control parameters whose appropriate values are adjusted by experiments. Here, the three control parameters are set to 20, 10, and 0, respectively. Consequently, the error threshold $\epsilon$ for OMP is computed by Eq. (9) and used to solve Eq. (6). 
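The role of the error threshold in OMP can be illustrated with a short Python/NumPy sketch. It is a generic error-constrained OMP written for this illustration, not the batch-OMP implementation used in the experiments; the function names, the random dictionary, and the value `sigma=0.05` are hypothetical. The threshold is assumed to be the noise gain (1.15) multiplied by the estimated noise standard deviation, and the quality-dependent modification of the threshold is not modeled.

```python
import numpy as np

def error_threshold(sigma, gain=1.15):
    """Assumed baseline threshold: noise gain times noise standard deviation."""
    return gain * sigma

def omp_error_constrained(D, y, eps, max_atoms=None):
    """Greedy OMP: add atoms until ||y - D a||_2 <= eps.

    D : (n, K) dictionary with unit-norm columns (atoms).
    y : (n,) signal, e.g. a vectorized image patch.
    """
    n, K = D.shape
    max_atoms = n if max_atoms is None else max_atoms
    residual = y.astype(float).copy()
    support, coef = [], np.zeros(0)
    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        k = int(np.argmax(np.abs(D.T @ residual)))   # best-matching atom
        if k in support:
            break                                    # no further progress
        support.append(k)
        # Re-fit all selected coefficients jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    a = np.zeros(K)
    if support:
        a[support] = coef
    return a

rng = np.random.default_rng(0)
D = rng.normal(size=(16, 32))
D /= np.linalg.norm(D, axis=0)                       # normalize atoms
y = 2.0 * D[:, 3] - 1.5 * D[:, 10]                   # signal built from 2 atoms
eps = error_threshold(sigma=0.05)
a = omp_error_constrained(D, y, eps)
```

A larger threshold stops the pursuit earlier and yields a sparser, smoother reconstruction, which is why the threshold has to track the severity of the blocking artifacts.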
As a result, we obtain deblocked results of JPEG compressed images using the learned dictionary.
As shown in Fig. 10, six typical images were used for the tests, Barbara, Lena, Boat, Peppers, Baboon, and Fruits, whose sizes were pixels. In the training stage, a total of 91 natural images provided by Yang et al.51 were used to generate a general dictionary. The dictionary size and all parameters of Eqs. (8) and (9) were determined on the training data set. In addition, the dictionary was trained from 100,000 randomly sampled image patches using K-SVD, i.e., the size of each patch is pixels. Thus, the size of the training data was pixels. We performed the experiments up to a JPEG quality $Q$ of 20 because blocking effects mainly occur when $Q$ is between 0 and 20.36 A dictionary with 512 atoms is used in our experiments. Figure 11 shows the dictionary generated from the training data. Figures 12 and 13 show the JPEG compressed images and their deblocked results for the Barbara and Baboon images, respectively, at different quality values, i.e., $Q$ is 1, 5, 10, 15, or 20. It can be observed that the lower $Q$ is, the more blocking artifacts occur along block boundaries in the compressed images. This is because the transform coefficients of the blocks are quantized independently in BDCT based image compression. As can be seen in (a)-(e) of the figures, the blocking artifacts seriously degrade the picture quality. In addition, the blocking artifacts are remarkably reduced as the quality increases. In the figures, (f)-(j) show the results of blocking artifact reduction by the proposed method. It can be observed that the proposed method suppresses the blocking artifacts efficiently and improves the picture quality, especially along block boundaries where the block discontinuities are severe. 
To provide a more reliable performance evaluation, we compare our method with the latest state-of-the-art method, which is based on the FoE prior.10 It has been reported that this method achieved the best deblocked results in terms of PSNR. As evaluation metrics, the PSNR and SSIM are used to measure the quality of the estimated images. To simulate various types of BDCT compression, three quantization tables, usually denoted as Q1, Q2, and Q3, have been commonly used by many researchers.10,34 The Q1, Q2, and Q3 tables correspond to a medium to high compression level, similar to what can be obtained by using JPEG with , , and , respectively.9 Accordingly, in our experiments, the values of $Q$ are used instead of the quantization tables when the performance of our method is evaluated because our method is based on the quality information. Table 3 lists the PSNR and SSIM values of the deblocked results obtained by the FoE prior-based method and ours. The FoE prior captures the statistics of natural images, and thus has been effectively employed for image denoising and inpainting.45,46 It has also been successfully applied to deblocking of BDCT compressed images.10 We obtained the corresponding software for evaluation at http://www.cs.brown.edu/~dqsun/research/software.html. In the experiments, the FoE filter size is pixels and the maximum number of iterations is 200. In the FoE prior-based method,10 the narrow quantization constraint set (NQCS)47 has been used for a higher PSNR gain in the deblocked results, and thus we also report the improved PSNR values obtained with NQCS (see the 7th column). Combined with the NQCS method,47 our method generally achieves the best PSNR and SSIM results on the test images. In the table, the bold numbers represent the best PSNR and SSIM values of each image at each quality.
Table 3 Performance evaluation results from test images using the proposed and FoE prior-based methods.a
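For reference, the PSNR used as an evaluation metric above can be computed as in the following Python/NumPy sketch. It uses the standard definition for 8-bit images (peak value 255); the function name `psnr` and the toy test images are chosen here for illustration, and the SSIM is not covered.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)          # mean squared error
    if mse == 0.0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

original = np.zeros((8, 8))
off_by_one = original + 1.0                  # every pixel differs by 1 level
value_db = psnr(original, off_by_one)        # 20 * log10(255), about 48.13 dB
```

Because the measure is logarithmic in the mean squared error, a gain of a few tenths of a dB between two deblocking methods corresponds to a consistent reduction in pixel error over the whole image.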
5. Practical Considerations for Mobile Applications
Currently, high-end mobile phones, which are usually referred to as smartphones, support multiple radio standards and a rich suite of applications, including advanced radio, audio, video, and graphics processing. They provide more advanced computing ability and connectivity than contemporary feature phones by using multiple chips, such as a baseband processor and an application processor. Moreover, new functionalities are expected to be added to smartphones at an increasing rate; however, the increases in battery capacity have not matched the increases in functionality.58–62 In fact, battery capacities have not been growing by more than 10% every year, whereas the number of features and applications has been growing much faster.59 Thus, the needs for low power and high performance are growing at a significantly higher rate. As listed in Table 4, the present workload of a 3.5G smartphone amounts to nearly 100 giga operations per second (GOPS). This workload increases at a steady rate, roughly by an order of magnitude every 5 years. The workload is partitioned into application processing, radio processing, media processing, and 3D graphics. Among them, about 60% of the workload is used for radio and application processing. More than 30% of the workload is assigned to media processing, including functions such as display processing, camera processing, video decoding, and video encoding. Here, video encoding requires the largest number of operations, i.e., 17 GOPS. In the workload for media processing, 10 GOPS is available, and thus two new functions (e.g., face hallucination and image deblocking) can be realized using it. Recently, a multicore architecture for mobile applications has been proposed to support a workload of 100 GOPS with 1 W.58 We believe the multicore architecture can be effectively employed for implementing the new functions.
Table 4 Mobile phone trends in 5-year intervals.58
Another way to implement them is to use graphics processing unit (GPU)-based parallelization technology. Fortunately, because video processing algorithms exhibit strong computational locality, video processing is highly amenable to parallel processing. Such locality makes it possible to divide video processing tasks into smaller, weakly interacting pieces for parallel computing.63 GPU-based parallelization can drastically reduce the processing time, and thus effective parallel architectures and programming can also be used to implement the new functions for mobile applications.
6. Conclusions
In this article, we presented two core technologies for high-quality image communications from the point of view of image processing: face hallucination and compression artifact reduction. Both technologies are closely related to inverse problems in image processing, and thus we have described recent studies and our related research results for dealing with these inverse problems effectively. When image data are transmitted over mobile communication networks, data loss inevitably occurs in the high frequency components of images because of lossy compression techniques. Thus, the quality of facial regions (i.e., the main interest in image communications) is reduced and several compression artifacts inevitably occur. We have demonstrated that convex optimization and sparse representation can be effectively employed for solving the inverse problems and achieving high-quality image communications. In addition, to implement the technologies in actual mobile devices, power management is a critical issue because of the limited capacity of batteries. Therefore, this article also discussed practical considerations and possible solutions for implementing the two technologies in mobile applications. Nowadays, displays of many different sizes, including mobile displays, have come into wide use. They also face the same problems of high-quality image reconstruction.
We believe the two technologies can be effectively employed for enhancing image quality in various displays.
Acknowledgments
The authors would like to thank all the anonymous reviewers for their valuable comments and useful suggestions on this paper. This work was supported by the National Natural Science Foundation of China (Nos. 61050110144, 60803097, 60972148, 60971128, 60970066, 61072106, 61075041, 61003198, 61001206, and 61077009), the National Research Foundation for the Doctoral Program of Higher Education of China (Nos. 200807010003 and 20100203120005), the National Science and Technology Ministry of China (Nos. 9140A07011810DZ0107 and 9140A07021010DZ0131), the Key Project of Ministry of Education of China (No. 108115), and the Fundamental Research Funds for the Central Universities (Nos. JY10000902001, K50510020001, and JY10000902045).
References
1. S. Baker and T. Kanade, “Hallucinating faces,” in Proc. IEEE Int. Conf. Automatic Face and Gesture Recogn., 83–88 (2000).
2. S. Baker and T. Kanade, “Limits on super-resolution and how to break them,” IEEE Trans. Pattern Anal. Machine Intell., 24(9), 1167–1183 (2002). http://dx.doi.org/10.1109/TPAMI.2002.1033210
3. X. G. Wang and X. O. Tang, “Hallucinating face by eigen-transformation,” IEEE Trans. Sys. Man Cybernetics-C, 35(3), 425–434 (2005). http://dx.doi.org/10.1109/TSMCC.2005.848171
4. C. Liu, H. Y. Shum, and W. T. Freeman, “Face hallucination: theory and practice,” Int. J. Computer Vis., 75(1), 115–134 (2007). http://dx.doi.org/10.1007/s11263-006-0029-5
5. X. Ma, J. Zhang, and C. Qi, “Hallucinating face by position-patch,” Pattern Recogn., 43(6), 2224–2236 (2010). http://dx.doi.org/10.1016/j.patcog.2009.12.019
6. X. Ma, J. Zhang, and C. Qi, “Position-based face hallucination method,” in Proc. IEEE Conf. Multimedia and Expo, 290–293 (2009).
7. C. Jung et al., “Position-patch based face hallucination using convex optimization,” IEEE Signal Process. Lett., 18(6), 367–370 (2011). http://dx.doi.org/10.1109/LSP.2011.2140370
8. C. Jung and L. C. Jiao, “Novel Bayesian deringing method in image interpolation and compression using a SGLI prior,” Opt. Express, 18(7), 7138–7149 (2010). http://dx.doi.org/10.1364/OE.18.007138
9. A. Foi, V. Katkovnik, and K. Egiazarian, “Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images,” IEEE Trans. Image Process., 16(5), 1395–1411 (2007). http://dx.doi.org/10.1109/TIP.2007.891788
10. D. Sun and W. K. Cham, “Postprocessing of low bit-rate block DCT coded images based on a fields of experts prior,” IEEE Trans. Image Process., 16(11), 2743–2751 (2007). http://dx.doi.org/10.1109/TIP.2007.904969
11. Y. Kim, C. S. Park, and S. J. Ko, “Fast POCS based postprocessing technique for HDTV,” IEEE Trans. Consumer Electron., 49(4), 1438–1447 (2003). http://dx.doi.org/10.1109/TCE.2003.1261252
12. A. Gothandaraman, R. T. Whitaker, and J. Gregor, “Total variation for the removal of blocking effects in DCT based encoding,” in Proc. IEEE Conf. Image Process., 455–458 (2001).
13. F. Alter, S. Y. Durand, and J. Froment, “Adapted total variation for artifact free decompression of JPEG images,” J. Math. Imaging Vis., 23(2), 199–211 (2005). http://dx.doi.org/10.1007/s10851-005-6467-9
14. A. Mohammad-Djafari, “Bayesian inference for inverse problems in signal and image processing and applications,” Int. J. Imaging Sys. Appl., 16(5), 209–214 (2006). http://dx.doi.org/10.1002/(ISSN)1098-1098
15. H. H. Szu, “Inverse problem of image processing,” J. Math. Phys., 25(9), 2767–2772 (1984). http://dx.doi.org/10.1063/1.526484
16. G. Wang, J. Zhang, and G. W. Pan, “Solution of inverse problems in image processing by wavelet expansion,” IEEE Trans. Image Process., 4(5), 579–593 (1995). http://dx.doi.org/10.1109/83.382493
17. D. Rajan and S. Chaudhuri, “Generalized interpolation and its application in super-resolution imaging,” Image Vis. Comput., 19(13), 957–969 (2001). http://dx.doi.org/10.1016/S0262-8856(01)00055-5
18. S. Lertrattanapanich and N. K. Bose, “High resolution image formation from low resolution frames using Delaunay triangulation,” IEEE Trans. Image Process., 11(12), 1427–1441 (2002). http://dx.doi.org/10.1109/TIP.2002.806234
19. H. Stark and P. Oskoui, “High-resolution image recovery from image-plane arrays, using convex projections,” J. Opt. Soc. Am. A, 6(11), 1715–1726 (1989). http://dx.doi.org/10.1364/JOSAA.6.001715
20. R. R. Schulz and R. L. Stevenson, “Extraction of high-resolution frames from video sequences,” IEEE Trans. Image Process., 5(6), 996–1011 (1996). http://dx.doi.org/10.1109/83.503915
21. M. Irani and S. Peleg, “Super resolution from image sequences,” in Proc. Int. Conf. Pattern Recogn., 115–120 (1990).
22. M. Irani and S. Peleg, “Improving resolution by image registration,” CVGIP Graphical Models Image Process., 53(3), 231–239 (1991). http://dx.doi.org/10.1016/1049-9652(91)90045-L
23. N. Nguyen, P. Milanfar, and G. Golub, “Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement,” IEEE Trans. Image Process., 10(9), 1299–1308 (2001). http://dx.doi.org/10.1109/83.941854
24. M. Elad and A. Feuer, “Restoration of a single super resolution image from several blurred, noisy, and undersampled measured images,” IEEE Trans. Image Process., 6(12), 1646–1658 (1997). http://dx.doi.org/10.1109/83.650118
25. W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Example-based super-resolution,” IEEE Comput. Graph. Appl., 22(2), 56–65 (2002). http://dx.doi.org/10.1109/38.988747
26. H. Chang, D. Y. Yeung, and Y. Xiong, “Super-resolution through neighbor embedding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recogn., I-275–I-282 (2004).
27. C. Liu, H. Shum, and C. Zhang, “A two-step approach to hallucinating faces: Global parametric model and local nonparametric model,” in Proc. IEEE Conf. Comput. Vis. Pattern Recogn., I-192–I-198 (2001).
28. K. Jia and S. G. Gong, “Generalized face super-resolution,” IEEE Trans. Image Process., 17(6), 873–886 (2008). http://dx.doi.org/10.1109/TIP.2008.922421
29. J. Yang et al., “Face hallucination via sparse coding,” in Proc. IEEE Conf. Image Process., 1264–1267 (2008).
30. J. Yang et al., “Image super-resolution as sparse representation of raw image patches,” in Proc. IEEE Conf. Comput. Vis. Pattern Recogn., 1–8 (2008).
31. P. S. Penev and L. Sirovich, “The global dimensionality of face space,” in Proc. IEEE Conf. Automatic Face and Gesture Recogn., 264–270 (2000).
32. Z. Wang et al., “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., 13(4), 600–612 (2004). http://dx.doi.org/10.1109/TIP.2003.819861
33. Y. Luo and R. K. Ward, “Removing the blocking artifacts of block-based DCT compressed images,” IEEE Trans. Image Process., 12(7), 838–842 (2003). http://dx.doi.org/10.1109/TIP.2003.814252
34. A. W. C. Liew and H. Yan, “Blocking artifacts suppression in block-coded images using overcomplete wavelet representation,” IEEE Trans. Circuits Sys. Vid. Technol., 14(4), 450–461 (2004). http://dx.doi.org/10.1109/TCSVT.2004.825555
35. S. Singh, V. Kumar, and H. K. Verma, “Reduction of blocking artifacts in JPEG compressed images,” Dig. Sign. Process., 17(1), 225–243 (2007). http://dx.doi.org/10.1016/j.dsp.2005.08.003
36. B. Jeon and J. Jeong, “Blocking artifacts reduction in image compression with block boundary discontinuity criterion,” IEEE Trans. Circuits Sys. Vid. Technol., 8(3), 345–357 (1998). http://dx.doi.org/10.1109/76.678634
37. G. Raja and M. J. Mirza, “In-loop deblocking filter for H.264/AVC video,” in Proc. Int. Sym. Commun., Control Sign. Process. (2006).
38. H. C. Reeve and J. S. Lim, “Reduction of blocking effect in image coding,” Opt. Eng., 23(1), 34–37 (1984).
39. B. Ramamurthi and A. Gersho, “Nonlinear space-variant postprocessing of block coded images,” IEEE Trans. Acoustics, Speech, Sign. Process., 34(5), 1258–1268 (1986). http://dx.doi.org/10.1109/TASSP.1986.1164961
40. P. List et al., “Adaptive deblocking filter,” IEEE Trans. Circuits Sys. Vid. Technol., 13(7), 614–619 (2003). http://dx.doi.org/10.1109/TCSVT.2003.815175
41. Z. Xiong, M. Orchard, and Y. Q. Zhang, “A deblocking algorithm for JPEG compressed images using overcomplete wavelet representations,” IEEE Trans. Circuits Sys. Vid. Technol., 7(2), 433–437 (1997). http://dx.doi.org/10.1109/76.564123
42. R. E. Rosenholtz and A. Zakhor, “Iterative procedures for reduction of blocking effects in transform image coding,” Proc. SPIE, 1452, 116–126 (1991). http://dx.doi.org/10.1117/12.45376
43. F. Alter, S. Y. Durand, and J. Froment, “Deblocking DCT-based compressed images with weighted total variation,” in Proc. IEEE Conf. Acoustics, Speech, Sign. Process., 221–224 (2004).
44. Q. B. Do, A. Beghdadi, and M. Luong, “A new adaptive image post-treatment for deblocking and deringing based on total variation method,” in Proc. ISSPA, 464–467 (2010).
45. S. Roth and M. J. Black, “Fields of experts: A framework for learning image priors,” in Proc. IEEE Conf. Comput. Vis. Pattern Recogn., 860–867 (2005).
46. S. Roth and M. J. Black, “Fields of experts,” Int. J. Comput. Vis., 82(2), 205–229 (2009). http://dx.doi.org/10.1007/s11263-008-0197-6
47. S. H. Park and D. S. Kim, “Theory of projection onto the narrow quantization constraint set and its application,” IEEE Trans. Image Process., 8(10), 1361–1373 (1999). http://dx.doi.org/10.1109/83.791962
48. C. Jung et al., “Image deblocking via sparse representation,” Sign. Process. Image Commun., 27(6), 663–677 (2012). http://dx.doi.org/10.1016/j.image.2012.03.002
49. J. Wright et al., “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., 31(2), 210–227 (2009). http://dx.doi.org/10.1109/TPAMI.2008.79
50. K. Huang and S. Aviyente, “Sparse representation for signal classification,” Adv. Neur. Info. Process. Sys., 19, 609–616 (2006). http://dx.doi.org/10.1.1.71.2963
51. J. Yang et al., “Image super-resolution via sparse representation,” IEEE Trans. Image Process., 19(11), 2861–2873 (2010). http://dx.doi.org/10.1109/TIP.2010.2050625
52. M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., 15(12), 3736–3745 (2006). http://dx.doi.org/10.1109/TIP.2006.881969
53. M. Aharon, M. Elad, and A. Bruckstein, “The KSVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Sign. Process., 54(11), 4311–4322 (2006). http://dx.doi.org/10.1109/TSP.2006.881199
54. R. Rubinstein, M. Zibulevsky, and M. Elad, “Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit,” CS Technion (2008).
55. J. M. D. Carvajalino and G. Sapiro, “Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization,” IEEE Trans. Image Process., 18(7), 1395–1408 (2009). http://dx.doi.org/10.1109/TIP.2009.2022459
56. M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: Design of dictionaries for sparse representation,” Proc. SPARSE, 9–12 (2005). http://dx.doi.org/10.1.1.99.4103
57. R. Yang and M. Ren, “Learning overcomplete dictionaries with application to image denoising,” in Proc. Int. Sym. Photon. Optoelectron., 1–4 (2000).
58. C. H. Berkel, “Multi-core for mobile phones,” in Proc. Conf. Design, Automation and Test in Europe (2009).
59. H. Falaki, R. Govindan, and D. Estrin, “Smart screen management on mobile phones,” Tech. Rep. Center Embedded Networked Sensing (2009).
60. H. Kim and I. C. Park, “High-performance and low-power memory-interface architecture for video processing applications,” IEEE Trans. Circuits Sys. Vid. Technol., 11(11), 1160–1170 (2001). http://dx.doi.org/10.1109/76.964782
61. T. H. Meng et al., “Low-power signal processing system design for wireless applications,” IEEE Personal Commun., 5(3), 20–31 (1998). http://dx.doi.org/10.1109/98.683731
62. T. C. Chen et al., “Fast algorithm and architecture design of low-power integer motion estimation for H.264/AVC,” IEEE Trans. Circuits Sys. Vid. Technol., 17(5), 568–577 (2007). http://dx.doi.org/10.1109/TCSVT.2007.894044
63. D. Lin et al., “Parallelization of video processing,” IEEE Sign. Process. Mag., 26(6), 103–112 (2009). http://dx.doi.org/10.1109/MSP.2009.934116
Biography
Cheolkon Jung received the BS, MS, and PhD degrees in electronic engineering from Sungkyunkwan University, Republic of Korea, in 1995, 1997, and 2002, respectively. He is currently a professor at Xidian University, China. His main research interests include computer vision, pattern recognition, image and video processing, multimedia content analysis and management, and 3D TV.
Licheng Jiao received the BS degree from Shanghai Jiao Tong University, China, in 1982, and the MS and PhD degrees from Xian Jiao Tong University, China, in 1984 and 1990, respectively. From 1990 to 1991, he was a postdoctoral fellow in the National Key Lab for Radar Signal Processing at Xidian University, China. Since 1992, he has been with the School of Electronic Engineering at Xidian University, China, where he is currently a distinguished professor. He is the dean of the School of Electronic Engineering and the Institute of Intelligent Information Processing at Xidian University, China. His current research interests include signal and image processing, nonlinear circuit and systems theory, learning theory and algorithms, computational vision, computational neuroscience, optimization problems, wavelet theory, and data mining.
Bing Liu received the BS degree in electronic engineering from Henan Polytechnic University, China, in 2009. He is currently pursuing the MS degree at Xidian University, China. His research interests include image processing and machine learning.