Deep Convolutional Neural Models for Image Quality Prediction

Introduction: The vital role that images play in human life is captured by the proverb “A picture is worth a thousand words.” The pipeline from picture generation to consumption is fraught with numerous sources of distortion. Storage and transmission-bandwidth constraints demand compression techniques that reduce data size, and these techniques induce degradation. Transmission errors and packet losses during communication are other sources of image distortion. Image processing algorithms used to change resolution, format, and color introduce still further forms of degradation.

Humans can judge image quality almost reflexively, but it is impractical to place human judgment of image quality inside the design loop of information systems. In light of this, machine evaluation of image quality has become an important area of research. Automatic quality assessment algorithms can be used for optimization, where one maximizes quality at a given cost; for comparative analysis between different alternatives; and for benchmarking image processing systems and algorithms.

Existing Work:

Picture-quality models: Picture-quality models are generally classified according to whether a pristine reference image is available for comparison. Full-reference and reduced-reference models assume that a reference is available; otherwise, the model is no-reference, or blind. Reference models are generally deployed when a process, such as compression or enhancement, is applied to an original image. No-reference models are applied when the quality of the original image itself is in question, as in a source inspection process, or when an image must be analyzed without access to any reference. No-reference prediction is generally the more difficult problem. No-reference picture-quality models rely heavily on models of the statistical regularities of natural pictures [1].
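The practical difference between the two families can be illustrated with a short Python sketch. Peak signal-to-noise ratio (PSNR), a classic full-reference measure, cannot be computed without the pristine image, whereas a no-reference model sees only the distorted image. The second function computes mean-subtracted contrast-normalized (MSCN) coefficients, a common building block of natural-scene-statistics features. The 3x3 box window, the stabilizing constant c, and the function names are simplifying assumptions made here for illustration (published models typically use a Gaussian weighting window); this is not the method of [1].

```python
import math

def psnr(reference, distorted, peak=255.0):
    """Full-reference: requires the pristine reference image."""
    squared_error, count = 0.0, 0
    for ref_row, dist_row in zip(reference, distorted):
        for r, d in zip(ref_row, dist_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def mscn(image, c=1.0):
    """No-reference building block: uses only the (distorted) image itself.
    Simplification (assumed): 3x3 box window with clamped borders."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            mu = sum(window) / 9.0
            sigma = math.sqrt(sum((v - mu) ** 2 for v in window) / 9.0)
            # Local mean removed, then divided by local contrast (plus c).
            out[y][x] = (image[y][x] - mu) / (sigma + c)
    return out

reference = [[0, 0], [0, 0]]
distorted = [[10, 10], [10, 10]]
print(psnr(reference, distorted))  # needs both images
print(mscn(distorted))             # needs only the distorted image
```

Images are represented as plain lists of rows of grayscale values, so no imaging library is required.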

Deep learning and CNNs: Deep learning has had a breakthrough impact on difficult problems such as speech recognition and image classification, achieving performance significantly superior to that obtained using conventional model-based methods. One of the principal advantages of deep-learning models is the remarkable generalization capability they can acquire when trained on large-scale labeled data sets. Deep-learning models employ multiple levels of linear and nonlinear transformations to generate highly general data representations [2]. Open-source frameworks such as TensorFlow [3] have also greatly increased the accessibility of deep-learning models, and their application to diverse image processing and analysis problems has expanded greatly.

A common conception is that CNNs resemble processing by neurons in the visual cortex. This idea arises largely from the observation that, in deep convolutional networks with many adaptive layers trained on images, the early layers often resemble the profiles of low-level cortical neurons in visual area V1, i.e., directionally tuned Gabor filters [4], or of neurons in visual area V2 implicated in assembling low-level representations of image structure [5]. These perceptually relevant attributes of the early layers make CNNs appealing tools for adaptation to the picture-quality prediction problem.
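As a hedged illustration of what “directionally tuned Gabor filters” means, the sketch below builds a Gabor kernel (a Gaussian envelope multiplied by an oriented cosine carrier) and correlates it with a pattern of vertical stripes, as a convolutional layer would. The kernel size, sigma, wavelength, and aspect ratio gamma are illustrative values chosen here; they are not learned CNN weights or parameters taken from [4].

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Gaussian envelope times a cosine carrier oriented by theta (radians)."""
    r = size // 2
    kernel = []
    for y in range(-r, r + 1):
        row = []
        for x in range(-r, r + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)    # rotated coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel

def correlate2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation a CNN convolution layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[y + i][x + j] for i in range(kh) for j in range(kw))
             for x in range(ow)] for y in range(oh)]

# Vertical stripes: intensity varies along x only, with a period of 4 pixels.
stripes = [[math.cos(2 * math.pi * x / 4) for x in range(12)] for _ in range(12)]
resp_0 = correlate2d_valid(stripes, gabor_kernel(9, 2.0, 0.0, 4.0))
resp_90 = correlate2d_valid(stripes, gabor_kernel(9, 2.0, math.pi / 2, 4.0))
strong = max(max(row) for row in resp_0)
weak = max(abs(v) for row in resp_90 for v in row)
print(strong, weak)
```

With these settings the filter whose carrier varies in the same direction as the stripes (theta = 0) responds far more strongly than the filter rotated by 90 degrees, which is the orientation selectivity attributed to V1-like early layers.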

Datasets: The performance of deep-learning models generally depends heavily on the size of the available training data set(s). Currently available legacy public-domain subjective picture-quality databases, such as LIVE IQA [6] and TID2013 [7], are relatively small. LIVE IQA contains 29 diverse natural reference images distorted using five distortion types that could occur in real-world applications. The subjects' judgments are processed and converted into a difference mean opinion score (DMOS) for each distorted image. The LIVE “In the Wild” Challenge Database [8] is of moderate size: it contains nearly 1,200 unique pictures, each afflicted by a unique, unknown combination of highly diverse authentic distortions, judged through more than 350,000 crowdsourced human opinion scores. By comparison, image recognition data sets such as ImageNet [9] contain millions of labeled images.
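The DMOS conversion mentioned above can be sketched as follows, under simplifying assumptions: every subject rates every distorted image together with its reference, the resulting difference scores are z-scored per subject, and the normalized scores are then averaged over subjects. The procedure actually used for LIVE IQA [6] includes further steps, such as outlier-subject rejection, that are omitted here, and the data layout is hypothetical.

```python
from statistics import mean, pstdev

def dmos(ratings):
    """ratings maps a subject id to a list of (reference_score, distorted_score)
    pairs, one per distorted image, in the same image order for every subject.
    Raw scores are assumed to be 'higher is better'."""
    per_subject_z = []
    for pairs in ratings.values():
        diffs = [ref - dist for ref, dist in pairs]   # difference scores
        mu = mean(diffs)
        sd = pstdev(diffs) or 1.0                    # guard against zero spread
        per_subject_z.append([(d - mu) / sd for d in diffs])
    n_images = len(per_subject_z[0])
    # Average each image's normalized difference score across subjects.
    return [mean(z[i] for z in per_subject_z) for i in range(n_images)]

ratings = {
    "subject_a": [(90, 80), (90, 50), (90, 20)],
    "subject_b": [(70, 65), (70, 40), (70, 30)],
}
print(dmos(ratings))  # increases with the severity of the distortion
```

A higher DMOS indicates a larger perceived drop from the reference, i.e., worse quality; per-subject normalization removes differences in how individual subjects use the rating scale.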

One common strategy for overcoming this paucity of labeled images is data augmentation, which seeks to multiply the effective volume of image data via rotations, cropping, and reflections. Another common strategy is to divide the training images into many small patches. However, the scores that subjects would assign to a local image patch will generally differ greatly from those assigned to the entire image. Moreover, while generating a large amount of picture content is simple, ensuring adequate distortion diversity and realism is much harder.
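A minimal sketch of the augmentation operations named above, assuming grayscale images stored as lists of rows: the eight rotation and reflection variants of an image are generated, and random crops are taken from each. In this sketch every augmented sample simply keeps the subjective score of its source image, which is an assumption rather than an established property of the data.

```python
import random

def rotate90(img):
    """Rotate a 2-D list (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def hflip(img):
    """Mirror the image left-right."""
    return [row[::-1] for row in img]

def dihedral_variants(img):
    """The four rotations of the image, each with and without reflection (8 total)."""
    variants, current = [], img
    for _ in range(4):
        variants += [current, hflip(current)]
        current = rotate90(current)
    return variants

def random_crop(img, size, rng):
    """Cut a size x size window at a random position."""
    y = rng.randrange(len(img) - size + 1)
    x = rng.randrange(len(img[0]) - size + 1)
    return [row[x:x + size] for row in img[y:y + size]]

def augment(img, score, crop_size, crops_per_variant, seed=0):
    """Return (sample, score) pairs; each sample keeps the source score (assumed)."""
    rng = random.Random(seed)
    return [(random_crop(v, crop_size, rng), score)
            for v in dihedral_variants(img)
            for _ in range(crops_per_variant)]

image = [[4 * r + c for c in range(4)] for r in range(4)]
samples = augment(image, score=72.0, crop_size=3, crops_per_variant=2)
print(len(samples))  # 8 variants x 2 crops = 16 samples
```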

CNN-based no-reference image quality models: Several CNN-based picture-quality prediction models have attempted to use patch-based labeling to increase the set of informative (ground-truth) training samples. Generally, two types of training approaches have been used: patchwise and imagewise, as depicted in Figure 1. In the former, each image patch is independently regressed onto its target. In the latter, the patch features or predicted scores are aggregated or pooled, then regressed onto a single ground-truth subjective score.
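The difference between the two training approaches lies in where the target is applied. In the sketch below, which assumes a mean-squared-error loss and mean pooling (other losses and pooling functions are possible), patchwise training penalizes every patch prediction against the image score, whereas imagewise training compares only the pooled prediction.

```python
def patchwise_loss(patch_preds, image_score):
    """Each patch prediction is regressed independently onto the image's score."""
    return sum((p - image_score) ** 2 for p in patch_preds) / len(patch_preds)

def mean_pool(values):
    return sum(values) / len(values)

def imagewise_loss(patch_preds, image_score, pool=mean_pool):
    """Patch predictions are pooled first; only the pooled value meets the target."""
    return (pool(patch_preds) - image_score) ** 2

preds = [2.0, 4.0]  # hypothetical predictions for two patches of one image
print(patchwise_loss(preds, 3.0))  # 1.0
print(imagewise_loss(preds, 3.0))  # 0.0
```

For two patches predicted at 2 and 4 on an image scored 3, the patchwise loss is 1.0 while the imagewise loss is 0.0: imagewise training tolerates patch-to-patch variation, consistent with the observation that local patch quality can legitimately differ from the global score.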

The first application of a spatial CNN model to the picture-quality prediction problem was reported in [11], wherein the high-dimensional input image was fed directly into a shallow CNN model without computing handcrafted features. To obtain more training data, each input image was subdivided into small patches as a form of data augmentation, and each patch was assigned the subjective-quality score of its source image during training. Patchwise training was used, and, at test time, the predicted patch scores were averaged to yield the image score.
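The patch pipeline described for [11] can be sketched as follows. The patch size, the non-overlapping stride, and the stand-in `toy_patch_model` are hypothetical choices for illustration; in [11] the per-patch predictor is the trained CNN.

```python
def extract_patches(image, size, stride=None):
    """Split a 2-D list into size x size patches (non-overlapping by default)."""
    stride = stride or size
    h, w = len(image), len(image[0])
    return [[row[x:x + size] for row in image[y:y + size]]
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

def make_training_pairs(image, score, size):
    """Training: every patch inherits the subjective score of its source image."""
    return [(patch, score) for patch in extract_patches(image, size)]

def predict_image_score(image, size, patch_model):
    """Testing: average the per-patch predictions into one image score."""
    preds = [patch_model(patch) for patch in extract_patches(image, size)]
    return sum(preds) / len(preds)

# Hypothetical stand-in for a trained patch CNN: the mean pixel value of the patch.
def toy_patch_model(patch):
    return sum(map(sum, patch)) / (len(patch) * len(patch[0]))

image = [[4 * r + c for c in range(4)] for r in range(4)]
print(len(make_training_pairs(image, 55.0, 2)))        # 4 patches
print(predict_image_score(image, 2, toy_patch_model))  # 7.5
```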

Li et al. [12] used a deep CNN model pretrained on the ImageNet data set. A network-in-network (NiN) structure was used to enhance the abstraction ability of the model. The final layer of the pretrained model was replaced by regression layers, which mapped the learned features onto subjective scores. During training, each image patch was regressed onto the subjective-quality score of its source image.
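The network-in-network idea replaces a single linear filter with a small multilayer perceptron that is shared across all spatial positions, which is equivalent to stacking 1x1 convolutions with nonlinearities between them. The sketch below shows that equivalence on a feature map stored as [channels][height][width] lists; the layer sizes and weights are arbitrary illustrative values, not those used in [12].

```python
def conv1x1(fmap, weights, bias):
    """1x1 convolution: fmap is [C_in][H][W], weights is [C_out][C_in], bias is [C_out]."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[bias[o] + sum(weights[o][c] * fmap[c][y][x] for c in range(c_in))
              for x in range(w)] for y in range(h)] for o in range(len(weights))]

def relu(fmap):
    return [[[max(0.0, v) for v in row] for row in channel] for channel in fmap]

def mlpconv(fmap, layers):
    """NiN-style micro network: the same small MLP applied at every spatial
    location, implemented as stacked 1x1 convolutions each followed by ReLU."""
    for weights, bias in layers:
        fmap = relu(conv1x1(fmap, weights, bias))
    return fmap

fmap = [[[1.0, -1.0]], [[2.0, 3.0]]]           # 2 channels, 1x2 spatial map
layers = [([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),
          ([[0.5, 1.0]], [1.0])]
print(mlpconv(fmap, layers))  # [[[2.5, 2.0]]]
```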

Figure 1. Patchwise and imagewise strategies used to train patch-based picture-quality prediction models [10].

Bosse et al. [13] deployed a deeper, 12-layer CNN model, fed only with raw RGB image patches, to learn a no-reference picture-quality model. They proposed two training strategies: patchwise training, and weighted average patch aggregation, in which the relative importance of each patch is weighted by a jointly trained subnetwork. The overall loss function was optimized end to end. The authors reported state-of-the-art prediction accuracy on the major picture-quality databases of synthetic distortions.
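Weighted average patch aggregation can be sketched as a normalized weighted sum, q = Σ w_i y_i / Σ w_i, where y_i is the predicted score of patch i and w_i its weight. Here the raw weights are hypothetical inputs standing in for the outputs of the weighting subnetwork, and they are made positive with a ReLU plus a small epsilon to avoid division by zero; this positivity scheme is an assumption of the sketch.

```python
def weighted_average_pooling(patch_scores, raw_weights, eps=1e-6):
    """Aggregate patch scores with learned importance weights.
    raw_weights stand in for the weighting subnetwork's outputs (hypothetical)."""
    weights = [max(0.0, a) + eps for a in raw_weights]   # keep weights positive
    return sum(w * y for w, y in zip(weights, patch_scores)) / sum(weights)

print(weighted_average_pooling([10.0, 50.0], [3.0, 1.0]))   # about 20
print(weighted_average_pooling([10.0, 50.0], [-5.0, 2.0]))  # about 50
```

With equal weights the result reduces to the plain averaging used in [11]; a patch whose raw weight is non-positive contributes almost nothing to the image score.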

To overcome overfitting problems that can arise from a lack of adequate local ground-truth scores, several authors have suggested training deep CNN models in two separate stages: a pretraining stage that uses a large number of algorithm-generated proxy ground-truth quality scores, followed by a stage of regression onto a smaller set of subjective scores. For example, Kim and Lee [14] describe a two-stage CNN-based no-reference quality prediction model that attains highly competitive prediction accuracy on the legacy data sets.
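A deliberately tiny, one-dimensional sketch of the two-stage idea follows. A linear model is first fitted by gradient descent to many proxy scores (standing in for algorithm-generated labels), then frozen, and a scale and offset are fitted by least squares to a small set of subjective scores. The scalar feature, the proxy metric, and all numbers are hypothetical, and freezing the first stage is only one option; fine-tuning the whole model in the second stage is another. This is not the model of [14].

```python
def fit_linear_gd(pairs, lr=0.5, epochs=1000):
    """Stage 1: fit q = w*x + b to proxy targets by full-batch gradient descent on MSE."""
    w = b = 0.0
    n = len(pairs)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - t) * x for x, t in pairs) / n
        gb = sum(2 * (w * x + b - t) for x, t in pairs) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b

def fit_linear_closed_form(xs, ys):
    """Stage 2: ordinary least-squares line through a small labeled set."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

# Hypothetical data: x is a scalar image feature.
proxy_pairs = [(i / 100, 10.0 * i / 100) for i in range(101)]       # many proxy labels
subjective = [(0.0, 100.0), (0.25, 87.5), (0.5, 75.0), (0.75, 62.5), (1.0, 50.0)]  # few human labels

w, b = fit_linear_gd(proxy_pairs)                          # stage 1: pretrain on proxies
qs = [w * x + b for x, _ in subjective]                    # frozen stage-1 predictions
a, c = fit_linear_closed_form(qs, [s for _, s in subjective])  # stage 2: map to subjective scale

def predict(x):
    return a * (w * x + b) + c

print(predict(0.3))
```

Only two parameters are estimated from the scarce subjective labels in the second stage, which is the sense in which pretraining on plentiful proxy labels reduces the risk of overfitting.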

Proposed work: The proposed work aims to design and implement a no-reference model for image quality prediction using deep convolutional neural networks. It also aims to analyze and compare existing methods in terms of the following critical aspects:

Strategies to overcome the paucity of large labeled training datasets.

Architecture of the deep CNN to be used.

The number of stages used in training the CNN.

Aggregation and pooling techniques for better prediction.

References

[1] A. C. Bovik, “Automatic prediction of perceptual image and video quality,” Proc. IEEE, vol. 101, no. 9, pp. 2008–2024, 2013.

[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Advances in Neural Information Processing Systems Conf., 2012, pp. 1097–1105.

[3] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, et al., “TensorFlow: Large-scale machine learning on heterogeneous systems.” [Online]. Available: https://www.tensorflow.org/

[4] M. Clark and A. C. Bovik, “Experiments in segmenting texton patterns using localized spatial filters,” Pattern Recognit., vol. 22, no. 6, pp. 707–717, 1989.

[5] H. Lee, C. Ekanadham, and A. Y. Ng, “Sparse deep belief net model for visual area V2,” in Proc. Advances in Neural Information Processing Systems Conf., 2008, pp. 873–880.

[6] H. Sheikh, M. Sabir, and A. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Trans. Image Process., vol. 15, no. 11, pp. 3440–3451, 2006.

[7] N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, et al., “Image database TID2013: Peculiarities, results and perspectives,” Signal Process. Image Commun., vol. 30, pp. 57–77, Jan. 2015.

[8] D. Ghadiyaram and A. C. Bovik, “Massive online crowdsourced study of subjective and objective picture quality,” IEEE Trans. Image Process., vol. 25, no. 1, pp. 372–387, 2016.

[9] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009, pp. 248–255.

[10] J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, and A. C. Bovik, “Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment,” IEEE Signal Process. Mag., vol. 34, no. 6, pp. 130–141, 2017.

[11] L. Kang, P. Ye, Y. Li, and D. Doermann, “Convolutional neural networks for no-reference image quality assessment,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2014, pp. 1733–1740.

[12] Y. Li, L. M. Po, L. Feng, and F. Yuan, “No-reference image quality assessment with deep convolutional neural networks,” in Proc. IEEE Int. Conf. Digital Signal Processing, 2016, pp. 685–689.

[13] S. Bosse, D. Maniry, T. Wiegand, and W. Samek, “A deep neural network for image quality assessment,” in Proc. IEEE Int. Conf. Image Processing, 2016, pp. 3773–3777.

[14] J. Kim and S. Lee, “Fully deep blind image quality predictor,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 1, pp. 206–220, 2017.
