Image Quality Assessment: From Error Visibility to Structural Similarity

Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
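Concretely, the index combines luminance, contrast, and structure comparisons of the two images. Below is a single-window, pure-Python sketch of the SSIM formula; it is illustrative only (the published algorithm applies the same formula inside an 11×11 Gaussian-weighted sliding window and averages the result, and the pixel values here are made up):

```python
def ssim_global(x, y, L=255, k1=0.01, k2=0.03):
    """Single-window SSIM over two equal-length grayscale pixel lists.

    A global-statistics sketch of the SSIM formula; the published
    algorithm applies this locally in a sliding window and averages.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((p - mu_y) ** 2 for p in y) / n
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    c1 = (k1 * L) ** 2  # stabilizes the luminance term
    c2 = (k2 * L) ** 2  # stabilizes the contrast/structure term
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

# hypothetical pixel values, for illustration only
reference = [52, 55, 61, 59, 79, 61, 76, 41, 12, 100]
shifted = [p + 15 for p in reference]  # constant luminance shift, MSE = 225

assert abs(ssim_global(reference, reference) - 1.0) < 1e-9
print(ssim_global(reference, shifted))  # high (here about 0.975): structure preserved despite MSE = 225
```

A constant brightness shift leaves the contrast and structure terms at 1 and only slightly reduces the luminance term, which is exactly the behavior the abstract's "structural information" argument calls for.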




Citations


21 Jul 2017


Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without handengineering our loss functions either.


08 Oct 2016


Abstract: We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
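To see why a feature-space loss behaves differently from a per-pixel loss, consider a toy 1-D example in which average pooling stands in for the high-level features of a pretrained network (purely illustrative; the actual method uses features from deep convolutional networks, and the signals below are invented):

```python
def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def pooled(signal, size=3):
    # non-overlapping average pooling: a crude stand-in for the
    # translation-tolerant features of a pretrained network
    return [sum(signal[i:i + size]) / size for i in range(0, len(signal), size)]

edge = [0, 0, 1, 1, 0, 0]
shifted = [0, 0, 0, 1, 1, 0]  # the same edge, moved one sample

pixel_loss = mse(edge, shifted)
feature_loss = mse(pooled(edge), pooled(shifted))

# the per-pixel loss penalizes the shift heavily; the pooled
# "perceptual" loss is much smaller for the same shift
assert feature_loss < pixel_loss
```

The point is only qualitative: a loss computed on pooled or abstracted features tolerates small geometric changes that a per-pixel loss punishes, which is one reason perceptual losses produce visually pleasing results.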


Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Peter Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi

21 Jul 2017


Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.


Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage. We explore different network structures and parameter settings to achieve trade-offs between performance and speed. Moreover, we extend our network to cope with three color channels simultaneously, and show better overall reconstruction quality.



References



Abstract: Embedded zerotree wavelet (EZW) coding, introduced by Shapiro (see IEEE Trans. Signal Processing, vol.41, no.12, p.3445, 1993), is a very effective and computationally simple technique for image compression. We offer an alternative explanation of the principles of its operation, so that the reasons for its excellent performance can be better understood. These principles are partial ordering by magnitude with a set partitioning sorting algorithm, ordered bit plane transmission, and exploitation of self-similarity across different scales of an image wavelet transform. Moreover, we present a new and different implementation based on set partitioning in hierarchical trees (SPIHT), which provides even better performance than our previously reported extension of EZW that surpassed the performance of the original EZW. The image coding results, calculated from actual file sizes and images reconstructed by the decoding algorithm, are either comparable to or surpass previous results obtained through much more sophisticated and computationally complex methods. In addition, the new coding and decoding procedures are extremely fast, and they can be made even faster, with only small loss in performance, by omitting entropy coding of the bit stream by the arithmetic code.


Abstract: The embedded zerotree wavelet algorithm (EZW) is a simple, yet remarkably effective, image compression algorithm, having the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code. The embedded code represents a sequence of binary decisions that distinguish an image from the "null" image. Using an embedded coding algorithm, an encoder can terminate the encoding at any point, thereby allowing a target rate or target distortion metric to be met exactly. Also, given a bit stream, the decoder can cease decoding at any point in the bit stream and still produce exactly the same image that would have been encoded at the bit rate corresponding to the truncated bit stream. In addition to producing a fully embedded bit stream, EZW consistently produces compression results that are competitive with virtually all known compression algorithms on standard test images. Yet this performance is achieved with a technique that requires absolutely no training, no pre-stored tables or codebooks, and no prior knowledge of the image source. The EZW algorithm is based on four key concepts: (1) a discrete wavelet transform or hierarchical subband decomposition, (2) prediction of the absence of significant information across scales by exploiting the self-similarity inherent in images, (3) entropy-coded successive-approximation quantization, and (4) universal lossless data compression achieved via adaptive arithmetic coding.
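The embedded property described above, that any prefix of the bit stream decodes to a coarser version of the same data, can be illustrated with a bare-bones bit-plane coder. This is a sketch of the successive-approximation idea only; EZW's zerotree prediction and arithmetic coding are omitted, and the coefficients are invented:

```python
def encode_planes(coeffs, n_planes=6):
    """Emit magnitude bit planes from most to least significant."""
    planes = []
    for k in range(n_planes - 1, -1, -1):
        planes.append([(abs(c) >> k) & 1 for c in coeffs])
    return planes

def decode_planes(planes, signs, n_planes=6):
    """Decode any prefix of the planes into approximate coefficients."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        k = n_planes - 1 - i
        for j, bit in enumerate(plane):
            mags[j] |= bit << k
    return [s * m for s, m in zip(signs, mags)]

coeffs = [34, -20, 9, -3]  # toy "wavelet coefficients"
signs = [1 if c >= 0 else -1 for c in coeffs]
planes = encode_planes(coeffs)

# truncating the stream after fewer planes still decodes, just coarser
errors = [sum(abs(a - b) for a, b in zip(coeffs, decode_planes(planes[:k], signs)))
          for k in range(len(planes) + 1)]
assert decode_planes(planes, signs) == coeffs                # full stream is lossless here
assert all(e1 >= e2 for e1, e2 in zip(errors, errors[1:]))   # each refinement pass can only help
```

This is the "ordered bit plane transmission" ingredient: the most significant bits of every coefficient are sent first, so each additional plane refines the reconstruction.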


Abstract: We propose a new universal objective image quality index, which is easy to calculate and applicable to various image processing applications. Instead of using traditional error summation methods, the proposed index is designed by modeling any image distortion as a combination of three factors: loss of correlation, luminance distortion, and contrast distortion. Although the new index is mathematically defined and no human visual system model is explicitly employed, our experiments on various image distortion types indicate that it performs significantly better than the widely used distortion metric mean squared error. Demonstrative images and an efficient MATLAB implementation of the algorithm are available online at http://anchovy.ece.utexas.edu/~zwang/research/quality_index/demo.html.
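The three factors named in the abstract can be written out explicitly. For reference image $x$ and distorted image $y$ with means $\mu_x, \mu_y$, standard deviations $\sigma_x, \sigma_y$, and covariance $\sigma_{xy}$, the universal quality index factors as:

```latex
Q \;=\; \underbrace{\frac{\sigma_{xy}}{\sigma_x \sigma_y}}_{\text{correlation}}
\cdot \underbrace{\frac{2\mu_x \mu_y}{\mu_x^2 + \mu_y^2}}_{\text{luminance}}
\cdot \underbrace{\frac{2\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}}_{\text{contrast}}
```

The first factor measures loss of correlation, the second luminance distortion, and the third contrast distortion; SSIM later added the stabilizing constants $C_1, C_2$ to this same structure.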


30 Nov 2001


Abstract: This is nothing less than a totally essential reference for engineers and researchers in any field of work that involves the use of compressed imagery. Beginning with a thorough and up-to-date overview of the fundamentals of image compression, the authors move on to provide a complete description of the JPEG2000 standard. They then devote space to the implementation and exploitation of that standard. The final section describes other key image compression systems. This work has specific applications for those involved in the development of software and hardware solutions for multimedia, internet, and medical imaging applications.


Abstract: A number of quality measures are evaluated for gray scale image compression. They are all bivariate, exploiting the differences between corresponding pixels in the original and degraded images. It is shown that although some numerical measures correlate well with the observers’ response for a given compression technique, they are not reliable for an evaluation across different techniques. A graphical measure called Hosaka plots, however, can be used to appropriately specify not only the amount, but also the type of degradation in reconstructed images.


Presentation: "Image Quality Assessment: From Error Visibility to Structural Similarity" (Zhou Wang). Transcript:

1. Image Quality Assessment: From Error Visibility to Structural Similarity
Zhou Wang

2. Motivation
Demo images (original and five distorted versions): MSE=0, MSSIM=1; MSE=225, MSSIM=0.949; MSE=225, MSSIM=0.989; MSE=215, MSSIM=0.671; MSE=225, MSSIM=0.688; MSE=225, MSSIM=0.723

3. Perceptual Image Processing
The standard measure (MSE) does not agree with human visual perception. Why?
Perceptual image processing: define perceptual IQA measures; optimize IP systems and algorithms "perceptually".
Application scope: essentially all image-processing applications (image/video compression, restoration, enhancement, watermarking, displaying, printing, …)

4. Image Quality Assessment
  • Goal: automatically predict perceived image quality
  • Classification: full-reference (FR); no-reference (NR); reduced-reference (RR)
  • Widely used methods: FR: MSE and PSNR; NR & RR: wide-open research topics
  • IQA is difficult

5. VQEG (1)
VQEG (Video Quality Experts Group)
  1. Goal: recommend video quality assessment standards (TV, telecommunication, multimedia industries)
  2. Hundreds of experts (Intel, Philips, Sarnoff, Tektronix, AT&T, NHK, NASA, Mitsubishi, NTIA, NIST, Nortel, …)
Testing methodology:
  1. Provide test video sequences
  2. Subjective evaluation
  3. Objective evaluation by VQEG proponents
  4. Compare subjective/objective results, find winner

6. VQEG (2)
Current status:
  1. Phase I test (2000): diverse types of distortions; 10 proponents including PSNR; no winner (8~9 proponents statistically equivalent, including PSNR!)
  2. Phase II test (2003): restricted types of distortions (MPEG); result: a few models slightly better than PSNR
  3. VQEG is extending its directions: FR/RR/NR; low bit rate; multimedia: video, audio and speech, …

7. Standard IQA Model: Error Visibility (1)
Philosophy: distorted signal = reference signal + error signal; assume the reference signal has perfect quality; quantify perceptual error visibility.
Representative work:
  • Pioneering work [Mannos & Sakrison '74]
  • Sarnoff model [Lubin '93]
  • Visible difference predictor [Daly '93]
  • Perceptual image distortion [Teo & Heeger '94]
  • DCT-based method [Watson '93]
  • Wavelet-based method [Safranek '89, Watson et al. '97]

8. Standard IQA Model: Error Visibility (2)
Motivation: simulate relevant early HVS components.
Key features:
  • Channel decomposition → linear frequency/orientation transforms
  • Frequency weighting → contrast sensitivity function
  • Masking → intra/inter channel interaction

9. Standard IQA Model: Error Visibility (3)
  • The quality definition problem: error visibility = quality?
  • The suprathreshold problem: based on threshold psychophysics; does it generalize to the suprathreshold range?
  • The natural image complexity problem: based on simple-pattern psychophysics; does it generalize to complex natural images?
[Wang et al., "Why is image quality assessment so difficult?" ICASSP '02] [Wang et al., IEEE Trans. Image Processing '04]

10. New Paradigm: Structural Similarity
Philosophy: the purpose of human vision is to extract structural information, and the HVS is highly adapted for this purpose, so estimate the change in structural information.
  • Classical philosophy: bottom-up; predict error visibility
  • New philosophy: top-down; predict structural distortion
Open questions: How to define structural information? How to separate structural from nonstructural information?

11–14. Separation of Structural/nonstructural Distortion (figure-only slides)

15–17. Adaptive Linear System (slides 15–16 are figure-only)
Slide 17: an overcomplete, adaptive basis in the space of all images [Wang & Simoncelli, ICIP '05, submitted]

18. Structural Similarity (SSIM) Index in Image Space
[Wang & Bovik, IEEE Signal Processing Letters '02] [Wang et al., IEEE Trans. Image Processing '04]

19. Model Comparison
Minkowski (MSE); component-weighted; magnitude-weighted; magnitude and component-weighted; SSIM

20. (Figure) JPEG2000 compressed image, original image, SSIM index map, absolute error map

21. (Figure) Gaussian noise corrupted image, original image, SSIM index map, absolute error map

22. (Figure) JPEG compressed image, original image, SSIM index map, absolute error map

23. Demo Images
MSE=0, MSSIM=1; MSE=225, MSSIM=0.949; MSE=225, MSSIM=0.989; MSE=215, MSSIM=0.671; MSE=225, MSSIM=0.688; MSE=225, MSSIM=0.723

24. Validation: LIVE Database

               JP2(1)   JP2(2)   JPG(1)   JPG(2)   Noise    Blur     Error
  # of images  87       82       88       145
  PSNR         0.934    0.895    0.902    0.914    0.987    0.774    0.881
  SSIM         0.968    0.967    0.965    0.986    0.971    0.936    0.944

25–28. MAD Competition: MSE vs. SSIM (figure slides)
[Wang & Simoncelli, Human Vision and Electronic Imaging '04]

31. Extensions of SSIM (1)
  • Color image quality assessment [Toet & Lucassen, Displays '03]
  • Video quality assessment [Wang et al., Signal Processing: Image Communication '04]
  • Multi-scale SSIM [Wang et al., invited paper, IEEE Asilomar Conf. '03]
  • Complex wavelet SSIM [Wang & Simoncelli, ICASSP '05]

32. Extensions of SSIM (2)
Complex wavelet SSIM. Motivation: robust to translation, rotation, and scaling; computed on complex wavelet coefficients of images x and y. [Wang & Simoncelli, ICASSP '05]

33. Image Matching without Registration
Standard patterns: 10 images. Database: 2430 images. Correct recognition rate: MSE 59.6%; SSIM 46.9%; complex wavelet SSIM 97.7%. [Wang & Simoncelli, ICASSP '05]

34. Using SSIM
Web site: www.cns.nyu.edu/~lcv/ssim/ (SSIM paper: 11,000+ downloads; Matlab code: 2,400+ downloads). Industrial implementation: http://perso.wanadoo.fr/reservoir/
  • Image/video coding and communications
    • Image/video transmission, streaming & robustness [Kim & Kaveh '02, Halbach & Olsen '04, Lin et al. '04, Leontaris & Reibman '05]
    • Image/video compression [Blanch et al. '04, Dikici et al. '04, Ho et al. '03, Militzer et al. '03]
    • High dynamic range video coding [Mantiuk et al. '04]
    • Motion estimation/compensation [Monmarthe '04]
  • Biomedical image processing
    • Microarray image processing for bioinformatics [Wang et al. '03]
    • Image fusion of CT and MRI images [Piella & Heijmans '03, Piella '04]
    • Molecular image processing [Ling et al. '02]
    • Medical image quality analysis [Chen et al. '04]

35. Using SSIM (continued)
  • Watermarking/data hiding [Alattar '03, Noore et al. '04, Macq et al. '04, Zhang & Wang '05, Kumsawat et al. '04]
  • Image denoising [Park & Lee '04, Yang & Fox '04, Huang et al. '05, Roth & Black '05, Hirakawa & Parks '05]
  • Image enhancement [Battiato et al. '03]
  • Image/video hashing [Coskun & Sankur '04, Hsu & Lu '04]
  • Image rendering [Bornik et al. '03]
  • Image fusion [Zheng et al. '04, Tsai '04, Gonzalez-Audicana et al. '05]
  • Texture reconstruction [Toth '04]
  • Image halftoning [Evans & Monga '03, Neelamani '03]
  • Radar imaging [Bentabet '03]
  • Infrared imaging [Torres '03, Pezoa et al. '04]
  • Ultrasound imaging [Loizou et al. '04]
  • Vision processor design [Cembrano et al. '04]
  • Wearable display design [von Waldkirch et al. '04]
  • Contrast equalization for LCD [Iranli et al. '05]
  • Airborne hyperspectral imaging [Christophe et al. '05]
  • Superresolution for remote sensing [Rubert et al. '05]

36. THE END. Thank you!



Presentation Transcript

  1. Image Quality Assessment: From Error Visibility to Structural Similarity Zhou Wang

  2. Motivation original Image MSE=0, MSSIM=1 MSE=225, MSSIM=0.949 MSE=225, MSSIM=0.989 MSE=215, MSSIM=0.671 MSE=225, MSSIM=0.688 MSE=225, MSSIM=0.723

  3. Perceptual Image Processing Standard measure (MSE) does not agree with human visual perception Why? PERCEPTUAL IMAGE PROCESSING Define Perceptual IQA Measures Optimize IP Systems & Algorithms “Perceptually” Application Scope: essentially all IP applications image/video compression, restoration, enhancement, watermarking, displaying, printing …

  4. Image Quality Assessment • Goal • Automatically predict perceived image quality • Classification • Full-reference (FR); No-reference (NR); Reduced-reference (RR) • Widely Used Methods • FR: MSE and PSNR • NR & RR: wide open research topic • IQA is Difficult

  5. VQEG (1) • VQEG (video quality experts group) • 1. Goal: recommend video quality assessment standards • (TV, telecommunication, multimedia industries) • 2. Hundreds of experts • (Intel, Philips, Sarnoff, Tektronix, AT&T, NHK, NASA, Mitsubishi, NTIA, NIST, Nortel ……) • Testing methodology • 1. Provide test video sequences • 2.  Subjective evaluation • 3.  Objective evaluation by VQEG proponents • 4. Compare subjective/objective results, find winner

  6. VQEG (2) • Current Status • 1. Phase I test (2000): • Diverse types of distortions • 10 proponents including PSNR • no winner, 8~9 proponents statistically equivalent, including PSNR! • 2. Phase II test (2003): • Restricted types of distortions (MPEG) • Result: A few models slightly better than PSNR • 3. VQEG is extending their directions: • FR/RR/NR, Low Bit Rate • Multimedia: video, audio and speech …

  7. Standard IQA Model: Error Visibility (1) Philosophy distorted signal = reference signal + error signal Assume reference signal has perfect quality Quantify perceptual error visibility • Representative work • Pioneering work [Mannos & Sakrison ’74] • Sarnoff model [Lubin ’93] • Visible difference predictor [Daly ’93] • Perceptual image distortion [Teo & Heeger ’94] • DCT-based method [Watson ’93] • Wavelet-based method [Safranek ’89, Watson et al. ’97]

  8. Standard IQA Model: Error Visibility (2) • Motivation • Simulate relevant early HVS components • Key features • Channel decomposition linear frequency/orientation transforms • Frequency weighting  contrast sensitivity function • Masking  intra/inter channel interaction

  9. Standard IQA Model: Error Visibility (3) • Quality definition problem • Error visibility = quality ? • The suprathreshold problem • Based on threshold psychophysics • Generalize to suprathreshold range? • The natural image complexity problem • Based on simple-pattern psychophysics • Generalize to complex natural images? [Wang, et al., “Why is image quality assessment so difficult?” ICASSP ’02] [Wang, et al., IEEE Trans. Image Processing, ’04]

  10. New Paradigm: Structural Similarity Philosophy Purpose of human vision: extract structural information HVS is highly adapted for this purpose Estimate structural information change • How to define structural information? • How to separate structural/nonstructural information?

  11. Separation of Structural/nonstructural Distortion

  12. Separation of Structural/nonstructural Distortion

  13. Separation of Structural/nonstructural Distortion

  14. Separation of Structural/nonstructural Distortion

  15. Adaptive Linear System

  16. Adaptive Linear System

  17. Adaptive Linear System = overcomplete, adaptive basis in the space of all images [Wang & Simoncelli, ICIP ’05, submitted]

  18. Structural Similarity (SSIM) Index in Image Space [Wang & Bovik, IEEE Signal Processing Letters, ’02] [Wang et al., IEEE Trans. Image Processing, ’04]

  19. Model Comparison Minkowski (MSE) component-weighted magnitude-weighted magnitude and component-weighted SSIM

  20. JPEG2000 compressed image original image SSIM index map absolute error map

  21. Gaussian noise corrupted image original image SSIM index map absolute error map

  22. JPEG compressed image original image SSIM index map absolute error map

  23. Demo Images MSE=0, MSSIM=1 MSE=225, MSSIM=0.949 MSE=225, MSSIM=0.989 MSE=215, MSSIM=0.671 MSE=225, MSSIM=0.688 MSE=225, MSSIM=0.723

  24. Validation LIVE Database PSNR MSSIM

  25. MAD Competition: MSE vs. SSIM (1) [Wang & Simoncelli, Human Vision and Electronic Imaging, ’04]

  26. MAD Competition: MSE vs. SSIM (2) [Wang & Simoncelli, Human Vision and Electronic Imaging, ’04]

  27. MAD Competition: MSE vs. SSIM (3) [Wang & Simoncelli, Human Vision and Electronic Imaging, ’04]

  28. MAD Competition: MSE vs. SSIM (4) [Wang & Simoncelli, Human Vision and Electronic Imaging, ’04]

  29. Extensions of SSIM (1) • Color image quality assessment • Video quality assessment • Multi-scale SSIM • Complex wavelet SSIM [Toet & Lucassen., Displays, ’03] [Wang, et al., Signal Processing: Image Communication, ’04] [Wang, et al., Invited Paper, IEEE Asilomar Conf. ’03] [Wang & Simoncelli, ICASSP ’05]

  30. Extensions of SSIM (2) • Complex wavelet SSIM • Motivation: robust to translation, rotation and scaling : complex wavelet coefficients in images x and y [Wang & Simoncelli, ICASSP ’05]

  31. Image Matching without Registration Standard patterns: 10 images Database: 2430 images Correct Recognition Rate: MSE: 59.6%; SSIM: 46.9%; Complex wavelet SSIM: 97.7% [Wang & Simoncelli, ICASSP ’05]

  32. Using SSIM Web site:www.cns.nyu.edu/~lcv/ssim/ SSIM Paper: 11,000+ downloads; Matlab code: 2400+ downloads Industrial implementation:http://perso.wanadoo.fr/reservoir/ • Image/video coding and communications • Image/video transmission, streaming & robustness [Kim & Kaveh ’02, Halbach & Olsen ’04, Lin et al. ’04, Leontaris & Reibman ’05] • Image/video compression [Blanch et al. ’04, Dikici et al. ’04 , Ho et al. ‘03, Militzer et al. ’03] • High dynamic range video coding [Mantiuk et al. ’04] • Motion estimation/compensation [Monmarthe ’04] • Biomedical image processing • Microarray image processing for bioinformatics [Wang et al. ’03] • Image fusion of CT and MRI images [Piella & Heijmans ’03, Piella ‘04] • Molecular image processing [Ling et al. ’02] • Medical image quality analysis [Chen et al. ’04]

  33. Using SSIM (continued) • Watermarking/data hiding [Alattar ’03, Noore et al. ’04, Macq et al. ‘04 Zhang & Wang ’05, Kumsawat et al. ‘04] • Image denoising [Park & Lee ’04, Yang & Fox ’04 , Huang et al. ’05 Roth & Black ’05, Hirakawa & Parks ’05] • Image enhancement [Battiato et al. ’03] • Image/video hashing [Coskun & Sankur ’04, Hsu & Lu ‘04] • Image rendering [Bornik et al. ’03] • Image fusion [Zheng et al. ’04, Tsai ’04, Gonzalez-Audicana et al. ’05] • Texture reconstruction [Toth ’04] • Image halftoning [Evans & Monga ’03, Neelamani ‘03] • Radar imaging [Bentabet ’03] • Infrared imaging [Torres ’03, Pezoa et al. ‘04] • Ultrasound imaging [Loizou et al. ’04] • Vision processor design [Cembrano et al., ’04] • Wearable display design [von Waldkirch et al. ’04] • Contrast equalization for LCD [Iranli et al. ’05] • Airborne hyperspectral imaging [Christophe et al. ’05] • Superresolution for remote sensing [Rubert et al. ’05]

  34. THE END. Thank you!

From Wikipedia, the free encyclopedia

The structural similarity index measure (SSIM) is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. SSIM is used for measuring the similarity between two images. The SSIM index is a full reference metric; in other words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-free image as reference.

SSIM is a perception-based model that treats image degradation as perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms. Unlike techniques such as MSE or PSNR, which estimate absolute errors, SSIM builds on the idea of structural information: pixels have strong inter-dependencies, especially when they are spatially close, and these dependencies carry important information about the structure of the objects in the visual scene. Luminance masking is a phenomenon whereby image distortions (in this context) tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is significant activity or "texture" in the image.

History

The predecessor of SSIM was called Universal Quality Index (UQI), or Wang–Bovik Index, which was developed by Zhou Wang and Alan Bovik in 2001. This evolved, through their collaboration with Hamid Sheikh and Eero Simoncelli, into the current version of SSIM, which was published in April 2004 in the IEEE Transactions on Image Processing.[1] In addition to defining the SSIM quality index, the paper provides a general context for developing and evaluating perceptual quality measures, including connections to human visual neurobiology and perception, and direct validation of the index against human subject ratings.

The basic model was developed in the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin and further developed jointly with the Laboratory for Computational Vision (LCV) at New York University. Further variants of the model have been developed in the Image and Visual Computing Laboratory at University of Waterloo and have been commercially marketed.

SSIM subsequently found strong adoption in the image processing community and in the television and social media industries. The 2004 SSIM paper has been cited over 40,000 times according to Google Scholar,[2] making it one of the highest cited papers in the image processing and video engineering fields. It was recognized with the IEEE Signal Processing Society Best Paper Award for 2009.[3] It also received the IEEE Signal Processing Society Sustained Impact Award for 2016, indicative of a paper having an unusually high impact for at least 10 years following its publication. Because of its high adoption by the television industry, the authors of the original SSIM paper were each accorded a Primetime Engineering Emmy Award in 2015 by the Television Academy.

Algorithm

The SSIM index is calculated on various windows of an image. The measure between two windows $x$ and $y$ of common size $N \times N$ is:[4]

$$\text{SSIM}(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}$$

with:

  • $\mu_x$ the mean of $x$ and $\mu_y$ the mean of $y$;
  • $\sigma_x^2$ the variance of $x$ and $\sigma_y^2$ the variance of $y$;
  • $\sigma_{xy}$ the covariance of $x$ and $y$;
  • $c_1=(k_1L)^2$ and $c_2=(k_2L)^2$, two constants that stabilize the division when the denominator is weak, where $L$ is the dynamic range of the pixel values (typically $2^{\text{bits per pixel}}-1$) and $k_1=0.01$, $k_2=0.03$ by default.

Formula components

The SSIM formula is based on three comparison measurements between the samples of x and y: luminance (l), contrast (c) and structure (s). The individual comparison functions are:[4]

$$l(x,y)=\frac{2\mu_x\mu_y+c_1}{\mu_x^2+\mu_y^2+c_1}$$

$$c(x,y)=\frac{2\sigma_x\sigma_y+c_2}{\sigma_x^2+\sigma_y^2+c_2}$$

$$s(x,y)=\frac{\sigma_{xy}+c_3}{\sigma_x\sigma_y+c_3}$$

with, in addition to the above definitions:

  • $c_3 = c_2/2$

SSIM is then a weighted combination of those comparative measures:

$$\text{SSIM}(x,y)=l(x,y)^{\alpha}\cdot c(x,y)^{\beta}\cdot s(x,y)^{\gamma}$$

Setting the weights $\alpha$, $\beta$, and $\gamma$ to 1, the formula reduces to the form shown above.
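
As an illustrative sketch (not the reference implementation), the combined formula above can be evaluated in plain Python over a single global window; the function name `ssim_global` and the flat pixel-list inputs are assumptions made for this example:

```python
def ssim_global(x, y, data_range=255, k1=0.01, k2=0.03):
    """Single-window SSIM computed directly from the formulas above.
    Simplified sketch: one global window over two flat pixel lists,
    rather than the usual 11x11 sliding Gaussian window."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    # Unbiased sample variances and covariance.
    var_x = sum((v - mu_x) ** 2 for v in x) / (n - 1)
    var_y = sum((v - mu_y) ** 2 for v in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    # Stabilizing constants from the dynamic range L: c1=(k1*L)^2, c2=(k2*L)^2.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

For identical inputs the index is exactly 1; a uniform brightness shift leaves the contrast and structure terms at 1 but lowers the luminance term, so the score drops below 1.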

Mathematical Properties

SSIM satisfies the identity of indiscernibles and symmetry properties, but not the triangle inequality or non-negativity, and thus is not a distance function. However, under certain conditions, SSIM may be converted to a normalized root MSE measure, which is a distance function.[5] The square of such a function is not convex, but is locally convex and quasiconvex,[5] making SSIM a feasible target for optimization.

Application of the formula

To evaluate image quality, this formula is usually applied only on luma, although it may also be applied on color (e.g., RGB) values or chromatic (e.g., YCbCr) values. The resultant SSIM index is a decimal value between -1 and 1, where 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates perfect anti-correlation. For an image, it is typically calculated using a sliding Gaussian window of size 11×11 or a block window of size 8×8. The window can be displaced pixel-by-pixel on the image to create an SSIM quality map of the image. In the case of video quality assessment,[6] the authors propose to use only a subgroup of the possible windows to reduce the complexity of the calculation.
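
The sliding-window procedure can be sketched in pure Python as follows. This is a hedged simplification: it uses a uniform 8×8 block window moved pixel by pixel instead of the 11×11 Gaussian window of the reference implementation, and the helper names `ssim_map` and `mean_ssim` are invented for the example:

```python
def ssim_map(img_x, img_y, win=8, data_range=255, k1=0.01, k2=0.03):
    """SSIM quality map: slide a uniform win x win window pixel by pixel
    over two grayscale images given as 2-D lists of pixel values."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    h, w = len(img_x), len(img_x[0])
    out = []
    for i in range(h - win + 1):
        row = []
        for j in range(w - win + 1):
            xs = [img_x[i + a][j + b] for a in range(win) for b in range(win)]
            ys = [img_y[i + a][j + b] for a in range(win) for b in range(win)]
            n = win * win
            mx, my = sum(xs) / n, sum(ys) / n
            vx = sum((v - mx) ** 2 for v in xs) / (n - 1)
            vy = sum((v - my) ** 2 for v in ys) / (n - 1)
            cxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)
            row.append(((2 * mx * my + c1) * (2 * cxy + c2))
                       / ((mx * mx + my * my + c1) * (vx + vy + c2)))
        out.append(row)
    return out

def mean_ssim(img_x, img_y, **kw):
    """Pool the quality map into a single score by averaging."""
    m = ssim_map(img_x, img_y, **kw)
    vals = [v for row in m for v in row]
    return sum(vals) / len(vals)
```

The map localizes where the two images differ; averaging it is the common pooling step mentioned above for stills and video frames.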

Variants

Multi-Scale SSIM

A more advanced form of SSIM, called Multi-scale SSIM (MS-SSIM),[4] is conducted over multiple scales through a process of multiple stages of sub-sampling, reminiscent of multiscale processing in the early vision system. It has been shown to perform equally well or better than SSIM on different subjective image and video databases.[4][7][8]
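
The multi-scale idea can be sketched as below, under loudly stated simplifying assumptions: global (unwindowed) statistics per scale, 2×2 average-pooling for sub-sampling, and equal exponents at every scale instead of the per-scale weights fitted in the MS-SSIM paper. All function names are invented for the example:

```python
def _stats(xs, ys):
    """Means, unbiased variances and covariance of two flat value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) ** 2 for v in xs) / (n - 1)
    vy = sum((v - my) ** 2 for v in ys) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)
    return mx, my, vx, vy, cov

def _downsample(img):
    """2x2 average pooling of a 2-D list (the sub-sampling stage)."""
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4
             for j in range(0, w, 2)] for i in range(0, h, 2)]

def ms_ssim(x, y, scales=3, data_range=255, k1=0.01, k2=0.03):
    """Sketch of MS-SSIM: the contrast/structure term is measured at every
    scale, the luminance term only at the coarsest, and the terms are
    multiplied together."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    result = 1.0
    for s in range(scales):
        xs = [v for row in x for v in row]
        ys = [v for row in y for v in row]
        mx, my, vx, vy, cov = _stats(xs, ys)
        result *= (2 * cov + c2) / (vx + vy + c2)  # contrast/structure
        if s == scales - 1:
            result *= (2 * mx * my + c1) / (mx * mx + my * my + c1)  # luminance
        else:
            x, y = _downsample(x), _downsample(y)
    return result
```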

Multi-component SSIM

Three-component SSIM (3-SSIM) is a form of SSIM that takes into account the fact that the human eye can see differences more precisely on textured or edge regions than on smooth regions.[9] The resulting metric is calculated as a weighted average of SSIM for three categories of regions: edges, textures, and smooth regions. The proposed weighting is 0.5 for edges and 0.25 each for the textured and smooth regions. The authors mention that a 1/0/0 weighting (ignoring anything but edge distortions) leads to results that are closer to subjective ratings. This suggests that edge regions play a dominant role in image quality perception.

The authors of 3-SSIM have also extended the model into a four-component SSIM (4-SSIM). The edge types are further subdivided into preserved and changed edges according to their distortion status. The proposed weighting is 0.25 for each of the four components.[10]
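
The region-weighted pooling described above amounts to a weighted average of per-region mean SSIM values. A minimal sketch, assuming the region classification (e.g. via an edge map) and the per-region SSIM means have been computed elsewhere; `region_weighted_ssim` is an invented name:

```python
def region_weighted_ssim(region_means, weights):
    """Weighted pooling used by 3-SSIM / 4-SSIM: combine mean SSIM values
    computed separately over each region class with fixed weights.
    3-SSIM: [edges, textures, smooth] with weights [0.5, 0.25, 0.25].
    4-SSIM: [preserved edges, changed edges, textures, smooth], 0.25 each."""
    assert abs(sum(weights) - 1.0) < 1e-9  # weights must sum to one
    return sum(w * m for w, m in zip(weights, region_means))
```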

Structural Dissimilarity

Structural dissimilarity (DSSIM) may be derived from SSIM, though it does not constitute a distance function, as the triangle inequality is not necessarily satisfied.

$$\text{DSSIM}(x,y)=\frac{1-\text{SSIM}(x,y)}{2}$$
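
The conversion is a one-liner; a minimal sketch:

```python
def dssim(ssim_value):
    """Structural dissimilarity: maps an SSIM score in [-1, 1] onto a
    dissimilarity in [0, 1], with 0 for identical images and 1 for
    perfect anti-correlation."""
    return (1.0 - ssim_value) / 2.0
```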

Video quality metrics and temporal variants

The original version of SSIM was designed to measure the quality of still images. It does not contain any parameters directly related to temporal effects of human perception and human judgment.[7] A common practice is to calculate the average SSIM value over all frames in the video sequence. However, several temporal variants of SSIM have been developed.[11][6][12]

Complex Wavelet SSIM

The complex wavelet transform variant of SSIM (CW-SSIM) is designed to deal with issues of image scaling, translation and rotation. Instead of assigning low scores to images affected by such transformations, CW-SSIM takes advantage of the complex wavelet transform and therefore yields higher scores for them. CW-SSIM is defined as follows:

$$\text{CW-SSIM}(c_x,c_y)=\left(\frac{2\sum_{i=1}^{N}|c_{x,i}||c_{y,i}|+K}{\sum_{i=1}^{N}|c_{x,i}|^{2}+\sum_{i=1}^{N}|c_{y,i}|^{2}+K}\right)\left(\frac{2\left|\sum_{i=1}^{N}c_{x,i}c_{y,i}^{*}\right|+K}{2\sum_{i=1}^{N}|c_{x,i}c_{y,i}^{*}|+K}\right)$$

where $c_x$ and $c_y$ are the complex wavelet transforms of the signals $x$ and $y$, respectively, and $K$ is a small positive constant (ideally zero) added for numerical stability. Like SSIM, CW-SSIM has a maximum value of 1, which indicates that the two signals are perfectly structurally similar, while a value of 0 indicates no structural similarity.[13]

SSIMPLUS

The SSIMPLUS index is based on SSIM and is a commercially available tool.[14] It extends SSIM’s capabilities, mainly to target video applications. It provides scores in the range of 0–100, linearly matched to human subjective ratings. It also allows adapting the scores to the intended viewing device, comparing video across different resolutions and contents.

According to its authors, SSIMPLUS achieves higher accuracy and higher speed than other image and video quality metrics. However, no independent evaluation of SSIMPLUS has been performed, as the algorithm itself is not publicly available.

cSSIM

To further investigate the standard discrete SSIM from a theoretical perspective, the continuous SSIM (cSSIM)[15] has been introduced and studied in the context of radial basis function interpolation.

Other simple modifications

The r* cross-correlation metric is based on the variance metrics of SSIM. It is defined as $r^*(x,y)=\sigma_{xy}/(\sigma_x\sigma_y)$ when $\sigma_x\sigma_y\neq 0$, 1 when both standard deviations are zero, and 0 when only one is zero. It has found use in analyzing human response to contrast-detail phantoms.[16]
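
A direct transcription of this definition, including the two degenerate cases; `r_star` is an invented name for the example:

```python
import math

def r_star(x, y):
    """The r* cross-correlation metric: sigma_xy / (sigma_x * sigma_y),
    with the degenerate flat-signal cases handled as defined above."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / (n - 1))
    if sx == 0 and sy == 0:
        return 1.0  # both signals flat: defined as perfectly correlated
    if sx == 0 or sy == 0:
        return 0.0  # only one flat: defined as uncorrelated
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (sx * sy)
```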

SSIM has also been used on the gradient of images, making it "G-SSIM". G-SSIM is especially useful on blurred images.[17]

The modifications above can be combined. For example, 4-G-r* is a combination of 4-SSIM, G-SSIM, and r*. It is able to reflect radiologist preference for images much better than other SSIM variants tested.[18]

Application

SSIM has applications in a variety of different problems. Some examples are:

  • Image Compression: In lossy image compression, information is deliberately discarded to decrease the storage space of images and video. The MSE is typically used in such compression schemes. According to its authors, using SSIM instead of MSE produces better results for the decompressed images.[13]
  • Image Restoration: Image restoration focuses on solving the problem $y = h * x + n$, where $y$ is the blurred image to be restored, $h$ is the blur kernel, $n$ is the additive noise, and $x$ is the original image we wish to recover. The traditional filter used to solve this problem is the Wiener filter, whose design is based on the MSE. Using an SSIM variant, specifically Stat-SSIM, is claimed to produce better visual results, according to the algorithm's authors.[13]
  • Pattern Recognition: Since SSIM mimics aspects of human perception, it could be used for recognizing patterns. When faced with issues like image scaling, translation and rotation, the algorithm’s authors claim that it is better to use CW-SSIM,[19] which is insensitive to these variations and may be directly applied by template matching without using any training sample. Since data-driven pattern recognition approaches may produce better performance when a large amount of data is available for training, the authors suggest using CW-SSIM in data-driven approaches.[19]

Performance comparison

Due to its popularity, SSIM is often compared to other metrics, including simpler metrics such as MSE and PSNR, and other perceptual image and video quality metrics. SSIM has been repeatedly shown to significantly outperform MSE and its derivatives in accuracy, including research by its own authors and others.[7][20][21][22][23][24]

A paper by Dosselmann and Yang claims that the performance of SSIM is "much closer to that of the MSE" than usually assumed. While they do not dispute the advantage of SSIM over MSE, they point out an analytical and functional dependency between the two metrics.[8] According to their research, SSIM has been found to correlate only about as well as MSE-based methods on subjective databases other than those from SSIM's creators. As an example, they cite Reibman and Poole, who found that MSE outperformed SSIM on a database containing packet-loss-impaired video.[25] In another paper, an analytical link between PSNR and SSIM was identified.[26]

See also

  • Mean squared error
  • Peak signal-to-noise ratio
  • Video quality

References

  1. ^ Wang, Zhou; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. (2004-04-01). "Image quality assessment: from error visibility to structural similarity". IEEE Transactions on Image Processing. 13 (4): 600–612. Bibcode:2004ITIP...13..600W. CiteSeerX 10.1.1.2.5689. doi:10.1109/TIP.2003.819861. ISSN 1057-7149. PMID 15376593. S2CID 207761262.
  2. ^ "Google Scholar". scholar.google.com. Retrieved 2019-07-04.
  3. ^ "IEEE Signal Processing Society, Best Paper Award" (PDF).
  4. ^ a b c d Wang, Z.; Simoncelli, E.P.; Bovik, A.C. (2003-11-01). Multiscale structural similarity for image quality assessment. Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2004. Vol. 2. pp. 1398–1402. CiteSeerX 10.1.1.58.1939. doi:10.1109/ACSSC.2003.1292216. ISBN 978-0-7803-8104-9. S2CID 60600316.
  5. ^ a b Brunet, D.; Vass, J.; Vrscay, E. R.; Wang, Z. (April 2012). "On the mathematical properties of the structural similarity index" (PDF). IEEE Transactions on Image Processing. 21 (4): 2324–2328. Bibcode:2012ITIP...21.1488B. doi:10.1109/TIP.2011.2173206. PMID 22042163. S2CID 13739220.
  6. ^ a b Wang, Z.; Lu, L.; Bovik, A. C. (February 2004). "Video quality assessment based on structural distortion measurement". Signal Processing: Image Communication. 19 (2): 121–132. CiteSeerX 10.1.1.2.6330. doi:10.1016/S0923-5965(03)00076-6.
  7. ^ a b c Søgaard, Jacob; Krasula, Lukáš; Shahid, Muhammad; Temel, Dogancan; Brunnström, Kjell; Razaak, Manzoor (2016-02-14). "Applicability of Existing Objective Metrics of Perceptual Quality for Adaptive Video Streaming" (PDF). Electronic Imaging. 2016 (13): 1–7. doi:10.2352/issn.2470-1173.2016.13.iqsp-206. S2CID 26253431.
  8. ^ a b Dosselmann, Richard; Yang, Xue Dong (2009-11-06). "A comprehensive assessment of the structural similarity index". Signal, Image and Video Processing. 5 (1): 81–91. doi:10.1007/s11760-009-0144-1. ISSN 1863-1703. S2CID 30046880.
  9. ^ Li, Chaofeng; Bovik, Alan Conrad (2010-01-01). "Content-weighted video quality assessment using a three-component image model". Journal of Electronic Imaging. 19 (1): 011003. Bibcode:2010JEI....19a1003L. doi:10.1117/1.3267087. ISSN 1017-9909.
  10. ^ Li, Chaofeng; Bovik, Alan C. (August 2010). "Content-partitioned structural similarity index for image quality assessment". Signal Processing: Image Communication. 25 (7): 517–526. doi:10.1016/j.image.2010.03.004.
  11. ^ "Redirect page". www.compression.ru.
  12. ^ Wang, Z.; Li, Q. (December 2007). "Video quality assessment using a statistical model of human visual speed perception" (PDF). Journal of the Optical Society of America A. 24 (12): B61–B69. Bibcode:2007JOSAA..24...61W. CiteSeerX 10.1.1.113.4177. doi:10.1364/JOSAA.24.000B61. PMID 18059915.
  13. ^ a b c Wang, Zhou; Bovik, A.C. (January 2009). "Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures". IEEE Signal Processing Magazine. 26 (1): 98–117. Bibcode:2009ISPM...26...98W. doi:10.1109/msp.2008.930649. ISSN 1053-5888. S2CID 2492436.
  14. ^ Rehman, A.; Zeng, K.; Wang, Zhou (February 2015). Rogowitz, Bernice E; Pappas, Thrasyvoulos N; De Ridder, Huib (eds.). "Display device-adapted video quality-of-experience assessment" (PDF). Human Vision and Electronic Imaging XX. 9394: 939406. Bibcode:2015SPIE.9394E..06R. doi:10.1117/12.2077917. S2CID 1466973.
  15. ^ Marchetti, F. (January 2021). "Convergence rate in terms of the continuous SSIM (cSSIM) index in RBF interpolation" (PDF). Dolom. Res. Notes Approx. 14: 27–32.
  16. ^ Prieto, Gabriel; Guibelalde, Eduardo; Chevalier, Margarita; Turrero, Agustín (21 July 2011). "Use of the cross-correlation component of the multiscale structural similarity metric (R* metric) for the evaluation of medical images". Medical Physics. 38 (8): 4512–4517. doi:10.1118/1.3605634. PMID 21928621.
  17. ^ Chen, Guan-hao; Yang, Chun-ling; Xie, Sheng-li (October 2006). "Gradient-Based Structural Similarity for Image Quality Assessment". 2006 International Conference on Image Processing: 2929–2932. doi:10.1109/ICIP.2006.313132. ISBN 1-4244-0480-0. S2CID 15809337.
  18. ^ Renieblas, Gabriel Prieto; Nogués, Agustín Turrero; González, Alberto Muñoz; Gómez-Leon, Nieves; del Castillo, Eduardo Guibelalde (26 July 2017). "Structural similarity index family for image quality assessment in radiological images". Journal of Medical Imaging. 4 (3): 035501. doi:10.1117/1.JMI.4.3.035501. PMC 5527267. PMID 28924574.
  19. ^ a b Gao, Y.; Rehman, A.; Wang, Z. (September 2011). CW-SSIM based image classification (PDF). IEEE International Conference on Image Processing (ICIP11).
  20. ^ Zhang, Lin; Zhang, Lei; Mou, X.; Zhang, D. (September 2012). A comprehensive evaluation of full reference image quality assessment algorithms. 2012 19th IEEE International Conference on Image Processing. pp. 1477–1480. CiteSeerX 10.1.1.476.2566. doi:10.1109/icip.2012.6467150. ISBN 978-1-4673-2533-2. S2CID 10716320.
  21. ^ Wang, Zhou; Li, Qiang (May 2011). "Information Content Weighting for Perceptual Image Quality Assessment". IEEE Transactions on Image Processing. 20 (5): 1185–1198. Bibcode:2011ITIP...20.1185W. doi:10.1109/tip.2010.2092435. PMID 21078577. S2CID 106021.
  22. ^ Channappayya, S. S.; Bovik, A. C.; Caramanis, C.; Heath, R. W. (March 2008). SSIM-optimal linear image restoration. 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 765–768. CiteSeerX 10.1.1.152.7952. doi:10.1109/icassp.2008.4517722. ISBN 978-1-4244-1483-3. S2CID 14830268.
  23. ^ Gore, Akshay; Gupta, Savita (2015-02-01). "Full reference image quality metrics for JPEG compressed images". AEU - International Journal of Electronics and Communications. 69 (2): 604–608. doi:10.1016/j.aeue.2014.09.002.
  24. ^ Wang, Z.; Simoncelli, E. P. (September 2008). "Maximum differentiation (MAD) competition: a methodology for comparing computational models of perceptual quantities" (PDF). Journal of Vision. 8 (12): 8.1–13. doi:10.1167/8.12.8. PMC 4143340. PMID 18831621.
  25. ^ Reibman, A. R.; Poole, D. (September 2007). Characterizing packet-loss impairments in compressed video. 2007 IEEE International Conference on Image Processing. Vol. 5. pp. V-77–V-80. CiteSeerX 10.1.1.159.5710. doi:10.1109/icip.2007.4379769. ISBN 978-1-4244-1436-9. S2CID 1685021.
  26. ^ Hore, A.; Ziou, D. (August 2010). Image Quality Metrics: PSNR vs. SSIM. 2010 20th International Conference on Pattern Recognition. pp. 2366–2369. doi:10.1109/icpr.2010.579. ISBN 978-1-4244-7542-1. S2CID 9506273.

External links

  • Home page
  • Rust Implementation
  • C/C++ Implementation
  • DSSIM C++ Implementation
  • Chris Lomont’s C# Implementation
  • qpsnr implementation (multi threaded C++)
  • Implementation in VQMT software
  • Implementation in Python
  • "Mystery Behind Similarity Measures MSE and SSIM", Gintautas Palubinskas, 2014
