plenoptic.metric package

Submodules

plenoptic.metric.classes module

class plenoptic.metric.classes.NLP

Bases: Module

Simple class for implementing the normalized Laplacian pyramid.

This class calls plenoptic.metric.normalized_laplacian_pyramid on the image and returns a 3D tensor with the flattened activations.

NOTE: synthesis using this class will not be exactly the same as synthesis using the plenoptic.metric.nlpd function (by default), because the synthesis methods use torch.norm(x - y, p=2) as the distance metric between representations, whereas nlpd uses the root-mean-square of the distance (i.e., torch.sqrt(torch.mean((x - y)**2))).
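A quick way to see the difference: for one flattened representation with \(n\) elements, the L2 norm equals \(\sqrt{n}\) times the root-mean-square, so the two distances differ by a constant factor (nlpd additionally computes the RMS per scale and then averages, so for the full pyramid the relationship is not a single constant). The sketch below uses plain Python lists in place of torch.Tensor; the helper names are hypothetical and not part of plenoptic.

```python
import math


def l2_distance(x, y):
    """Euclidean (p=2) norm of x - y, the quantity torch.norm(x - y, p=2) computes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rms_distance(x, y):
    """Root-mean-square of x - y, i.e. sqrt(mean((x - y) ** 2))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 1.0, 2.0, 0.0]
print(l2_distance(x, y))   # sqrt(1 + 0 + 0 + 9) = sqrt(10)
print(rms_distance(x, y))  # sqrt(10 / 4)
# For a single flattened representation, the two differ by a factor of sqrt(n).
print(math.isclose(l2_distance(x, y), math.sqrt(len(x)) * rms_distance(x, y)))  # True
```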

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(image)

Return the flattened NLP activations.

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

forward(image)

Return the flattened NLP activations.

WARNING: For now, this only supports images with batch and channel size 1.

Parameters:

image (torch.Tensor) – image to pass to normalized_laplacian_pyramid

Returns:

representation – 3D tensor with the flattened NLP activations

Return type:

torch.Tensor
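The flattening step can be pictured with the sketch below, which assumes the pyramid is a list of 2D coefficient arrays (one per scale) and that the result is laid out as (1, 1, N) for a single image with one channel. Both the layout and the helper name flatten_activations are assumptions for illustration; the actual method operates on torch.Tensor objects.

```python
def flatten_activations(levels):
    """Concatenate every coefficient of every scale into one vector, wrapped as (1, 1, N).

    The (1, 1, N) layout is an assumption for illustration only.
    """
    flat = [value for level in levels for row in level for value in row]
    return [[flat]]


# Two hypothetical scales: a 4x4 fine scale and a 2x2 coarse scale.
fine = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
coarse = [[100.0, 101.0], [102.0, 103.0]]
rep = flatten_activations([fine, coarse])
print(len(rep), len(rep[0]), len(rep[0][0]))  # 1 1 20
```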

plenoptic.metric.model_metric module

plenoptic.metric.model_metric.model_metric(x, y, model)

Calculate the root-mean-squared error between x and y in model space.

Parameters:
  • x (torch.Tensor) – first image, (B x C x H x W)

  • y (torch.Tensor) – second image, (B x C x H x W)

  • model (torch class) – torch model with defined forward and backward operations
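A minimal sketch of the computation, assuming model maps an image to a flat list of responses: the distance is the root-mean-square of the difference between the two model responses. Plain lists stand in for torch.Tensor, and model_metric_sketch and toy_model are hypothetical names, not plenoptic functions.

```python
import math


def model_metric_sketch(x, y, model):
    """Root-mean-squared error between model(x) and model(y)."""
    rx, ry = model(x), model(y)
    if len(rx) != len(ry):
        raise ValueError("model outputs must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rx, ry)) / len(rx))


def toy_model(image):
    """Hypothetical 'model': differences between neighbouring pixels of a 1D image."""
    return [image[i + 1] - image[i] for i in range(len(image) - 1)]


a = [0.0, 0.5, 1.0, 1.5]
b = [0.25, 0.75, 1.25, 1.75]  # a plus a constant offset
print(model_metric_sketch(a, b, toy_model))  # 0.0: this toy model ignores constant offsets
```

Even though a and b differ at every pixel, their distance in this toy model space is zero, which illustrates how the choice of model determines which image differences count.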


plenoptic.metric.naive module

plenoptic.metric.naive.mse(img1, img2)

Return the MSE between img1 and img2.

Our baseline metric for comparing two images is often the mean-squared error (MSE). This is not a good approximation of the human visual system, but it is handy to compare against.

For two images, \(x\) and \(y\), with \(n\) pixels each:

\[MSE = \frac{1}{n}\sum_{i=1}^n (x_i - y_i)^2\]

The two images must have a float dtype.

Parameters:
  • img1 (torch.Tensor) – The first image to compare

  • img2 (torch.Tensor) – The second image to compare; must be the same size as img1

Returns:

mse – the mean-squared error between img1 and img2

Return type:

torch.float
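The MSE formula translates directly into a few lines of Python. This is a sketch on nested lists (rows of pixel values) rather than torch.Tensor, and mse_sketch is a hypothetical name; it does not reproduce plenoptic's handling of batch and channel dimensions.

```python
def mse_sketch(img1, img2):
    """Mean-squared error between two equally sized 2D images given as lists of rows."""
    if len(img1) != len(img2) or any(len(r1) != len(r2) for r1, r2 in zip(img1, img2)):
        raise ValueError("img1 and img2 must have the same size")
    n = sum(len(row) for row in img1)  # total number of pixels
    return sum((a - b) ** 2 for r1, r2 in zip(img1, img2) for a, b in zip(r1, r2)) / n


x = [[0.0, 0.5], [1.0, 0.25]]
y = [[0.0, 0.0], [0.5, 0.25]]
print(mse_sketch(x, y))  # (0 + 0.25 + 0.25 + 0) / 4 = 0.125
```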

plenoptic.metric.perceptual_distance module

plenoptic.metric.perceptual_distance.ms_ssim(img1, img2, power_factors=None)

Multiscale structural similarity index (MS-SSIM)

As described in [1], the multiscale structural similarity index (MS-SSIM) is an improvement upon the structural similarity index (SSIM) that takes into account the perceptual distance between two images at different scales.

SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images, producing three maps instead of scalars. The SSIM map is the elementwise product of these three maps. See metric.ssim and metric.ssim_map for a full description of SSIM.

To get images at different scales, average pooling operations with kernel size 2 are performed recursively on the input images. The product of the contrast map and the structure map (the “contrast-structure map”) is computed for all but the coarsest scale, and the overall SSIM map is only computed for the coarsest scale. Their mean values are raised to exponents and multiplied to produce MS-SSIM:

\[MSSSIM = {SSIM}_M^{a_M} \prod_{i=1}^{M-1} ({CS}_i)^{a_i}\]

Here \(M\) is the number of scales, \({CS}_i\) is the mean value of the contrast-structure map for the \(i\)’th finest scale, and \({SSIM}_M\) is the mean value of the SSIM map for the coarsest scale. If at least one of these terms is negative, the value of MS-SSIM is zero. The values of \(a_i, i=1,\ldots,M\) are taken from the argument power_factors.
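The final combination step can be sketched as follows, assuming the per-scale means \({CS}_i\) and \({SSIM}_M\) have already been computed. combine_ms_ssim is a hypothetical helper operating on plain floats, not a plenoptic function.

```python
def combine_ms_ssim(cs_means, ssim_coarsest, power_factors):
    """Combine per-scale means into MS-SSIM.

    cs_means: mean contrast-structure values for scales 1..M-1 (fine to coarse).
    ssim_coarsest: mean SSIM value at the coarsest scale M.
    power_factors: exponents a_1..a_M (fine to coarse).
    """
    if len(cs_means) + 1 != len(power_factors):
        raise ValueError("need exactly one power factor per scale")
    terms = list(cs_means) + [ssim_coarsest]
    # If any term is negative, MS-SSIM is defined to be zero.
    if any(t < 0 for t in terms):
        return 0.0
    result = 1.0
    for t, a in zip(terms, power_factors):
        result *= t ** a
    return result


default_factors = [0.0448, 0.2856, 0.3001, 0.2363, 0.1333]
print(combine_ms_ssim([1.0, 1.0, 1.0, 1.0], 1.0, default_factors))   # 1.0 for identical images
print(combine_ms_ssim([0.9, 0.8, -0.1, 0.7], 0.95, default_factors))  # 0.0: one term is negative
```

The explicit check for negative terms also matters in plain Python: raising a negative float to a fractional power returns a complex number rather than raising an error.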

Parameters:
  • img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.

  • img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).

  • power_factors (1D array, optional) – power exponents for the mean values of maps, for different scales (from fine to coarse). The length of this array determines the number of scales. By default, this is set to [0.0448, 0.2856, 0.3001, 0.2363, 0.1333], which is what psychophysical experiments in [1] found.
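Since the length of power_factors sets the number of scales, the recursive pooling that produces those scales can be sketched as below. This pure-Python version assumes 2x2 averaging with stride 2 and drops an odd final row or column, which matches the default (ceil_mode=False) behavior of torch.nn.functional.avg_pool2d; avg_pool2 and image_pyramid are hypothetical helpers, not plenoptic functions.

```python
def avg_pool2(img):
    """Average-pool a 2D list with a 2x2 kernel and stride 2 (odd edges are dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
          + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4
         for j in range(w)]
        for i in range(h)
    ]


def image_pyramid(img, n_scales):
    """Return [finest, ..., coarsest] by repeatedly average-pooling the previous scale."""
    scales = [img]
    for _ in range(n_scales - 1):
        scales.append(avg_pool2(scales[-1]))
    return scales


img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
for s in image_pyramid(img, 3):
    print(len(s), "x", len(s[0]))  # 8 x 8, then 4 x 4, then 2 x 2
```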

Returns:

msssim – 2d tensor of shape (batch, channel) containing the MS-SSIM for each image

Return type:

torch.Tensor

References

[1]

Wang, Zhou, Eero P. Simoncelli, and Alan C. Bovik. “Multiscale structural similarity for image quality assessment.” The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003. Vol. 2. IEEE, 2003.

plenoptic.metric.perceptual_distance.nlpd(img1, img2)

Normalized Laplacian Pyramid Distance

As described in [1], this is an image quality metric based on the transformations associated with the early visual system: local luminance subtraction and local contrast gain control.

A Laplacian pyramid subtracts a local estimate of the mean luminance at six scales. Then a local gain control divides these centered coefficients by a weighted sum of absolute values in a spatial neighborhood.
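The gain-control step can be illustrated in one dimension. The sketch assumes the divisive-normalization form \(y_i = x_i / (\sigma + \sum_k w_k |x_{i+k}|)\), with a small constant \(\sigma\) to avoid division by zero; the weights and \(\sigma\) below are made-up values, not the optimized parameters nlpd uses, the real computation is two-dimensional, and divisive_normalization_1d is a hypothetical name.

```python
def divisive_normalization_1d(coeffs, weights, sigma):
    """Divide each coefficient by sigma plus a weighted sum of absolute neighbour values.

    weights: odd-length list of hypothetical spatial weights centred on each sample.
    sigma: hypothetical positive constant. Samples outside the signal count as zero.
    """
    half = len(weights) // 2
    out = []
    for i, c in enumerate(coeffs):
        pool = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(coeffs):
                pool += w * abs(coeffs[j])
        out.append(c / (sigma + pool))
    return out


normalized = divisive_normalization_1d([1.0, -2.0, 0.0, 4.0], weights=[0.25, 0.5, 0.25], sigma=0.5)
print(normalized)
```

Large coefficients in a busy neighborhood are divided by a large pool, so the output compresses local contrast while keeping each coefficient's sign.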

These weight parameters were optimized for redundancy reduction over a training database of (undistorted) natural images.

Note that we compute root mean squared error for each scale, and then average over these, effectively giving larger weight to the lower frequency coefficients (which are fewer in number, due to subsampling).
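The weighting effect is easiest to see with numbers. In the sketch below (plain lists stand in for the per-scale coefficients returned by normalized_laplacian_pyramid; nlpd_from_scales is a hypothetical name), the same unit error counts twice as much at a coarse scale with one coefficient as at a fine scale with four.

```python
import math


def nlpd_from_scales(scales1, scales2):
    """Mean over scales of the per-scale root-mean-squared error."""
    per_scale = []
    for c1, c2 in zip(scales1, scales2):
        mse = sum((a - b) ** 2 for a, b in zip(c1, c2)) / len(c1)
        per_scale.append(math.sqrt(mse))
    return sum(per_scale) / len(per_scale)


fine1, fine2 = [0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]  # 4 fine-scale coefficients
coarse1, coarse2 = [0.0], [1.0]  # 1 coarse-scale coefficient
dist = nlpd_from_scales([fine1, coarse1], [fine2, coarse2])
print(dist)  # 0.75: the fine scale contributes 0.5, the coarse scale 1.0

# For comparison, one RMS over all five coefficients pooled together:
pooled = math.sqrt((1.0 + 1.0) / 5)
print(pooled)  # about 0.632
```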

Parameters:
  • img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.

  • img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).

Returns:

distance – The normalized Laplacian Pyramid distance.

Return type:

torch.Tensor of shape (batch, channel)

References

[1]

Laparra, V., Ballé, J., Berardino, A. and Simoncelli, E.P., 2016. Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16), pp.1-6.

plenoptic.metric.perceptual_distance.normalized_laplacian_pyramid(img)

Compute the normalized Laplacian Pyramid using pre-optimized parameters.

Parameters:

img (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. This representation is designed for grayscale images and will be computed separately for each channel (so channels are treated in the same way as batches).

Returns:

normalized_laplacian_activations – The normalized Laplacian Pyramid with six scales

Return type:

list of torch.Tensor

plenoptic.metric.perceptual_distance.ssim(img1, img2, weighted=False, pad=False)

Structural similarity index

As described in [1], the structural similarity index (SSIM) is a perceptual metric that quantifies the similarity between two images. SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images. See the references for more information.

This implementation follows the original implementation, as found at [2], as well as providing the option to use the weighted version used in [4] (which was shown to consistently improve the image quality prediction on the LIVE database).

Note that this is a similarity metric (not a distance), and so 1 means the two images are identical and 0 means they’re very different. When the two images are negatively correlated, SSIM can be negative. SSIM is bounded between -1 and 1.
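To make the three comparisons concrete, the sketch below evaluates the SSIM formula from [1] over a single window that covers the whole signal, using the constants \(C_1 = (0.01L)^2\) and \(C_2 = (0.03L)^2\) commonly used with SSIM, where \(L\) is the data range. This is a simplification: ssim computes these statistics in local windows and averages the resulting map. global_ssim is a hypothetical helper operating on flat lists.

```python
def global_ssim(x, y, data_range=1.0):
    """SSIM computed over one window covering all of x and y (flat lists)."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    luminance_term = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure_term = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance_term * contrast_structure_term


img = [0.1, 0.4, 0.6, 0.9]
inverted = [1.0 - v for v in img]
print(global_ssim(img, img))       # 1.0 (up to rounding) for identical signals
print(global_ssim(img, inverted))  # negative: the signals are negatively correlated
```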

This function returns the mean SSIM, a scalar-valued metric giving the average over the whole image. For the SSIM map (showing the computed value across the image), call ssim_map.

Parameters:
  • img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.

  • img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).

  • weighted (bool, optional) – whether to use the original, unweighted SSIM version (False) as used in [1] or the weighted version (True) as used in [4]. See the Notes section for the weight.

  • pad ({False, 'constant', 'reflect', 'replicate', 'circular'}, optional) – If not False, how to pad the image for the convolutions computing the local average of each image. See torch.nn.functional.pad for how these work.

Returns:

mssim – 2d tensor of shape (batch, channel) containing the mean SSIM for each image, averaged over the whole image

Return type:

torch.Tensor

Notes

The weight used when weighted=True is:

\[\log((1+\frac{\sigma_1^2}{C_2})(1+\frac{\sigma_2^2}{C_2}))\]

where \(\sigma_1^2\) and \(\sigma_2^2\) are the variances of img1 and img2, respectively, and \(C_2\) is a constant. See [4] for more details.
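A short sketch of the weight and of one way to use it: each local SSIM value is multiplied by its weight and the sum is normalized by the total weight. The weighted-average pooling rule is an assumption (see [4] for the definition actually used), \(C_2 = 0.03^2\) assumes images with values in [0, 1], and the helper names and local statistics are hypothetical.

```python
import math


def ssim_weight(var1, var2, c2):
    """Weight log((1 + var1 / C2) * (1 + var2 / C2)) for one local window."""
    return math.log((1 + var1 / c2) * (1 + var2 / c2))


def weighted_mean(values, weights):
    """Average of values, each scaled by its weight (assumed pooling rule)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


c2 = 0.03 ** 2  # assumed C2 for images with values in [0, 1]

# Hypothetical local statistics: a textured window and a nearly flat one.
local_ssim = [0.6, 0.95]
local_vars = [(0.02, 0.03), (0.0001, 0.0002)]
weights = [ssim_weight(v1, v2, c2) for v1, v2 in local_vars]

print(ssim_weight(0.0, 0.0, c2))           # 0.0: perfectly flat windows get no weight
print(weighted_mean(local_ssim, weights))  # pulled toward 0.6, the textured window
print(sum(local_ssim) / len(local_ssim))   # unweighted mean, for comparison
```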

References

[1]

Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, Apr. 2004.

[4]

Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. http://dx.doi.org/10.1167/8.12.8

plenoptic.metric.perceptual_distance.ssim_map(img1, img2)

Structural similarity index map

As described in [1], the structural similarity index (SSIM) is a perceptual metric that quantifies the similarity between two images. SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images. See the references for more information.

This implementation follows the original implementation, as found at [2], as well as providing the option to use the weighted version used in [4] (which was shown to consistently improve the image quality prediction on the LIVE database).

Note that this is a similarity metric (not a distance), and so 1 means the two images are identical and 0 means they’re very different. When the two images are negatively correlated, SSIM can be negative. SSIM is bounded between -1 and 1.

This function returns the SSIM map, showing the SSIM values across the image. For the mean SSIM (a single value metric), call ssim.
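The relationship between the map and its mean can be sketched with a simplified local SSIM: the statistics are computed in every 3x3 window that lies fully inside the image, giving one SSIM value per window, and averaging those values gives a single number per image. This version uses a uniform window instead of the Gaussian-weighted window of the original implementation, omits padding, and assumes the unweighted case; local_ssim_map and mean_of_map are hypothetical names.

```python
def local_ssim_map(img1, img2, win=3, data_range=1.0):
    """SSIM value for every win x win window fully inside the images ("valid" mode)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    h, w = len(img1), len(img1[0])
    out = []
    for i in range(h - win + 1):
        row = []
        for j in range(w - win + 1):
            xs = [img1[i + di][j + dj] for di in range(win) for dj in range(win)]
            ys = [img2[i + di][j + dj] for di in range(win) for dj in range(win)]
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            vx = sum((a - mx) ** 2 for a in xs) / n
            vy = sum((b - my) ** 2 for b in ys) / n
            cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
            row.append(((2 * mx * my + c1) * (2 * cov + c2))
                       / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
        out.append(row)
    return out


def mean_of_map(ssim_values):
    """Average all entries of a 2D SSIM map into one number."""
    flat = [v for row in ssim_values for v in row]
    return sum(flat) / len(flat)


a = [[(r * 5 + c) / 24 for c in range(5)] for r in range(5)]
b = [row[:] for row in a]
b[2][2] = 0.0  # corrupt the centre pixel, which every 3x3 window contains
m = local_ssim_map(a, b)
print(len(m), len(m[0]))     # 3 3: "valid" windows shrink a 5x5 image to 3x3
print(mean_of_map(m) < 1.0)  # True: the corrupted pixel lowers the mean
```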

Parameters:
  • img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.

  • img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).

  • weighted (bool, optional) – whether to use the original, unweighted SSIM version (False) as used in [1] or the weighted version (True) as used in [4]. See the Notes section of ssim for the weight.

Returns:

ssim_map – 4D tensor containing the map of SSIM values.

Return type:

torch.Tensor

References

[1]

Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, Apr. 2004.

[4]

Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. http://dx.doi.org/10.1167/8.12.8

Module contents