plenoptic


plenoptic is a python library for model-based synthesis of perceptual stimuli. For plenoptic, models are those of visual [1] information processing: they accept an image as input, perform some computations, and return some output, which can be mapped to neuronal firing rate, fMRI BOLD response, behavior on some task, image category, etc. The intended audience is researchers in neuroscience, psychology, and machine learning. The generated stimuli enable interpretation of model properties through examination of features that are enhanced, suppressed, or discarded. More importantly, they can facilitate the scientific process, through use in further perceptual or neural experiments aimed at validating or falsifying model predictions.

Getting started

Installation

The best way to install plenoptic is via pip:

$ pip install plenoptic

See the Installation page for more details, including how to set up an isolated virtual environment (recommended).

ffmpeg and videos

Some methods in this package generate videos. There are several backends available for saving the animations to file (see matplotlib documentation). To convert them to HTML5 for viewing (for example, in a jupyter notebook), you’ll need ffmpeg installed. Depending on your system, this might already be installed, but if not, the easiest way is probably through conda: conda install -c conda-forge ffmpeg. To change the backend, run matplotlib.rcParams['animation.writer'] = writer before calling any of the animate functions. If you try to set that rcParam with a random string, matplotlib will list the available choices.

Contents

The four synthesis methods included in plenoptic

Synthesis methods

  • Metamers: given a model and a reference image, stochastically generate a new image whose model representation is identical to that of the reference image (a “metamer”, as originally defined in the literature on Trichromacy). This method makes explicit those features that the model retains/discards.

  • Eigendistortions: given a model and a reference image, compute the image perturbations that produce the smallest/largest change in the model response space. These are the image changes to which the model is least/most sensitive, respectively.

  • Maximal differentiation (MAD) competition: given a reference image and two models that measure distance between images, generate pairs of images that optimally differentiate the models. Specifically, synthesize a pair of images that are equi-distant from the reference image according to model-1, but maximally/minimally distant according to model-2. Synthesize a second pair with the roles of the two models reversed. This method allows for efficient comparison of two metrics, highlighting the aspects in which their sensitivities most differ.

  • Geodesics: given a model and two images, synthesize a sequence of images that lie on the shortest (“geodesic”) path in the model’s representation space. This method allows examination of the larger-scale geometric properties of model representation (as opposed to the local properties captured by the eigendistortions).

Models, Metrics, and Model Components

  • Steerable pyramid, [Simoncelli1992] and [Simoncelli1995], a multi-scale oriented image decomposition. Images are decomposed with a family of oriented filters, localized in space and frequency, similar to the “Gabor functions” commonly used to model receptive fields in primary visual cortex. The critical difference is that the pyramid organizes these filters so as to efficiently cover the 4D space of (x, y) positions, orientations, and scales, enabling efficient interpolation and interpretation (further info). See the pyrtools documentation for more details on python tools for image pyramids in general and the steerable pyramid in particular.

  • Portilla-Simoncelli texture model, [Portilla2000], which computes a set of image statistics that capture the appearance of visual textures (further info).

  • Structural Similarity Index (SSIM), [Wang2004], is a perceptual similarity metric that takes two images and returns a value between -1 (totally different) and 1 (identical) reflecting their similarity (further info; see the usage sketch after this list).

  • Multiscale Structural Similarity Index (MS-SSIM), [Wang2003], is an extension of SSIM that operates jointly over multiple scales.

  • Normalized Laplacian distance, [Laparra2016] and [Laparra2017], is a perceptual distance metric based on transformations associated with the early visual system: local luminance subtraction and local contrast gain control, at six scales (further info).
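As a rough usage sketch of these metrics (the function names po.metric.ssim, po.metric.ms_ssim, and po.metric.nlpd are assumptions based on the descriptions above; check plenoptic.metric for the exact names exposed by your installed version):

import plenoptic as po
import torch

img = po.data.einstein()               # 4d tensor: (batch, channel, height, width)
noisy = img + 0.1 * torch.randn_like(img)

# names below are assumptions; see plenoptic.metric for the exact functions
print(po.metric.ssim(img, noisy))      # higher = more similar (1 for identical images)
print(po.metric.ms_ssim(img, noisy))   # multi-scale variant
print(po.metric.nlpd(img, noisy))      # normalized Laplacian distance (0 for identical images)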

Getting help

We communicate via several channels on Github:

  • To report a bug, open an issue.

  • To send suggestions for extensions or enhancements, please post in the ideas section of discussions first. We’ll discuss it there and, if we decide to pursue it, open an issue to track progress.

  • To ask usage questions, discuss broad issues, or show off what you’ve made with plenoptic, go to Discussions.

  • To contribute to the project, see the contributing guide.

In all cases, we request that you respect our code of conduct.

Citing us

If you use plenoptic in a published academic article or presentation, please cite us! See the Citation Guide for more details.

Installation

plenoptic should work on Windows, Linux, or Mac. If you have a problem with installation, please open a bug report!

The easiest way to install plenoptic is from PyPI (the Python Package Index) using pip within a new virtual environment. The instructions on this page use conda, which we recommend if you are unfamiliar with python environment management, but other virtual environment systems should work. If you wish to follow these instructions and do not have conda installed on your machine, we recommend starting with miniconda:

$ conda create --name plenoptic pip python=3.9
$ conda activate plenoptic
$ pip install plenoptic

Our dependencies include pytorch and pyrtools. Installation should take care of them (along with our other dependencies) automatically, but if you have an installation problem (especially on a non-Linux operating system), it is likely that the problem lies with one of those packages. Open an issue and we’ll try to help you figure out the problem!

You can also install it directly from source to have a local editable copy. This is most useful for developing (for more info, see our contributing guide) or if you want to use the most cutting-edge version:

$ conda create --name plenoptic pip python=3.9
$ conda activate plenoptic
$ # clone the repository
$ git clone https://github.com/LabForComputationalVision/plenoptic.git
$ cd plenoptic
$ # install in editable mode with `-e` or, equivalently, `--editable`
$ pip install -e .

With an editable copy, any changes locally will be automatically reflected in your installation (under the hood, this command uses symlinks).

Attention

To install plenoptic in editable mode, you need pip >= 21.3 (see pip’s changelog). If you run into an error after running the pip install -e . command, try updating your pip version with pip install --upgrade pip.

Optional dependencies

The above instructions will install plenoptic and its core dependencies. You may also wish to install some additional optional dependencies. These dependencies are specified using square brackets during the pip install command and can be installed for either a local, editable install or one directly from PyPI:

  • If you would like to run the jupyter notebooks locally: pip install plenoptic[nb] or pip install -e .[nb]. This includes pooch (for downloading some extra data), torchvision (which has some models we’d like to use), jupyter, and related libraries. See the jupyter section for more details on how to handle jupyter and python virtual environments. Note that you can run our notebooks in the cloud using Binder, no installation required!

  • If you would like to locally build the documentation: pip install -e .[docs]. This includes sphinx and related libraries. (This probably only makes sense if you have a local installation.)

  • If you would like to run the tests: pip install -e .[dev]. This includes pytest and related libraries. (This probably only makes sense if you have a local installation.)

These optional dependencies can be joined with a comma: pip install -e .[docs,dev]

ffmpeg and videos

Some methods in this package generate videos. There are several backends available for saving the animations to file; see the matplotlib documentation for more details. In order to convert them to HTML5 for viewing (and thus, to view in a jupyter notebook), you’ll need ffmpeg installed and on your path as well. Depending on your system, this might already be installed, but if not, the easiest way is probably through conda: conda install -c conda-forge ffmpeg.

To change the backend, run matplotlib.rcParams['animation.writer'] = writer before calling any of the animate functions. If you try to set that rcParam with a random string, matplotlib will tell you the available choices.
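For example (a minimal sketch; using 'ffmpeg' assumes the ffmpeg writer is available on your system):

import matplotlib

# select the ffmpeg writer explicitly; other installed writers (e.g. 'pillow') also work
matplotlib.rcParams['animation.writer'] = 'ffmpeg'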

Running notebooks locally

Tip

You can run the notebooks in the cloud using Binder, no installation required!

Installing jupyter and setting up the kernel

If you wish to locally run the notebooks, you will need to install jupyter, ipywidgets, and (for some of the notebooks) torchvision and pooch. There are three possible ways of getting a local jupyter install working with this package, depending on how you wish to handle virtual environments.

Hint

If plenoptic is the only environment that you want to run notebooks from and/or you are unfamiliar with virtual environments, go with option 1 below.

  1. Install jupyter in the same environment as plenoptic. This is the easiest but, if you have multiple virtual environments and want to use Jupyter notebooks in each of them, it will take up a lot of space. If you followed the instructions above to create a conda environment named plenoptic, do the following:

    $ conda activate plenoptic
    $ conda install -c conda-forge jupyterlab ipywidgets torchvision pooch
    

    With this setup, when you have another virtual environment that you wish to run jupyter notebooks from, you must reinstall jupyter into that separate virtual environment, which is wasteful.

  2. Install jupyter in your base environment and use nb_conda_kernels to automatically manage kernels in all your conda environments. This is a bit more complicated, but means you only have one installation of jupyter lab on your machine. Again, if you followed the instructions to create a conda environment named plenoptic:

    $ # activate your 'base' environment, the default one created by conda/miniconda
    $ conda activate base
    $ # install jupyter lab and nb_conda_kernels in your base environment
    $ conda install -c conda-forge jupyterlab ipywidgets
    $ conda install nb_conda_kernels
    $ # install ipykernel, torchvision, and pooch in the plenoptic environment
    $ conda install -n plenoptic ipykernel torchvision pooch
    

    With this setup, you have a single jupyter install that can run kernels from any of your conda environments. All you have to do is install ipykernel in the new environment (and restart jupyter) and you should see the new kernel!

    Attention

    This method only works with conda environments. If you are using another method to manage your python virtual environments, you’ll have to use one of the other methods.

  3. Install jupyter in your base environment and manually install the kernel in your virtual environment. This requires only a single jupyter install and is the most general solution (it will work with conda or any other way of managing virtual environments), but requires you to be a bit more comfortable with handling environments. Again, if you followed the instructions to create a conda environment named plenoptic:

    $ # activate your 'base' environment, the default one created by conda/miniconda
    $ conda activate base
    $ # install jupyter lab and ipywidgets in your base environment
    $ conda install -c conda-forge jupyterlab ipywidgets
    $ # install ipykernel, torchvision, and pooch in the plenoptic environment
    $ conda install -n plenoptic ipykernel torchvision pooch
    $ conda activate plenoptic
    $ python -m ipykernel install --prefix=/path/to/jupyter/env --name 'plenoptic'
    

    /path/to/jupyter/env is the path to your base conda environment, and depends on the options set during your initial installation. It’s probably something like ~/conda or ~/miniconda. See the ipython docs for more details.

    With this setup, similar to option 2, you have a single jupyter install that can run kernels from any virtual environment. The main differences are that it works with any virtual environment (not just conda!) and that fewer packages are installed in your base environment, but you have to run an additional command after installing ipykernel into the environment (python -m ipykernel install ...).

    Note

    If you’re not using conda to manage your environments, the key idea is to install jupyter and ipywidgets in one environment, then install ipykernel, torchvision, and pooch in the same environment as plenoptic, and then run the ipykernel install command using the plenoptic environment’s python.

The following table summarizes the advantages and disadvantages of these three choices:

  1. Everything in one environment
     Advantages: ✅ Simple
     Disadvantages: ❌ Requires lots of hard drive space

  2. nb_conda_kernels
     Advantages: ✅ Set up once; ✅ Requires only one jupyter installation; ✅ Automatically finds new environments with ipykernel installed
     Disadvantages: ❌ Initial setup more complicated

  3. Manual kernel installation
     Advantages: ✅ Flexible: works with any virtual environment setup; ✅ Requires only one jupyter installation
     Disadvantages: ❌ More complicated; ❌ Extra step for each new environment

You can install all of the extra required packages using pip install -e .[nb] (if you have a local copy of the source code) or pip install plenoptic[nb] (if you are installing from PyPI). This includes jupyter, and so is equivalent to method 1 above. See the optional dependencies section for more details.

Running the notebooks

Once you have jupyter installed and the kernel set up, navigate to plenoptic’s examples/ directory in your terminal and activate the environment you installed jupyter into (conda activate plenoptic for method 1, conda activate base for methods 2 or 3), then run jupyter and open up the notebooks. If you followed the second or third method, you should be prompted to select your kernel the first time you open a notebook: select the one named “plenoptic”.

Attention

If you installed plenoptic from PyPI, then you will not have the notebooks on your machine and will need to download them directly from our GitHub repo. If you have a local install (and thus ran git clone), then the notebooks can be found in the examples/ directory.

Conceptual Introduction

plenoptic is a python library for “model-based synthesis of perceptual stimuli”. If you’ve never heard this phrase before, it may seem mysterious: what is stimulus synthesis and what types of scientific investigation does it facilitate?

Synthesis is a framework for exploring models by using them to create new stimuli, rather than examining their responses to existing ones. plenoptic focuses on models of visual [1] information processing, which take an image as input, perform some computations based on parameters, and return some vector-valued abstract representation as output. This output can be mapped to neuronal firing rate, fMRI BOLD response, behavior on some task, image category, etc., depending on the researchers’ intended question.

Schematic describing the relationship between simulate, fit, and synthesize.

That is, computational models transform a stimulus \(s\) to a response \(r\) (we often refer to \(r\) as “the model’s representation of \(s\)”), based on some model parameters \(\theta\). For example, a trained neural network that classifies images has specific weights \(\theta\), accepts an image \(s\), and returns a one-hot vector \(r\) that specifies the image class. Another example is a linear-nonlinear oriented filter model of a simple cell in primary visual cortex, where \(\theta\) defines the filter’s orientation, size, and spatial frequency; the model accepts an image \(s\) and returns a scalar \(r\) that represents the neuron’s firing rate.
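To make the second example concrete, here is a toy linear-nonlinear “simple cell” written in pytorch (an illustrative sketch only, not one of plenoptic’s models; the filter construction is a crude stand-in for a real oriented receptive field):

import torch

class SimpleCellLN(torch.nn.Module):
    """Toy linear-nonlinear model of a V1 simple cell: oriented filter + rectification.

    Here the parameters theta are the filter weights; orientation, size, and
    spatial frequency are implicit in how those weights are constructed.
    """
    def __init__(self, kernel_size=15):
        super().__init__()
        self.conv = torch.nn.Conv2d(1, 1, kernel_size, bias=False)
        # crude vertically-oriented grating filter: every row is the same sinusoid
        row = torch.sin(torch.linspace(0, 3 * torch.pi, kernel_size))
        self.conv.weight.data[0, 0] = row.repeat(kernel_size, 1) / kernel_size

    def forward(self, s):
        # r: rectified, spatially averaged filter response (a stand-in for firing rate)
        return torch.relu(self.conv(s)).mean()

model = SimpleCellLN()
s = torch.rand(1, 1, 64, 64)   # stimulus s
r = model(s)                   # scalar response r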

The most common scientific uses for a model are to simulate responses or to fit parameters, as illustrated in Fig. 1. For simulation, we hold the parameters constant while presenting the model with inputs (e.g, photographs of dogs, or a set of sine-wave gratings) and we run the model to compute responses. For fitting, we use optimization to find the parameter values that best account for the observed responses to a set of training stimuli. In both of these cases, we are holding two of the three variables (\(r\), \(s\), \(\theta\)) constant while computing or estimating the third. We can do the same thing to generate novel stimuli, \(s\), while holding the parameters and responses constant. We refer to this process as synthesis and it facilitates the exploration of input space to improve our understanding of a model’s representations.

This is related to a long and fruitful thread of research in vision science that focuses on what humans cannot see, that is, the information they are insensitive to. Perceptual metamers — images that are physically distinct but perceptually indistinguishable — provide direct evidence of such information loss in visual representations. Color metamers were instrumental in the development of the Young-Helmholtz theory of trichromacy [Helmholtz1852]. In this context, metamers demonstrate that the human visual system projects the infinite dimensionality of the physical signal to three dimensions.

To make this more concrete, let’s walk through an example. Humans can see visible light, which is electromagnetic radiation with wavelengths between 400 and 700 nanometers (nm). We often want to recreate the colors in a natural scene, such as when we take a picture. To do so, we can ask: what information do we need to record? Let’s start with a solid patch of uniform color. If we wanted to recreate the complete energy spectrum of the color, we would need to record a lot of numbers: even if we subsampled the wavelengths so that we only recorded the energy every 5 nm, we would need 61 numbers per color! But we know that most modern electronic screens only use three numbers, often called RGB (red, green, and blue) — why can we get away with throwing away so much information? Trichromacy and color metamers can help explain.

Researchers studying color perception arrived at a standard procedure – the bipartite color-matching experiment – for constraining a model of trichromatic metamers, illustrated in Fig. 2. An observer matches a monochromatic test color (i.e., a light with energy at only a single wavelength) with a physical mixture of three different monochromatic stimuli, called primaries. Thus, the goal is to create two perceptually-indistinguishable stimuli (metamers). Perhaps surprisingly, not only is this possible for any test color, it is also possible for just about any selection of primaries (as long as they’re within the visible light spectrum and sufficiently different from each other). For most human observers, three primaries are required: there are many colors that cannot be matched with only two primaries, and four yields non-unique matches. However, there are some people for whom two primaries are sufficient.

Color matching experiment.

The fact that most people require three primaries, but some require only two, provided a hint regarding the underlying mechanisms: most people have cone photoreceptors from three distinct classes (generally referred to as S, M, and L, for “short”, “medium”, and “long”), but some forms of color blindness arise from genetic deviations in which only two classes are present. Color metamers are created when the cone responses have been matched. Human cones transform colors from a high-dimensional space (i.e., a vector describing the energy at each wavelength) to a three-dimensional one (i.e., a vector describing how active each cone class is). This means a large amount of wavelength information is discarded.

A worked example may help demonstrate this point more clearly. Let’s match the random light shown on the left below using the primaries shown on the right.

Left: Random light whose appearance we will match. Right: primaries.

The only way we can change the matching light is to multiply those primaries by different numbers, moving them up and down. You might look at them and wonder how we can match the light shown on the left, with all its random wiggles. The important point is that we will not match those wiggles. We will instead match the cone activation levels, which we get by matrix-multiplying our light by the cone fundamentals, shown below.

Left: the cone sensitivity curves. Right: the response of each cone class to the random light shown in the previous figure.

With some linear algebra, we can compute another light that has very different amounts of energy at each wavelength but identical cone responses, shown below.

Left: the two lights. Right: their cone responses.

If we look at the plot on the left, we can see that the two lights are very different physically, but we can see on the right that they generate the same cone responses and thus would be perceived identically.
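The computation can be sketched in a few lines of numpy (the Gaussian cone fundamentals below are made up for illustration and differ from the real curves, but the linear algebra is identical; note that a toy metamer built this way may have negative energy at some wavelengths, which a physically realizable match would also need to avoid):

import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
wavelengths = np.arange(400, 701, 5)            # 61 samples, every 5 nm

# made-up Gaussian approximations to the S, M, L cone fundamentals (61 x 3)
cones = np.stack([np.exp(-(wavelengths - peak) ** 2 / (2 * 40 ** 2))
                  for peak in (440, 530, 560)], axis=1)

light = rng.random(61)                          # a random light
responses = cones.T @ light                     # its three cone activations

# any vector in the null space of cones.T leaves the cone responses unchanged
basis = null_space(cones.T)                     # 61 x 58
metamer = light + 0.1 * basis @ rng.standard_normal(basis.shape[1])

print(np.allclose(cones.T @ metamer, responses))   # True: different light, same cone responses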

In this example, the model was a simple linear system of cone responses, and thus we could generate a metamer (a physically different input with identical output) via some simple linear algebra. Metamers can be useful for understanding other systems as well, because discarding information is ubiquitous: the human visual system discards information at every stage of processing, not just at the cones’ absorption of light, and any computational system that seeks to classify images must discard a lot of information about unnecessary differences between images in the same class. However, generating metamers for more complex systems gets complicated: simple linear algebra no longer suffices.

Let’s consider a slightly more complex example. Human vision is very finely detailed at the center of gaze, but gradually discards this detailed spatial information as distance to the center of gaze increases. This phenomenon is known as foveation, and can be easily seen by the difficulty in reading a paragraph of text or recognizing a face out of the corner of your eye (see [Lettvin1976] for an accessible discussion with examples). The simplest possible model of foveation would be to average pixel intensities in windows whose width grows linearly with distance from the center of an image, as shown in Fig. 7:

The foveated pixel intensity model averages pixel values in elliptical windows that grow in size as you move away from the center of the image. It only cares about the average in these regions, not the fine details.

This model cares about the average pixel intensity in a given area, but doesn’t care how that average is reached. If the pixels in one of the ellipses above all have a value of 0.5, if they’re half 0s and half 1s, if they’re randomly distributed around 0.5 — those are all identical, as far as the model is concerned. A more concrete example is shown in Fig. 8:

Three images that the foveated pixel intensity model considers identical. They all have the same average pixel values within the foveated elliptical regions (the red ellipse shows an example averaging region at that location), but differ greatly in their fine details.

These three images are all identical as far as the foveated pixel intensity model is concerned: their average pixel intensities match within every averaging region, whose size grows with distance from the center of the image. However, like the color metamers discussed earlier, they are very physically different: the leftmost image is a natural image, the rightmost one has lots of high-frequency noise, while the center one looks somewhat blurry. You might think that, because the model only cares about average pixel intensities, you can throw away all the fine details and the model won’t notice. And you can! But you can also add whatever kind of fine details you’d like, including random noise — the model is completely insensitive to them.
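The same point can be made in code with a simplified, non-foveated version of this model (fixed 16x16 averaging windows rather than windows that grow with eccentricity; the numbers are arbitrary):

import torch

window = 16
pool = torch.nn.AvgPool2d(kernel_size=window)     # stand-in for the foveated averaging model
img = torch.rand(1, 1, 256, 256)

# construct "fine detail" whose mean is zero within every window, then add it
noise = torch.randn_like(img)
window_means = pool(noise).repeat_interleave(window, dim=-2).repeat_interleave(window, dim=-1)
img_detailed = img + 0.5 * (noise - window_means)

# the two images differ pixel by pixel, but the averaging model cannot tell them apart
print(torch.allclose(pool(img), pool(img_detailed), atol=1e-5))   # True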

With relatively simple linear models like human trichromacy and the foveated pixel intensity model, this way of thinking about models may seem unnecessary. But it is very difficult to understand how models will perform on unexpected or out-of-distribution data! The burgeoning literature on adversarial examples and robustness in machine learning provides many examples of this, such as the addition of a small amount of noise (invisible to humans) changing the predicted category [Szegedy2013], or the addition of a small elephant to a picture completely changing detected objects’ identities and boundaries [Rosenfeld2018]. Exploring model behavior on all possible inputs is impossible — the space of all possible images is far too vast — but image synthesis provides one mechanism for exploring it in a targeted manner.

Furthermore, image synthesis provides a complementary method of comparing models to the standard procedure. Generally, scientific models are evaluated on their ability to fit data or perform a task, such as how well a model performs on ImageNet or how closely a model tracks firing rate in some collected data. However, many models can perform a task equally or comparably well [2]. By using image synthesis to explore models’ representational spaces, we can gain a fuller understanding of how models succeed and how they fail to capture the phenomena under study.

Beyond Metamers

plenoptic contains more than just metamers — it provides a set of methods for performing image synthesis. Each method allows for different exploration of a model’s representational space:

  • Metamers investigate what features the model disregards entirely.

  • Eigendistortions investigate which features the model considers least important and which it considers most important.

  • Maximal differentiation (MAD) competition enables efficient comparison of two metrics, highlighting the aspects in which their sensitivities differ.

  • Geodesics investigate how a model represents motion and what changes to an image it considers reasonable.

The goal of this package is to facilitate model exploration and understanding. We hope that providing these tools helps tighten the model-experiment loop: when a model is proposed, whether imported from a related field or built on earlier experiments, plenoptic enables scientists to explore the model’s representational space in a targeted way, generating the stimuli that will provide the most information. We hope to help theorists become more active participants in directing future experiments by efficiently finding new predictions to test.

[Helmholtz1852]

Helmholtz, H. (1852). LXXXI. on the theory of compound colours. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 4(28), 519–534. http://dx.doi.org/10.1080/14786445208647175

[Lettvin1976]

Lettvin, J. Y. (1976). On Seeing Sidelong. The Sciences, 16(4), 10–20. http://jerome.lettvin.com/jerome/OnSeeingSidelong.pdf

[Szegedy2013]

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. https://arxiv.org/abs/1312.6199

[Rosenfeld2018]

Rosenfeld, A., Zemel, R., & Tsotsos, J. K. (2018). The elephant in the room. https://arxiv.org/abs/1808.03305

Model requirements

plenoptic provides a model-based synthesis framework, and therefore we require several things of the models used with the package (the plenoptic.tools.validate.validate_model() function provides a convenient way to check whether your model meets the following requirements, and see plenoptic.simulate.models for some examples). Your model:

  • should inherit torch.nn.Module (this is not strictly necessary, but will make meeting the other requirements easier).

  • must be callable, be able to accept a 4d torch.Tensor as input, and return a 3d or 4d torch.Tensor as output. If you inherit torch.nn.Module, implementing the forward() method will make your model callable.

  • the above transformation must be differentiable by torch. In practice, this generally means you perform all computations using torch functions (unless you want to write a custom .backward() method).

  • must not have any learnable parameters. This is largely to save time by avoiding calculation of unnecessary gradients, but synthesis is performed with a fixed model — we are optimizing the input, not the model parameters. You can use the helper function plenoptic.tools.validate.remove_grad() to detach all parameters. Similarly, your model should probably be in evaluation mode (i.e., call model.eval()), though this is not strictly required. See the pytorch documentation for the difference between evaluation mode and disabling gradient computation.

Additionally, your model inputs and outputs should be real- or complex-valued and should be interpretable for all possible values (within some range). The intention of stimulus synthesis is to facilitate model understanding — if the synthesized stimuli are meaningless, this defeats the purpose. (Note that domain restrictions, such as requiring integer-valued inputs, can probably be accomplished by adding a penalty to an objective function, but will make your life harder.)
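A minimal sketch of a model satisfying these requirements (validate_model is called here with just the model; it may accept additional arguments, so check its docstring):

import torch
import plenoptic as po

class HalfIntensity(torch.nn.Module):
    """Minimal compliant model: callable, 4d tensor in / 4d tensor out,
    differentiable, and (after remove_grad) no learnable parameters."""
    def forward(self, x):
        # any computation built from torch functions keeps the model differentiable
        return 0.5 * x

model = HalfIntensity()
model.eval()                               # evaluation mode, as recommended above
po.tools.remove_grad(model)                # a no-op here (no parameters), but required in general
po.tools.validate.validate_model(model)    # raises an error if a requirement is violated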

plenoptic.synthesize.mad_competition.MADCompetition uses metrics, rather than models, which have the following requirements (use the plenoptic.tools.validate.validate_metric() function to check whether your metric meets the following requirements and see plenoptic.metric for some examples):

  • a metric must be callable, accept two 4d torch.Tensor objects as inputs, and return a scalar as output. This can be a torch.nn.Module object, like models, but the example metrics are all functions (see the sketch after this list).

  • when called on two identical inputs, the metric must return a value of 0.
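As a minimal sketch of a compliant metric (plain mean squared error, written as a function):

import torch

def mse(img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
    """Toy metric: accepts two 4d tensors, returns a scalar, and is 0 for identical inputs."""
    return torch.pow(img_a - img_b, 2).mean()

x = torch.rand(1, 1, 64, 64)
print(mse(x, x))                           # tensor(0.)
# po.tools.validate.validate_metric(mse)   # optional check, as described above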

Finally, plenoptic.synthesize.metamer.Metamer supports coarse-to-fine synthesis, as described in [PS]. To make use of coarse-to-fine synthesis, your model must meet the following additional requirements (use the plenoptic.tools.validate.validate_coarse_to_fine() function to check and see plenoptic.simulate.models.portilla_simoncelli.PortillaSimoncelli for an example):

  • the model must have a scales attribute.

  • in addition to a torch.Tensor, the forward() method must also be able to accept an optional scales keyword argument (equivalently, when calling the model, if the model does not inherit torch.nn.Module).

  • that argument should be a list containing one or more values from model.scales, and the shape of the output should change when scales is a strict subset of all possible values (a skeleton sketch follows this list).
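A skeleton of what this could look like (an illustrative sketch only; the per-scale computation below is a placeholder, not how plenoptic’s models implement it):

import torch

class CoarseToFineModel(torch.nn.Module):
    """Skeleton of a model supporting coarse-to-fine synthesis."""
    def __init__(self):
        super().__init__()
        self.scales = ['coarse', 'mid', 'fine']        # required attribute

    def forward(self, x, scales=None):
        if scales is None:
            scales = self.scales
        outputs = []
        # placeholder per-scale statistics: flattened averages of progressively
        # less-downsampled copies of the image
        for i, scale in enumerate(self.scales):
            if scale in scales:
                outputs.append(torch.nn.functional.avg_pool2d(x, 2 ** (3 - i)).flatten(2))
        # output shape shrinks when `scales` is a strict subset of self.scales
        return torch.cat(outputs, dim=-1)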

[PS]

J. Portilla and E. P. Simoncelli. A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients. Int’l Journal of Computer Vision, 40(1):49-71, October 2000.

Quickstart

The following tutorial is intended to show you how to create a simple plenoptic-compliant model and use it with our synthesis methods, with a brief explanation of how to interpret the outputs. See the other tutorials for more details.

[1]:
import plenoptic as po
import torch
import pyrtools as pt
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72

%matplotlib inline

All plenoptic methods require a “reference” or “target” image — for Metamer synthesis, for example, this is the image whose representation we will match. Let’s load in an image of Einstein to serve as our reference here:

[2]:
im = po.data.einstein()
fig = po.imshow(im)
_images/tutorials_00_quickstart_3_0.png

Models can be really simple, as the following demonstrates. A model needs to inherit torch.nn.Module and define just two methods: __init__ (so it’s an object) and forward (so it can take an image). See the Models page of the documentation for more details.

For this notebook, we’ll initialize a simple plenoptic-compatible model and call its forward method. This model just convolves a 2d gaussian filter across an image, so it’s a low-pass model, preserving low frequency information while discarding the high frequencies.

[3]:
# this is a convenience function for creating a simple Gaussian kernel
from plenoptic.simulate.canonical_computations.filters import circular_gaussian2d

# Simple rectified Gaussian convolutional model
class SimpleModel(torch.nn.Module):
    # in __init__, we create the object, initializing the convolutional weights and nonlinearity
    def __init__(self, kernel_size=(7, 7)):
        super().__init__()
        self.kernel_size = kernel_size
        self.conv = torch.nn.Conv2d(1, 1, kernel_size=kernel_size, padding=(0, 0), bias=False)
        self.conv.weight.data[0, 0] = circular_gaussian2d(kernel_size, 3.)

    # the forward pass of the model defines how to get from an image to the representation
    def forward(self, x):
        # use circular padding so our output is the same size as our input
        x = po.tools.conv.same_padding(x, self.kernel_size, pad_mode='circular')
        return self.conv(x)

model = SimpleModel()
rep = model(im)
/home/billbrod/miniconda3/envs/plenoptic/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

To work with our synthesis methods, a model must accept a 4d tensor as input and return a 3d or 4d tensor as output. 4d inputs are commonly used for pytorch models, and the dimensions are batch (often, multiple images), channel (often, RGB or outputs of different convolutional filters), height, and width. The output should then be either a 1d vector or a 2d image per batch and channel (i.e., a 3d or 4d tensor overall). If your model operates across channels or batches, that’s no problem; for example, if the model transforms RGB to grayscale, your input would have 3 channels and your output would have 1.

We can see that our Gaussian model satisfies this constraint:

[4]:
print(im.shape)
print(rep.shape)
torch.Size([1, 1, 256, 256])
torch.Size([1, 1, 256, 256])

There are also several more abstract constraints (e.g., model must accept real-valued inputs and return real-valued outputs), so it’s recommended that you read the Models page of the documentation before creating your own model.

The following shows the image and the model output. We can see that the output is a blurred version of the input, as we would expect from a low-pass model.

[5]:
fig = po.imshow(torch.cat([im, rep]), title=['Original image', 'Model output'])
_images/tutorials_00_quickstart_9_0.png

Before moving forward, let’s think about this model. It’s a simple Gaussian convolution which throws out high-frequency information, as we can see in the representation above. Metamers provide a tool for exploring a model’s insensitivities, so any metamers we synthesize should capitalize on this: they should differ from the original image in the high frequencies.

There’s one final step before we’re ready for synthesis. Most pytorch models will have learnable parameters, such as the weight on the convolution filter we created above, because the focus is generally on training the model to best perform some task. In plenoptic, models are fixed because we take the opposite approach: generating some new stimulus to better understand a given model. Thus, all synthesis methods will raise a ValueError if given a model with any learnable parameters. We provide a helper function to remove these gradients:

[6]:
po.tools.remove_grad(model)

Okay, now we’re ready to start with metamer synthesis. To initialize, we only need the model and the image (there are some additional options, but the defaults are fine in this case; see the Metamer notebook if you’re interested). In general, you’ll probably need to play with these options to find a good solution. It’s also probably a good idea, while getting started, to set store_progress to True (to store every iteration) or some int (to store every int iterations) so you can examine synthesis progress.

[7]:
metamer = po.synth.Metamer(im, model)

matched_im = metamer.synthesize(store_progress=True, max_iter=20)
# if we call synthesize again, we resume where we left off
matched_im = metamer.synthesize(store_progress=True, max_iter=150)
/home/billbrod/Documents/plenoptic/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  warnings.warn(
/home/billbrod/Documents/plenoptic/plenoptic/synthesize/metamer.py:195: UserWarning: Loss has converged, stopping synthesis
  warnings.warn("Loss has converged, stopping synthesis")

We can then examine the loss over time. There’s a convenience function for this, but you could also call plt.semilogy(metamer.losses) to create it yourself.

[8]:
po.synth.metamer.plot_loss(metamer)
[8]:
<AxesSubplot: xlabel='Synthesis iteration', ylabel='Loss'>
_images/tutorials_00_quickstart_15_1.png

The loss decreases steadily and has reached a very low value. In fact, based on our convergence criterion (one of the optional arguments), it looks as though we’ve converged (we could change this argument to continue synthesis).

We can then look at the reference and metamer images, as well as the model’s outputs on the two images:

[9]:
fig = po.imshow([im, rep, metamer.metamer, model(metamer.metamer)],
                col_wrap=2, vrange='auto1',
                title=['Original image', 'Model representation\nof original image',
                       'Synthesized metamer', 'Model representation\nof synthesized metamer']);
_images/tutorials_00_quickstart_17_0.png

We can see that, even though the target and synthesized images look very different, the two model outputs look basically identical (which matches the exceedingly low loss value we see above). (The left column shows the images and the right column the model outputs; top row shows the original image and bottom the synthesized metamer.)

It may seem strange that the synthesized image looks like it has high-frequency noise in it — a Gaussian is a low-pass filter, so why isn’t the model metamer just a blurred version of the original image? Indeed, such a blurred image would be a model metamer, but it’s only one of many. Remember what we mentioned earlier: Gaussians are insensitive to high-frequency information, which not only means that their response doesn’t change when you remove that information, but that you can put any amount of high frequency information into an image without affecting the model’s output. Put another way, you can randomize the contents of the model’s null space without affecting its response, and the goal of metamer synthesis is to generate different images that do just that.

We can also view a movie of our progress so far.

[10]:
po.tools.convert_anim_to_html(po.synth.metamer.animate(metamer, included_plots=['display_metamer', 'plot_loss'], figsize=(12, 5)))
[10]:

We can see the model’s insensitivity to high frequencies more dramatically by initializing our metamer synthesis with a different image. By default, we initialize with a patch of white noise, but we can initialize with any image of the same size. Let’s try with a different natural image, a picture of Marie Curie.

[11]:
curie = po.data.curie()
po.imshow([curie]);
_images/tutorials_00_quickstart_21_0.png
[12]:
metamer = po.synthesize.Metamer(im, model, initial_image=curie)

# we increase the length of time we run synthesis and decrease the
# stop_criterion, which determines when we think loss has converged
# for stopping synthesis early.
synth_image = metamer.synthesize(max_iter=500,  stop_criterion=1e-6)
/home/billbrod/Documents/plenoptic/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  warnings.warn(
/home/billbrod/Documents/plenoptic/plenoptic/synthesize/metamer.py:195: UserWarning: Loss has converged, stopping synthesis
  warnings.warn("Loss has converged, stopping synthesis")

Let’s double-check that our synthesis looks like it’s reached a good solution by checking the loss curve:

[13]:
po.synth.metamer.plot_loss(metamer)
[13]:
<AxesSubplot: xlabel='Synthesis iteration', ylabel='Loss'>
_images/tutorials_00_quickstart_24_1.png

Good, now let’s examine our synthesized metamer and the model output, as before:

[14]:
fig = po.imshow([im, rep, metamer.metamer, model(metamer.metamer)],
                col_wrap=2, vrange='auto1',
                title=['Original image', 'Model representation\nof original image',
                       'Synthesized metamer', 'Model representation\nof synthesized metamer']);
_images/tutorials_00_quickstart_26_0.png

We see that the synthesized metamer here looks quite different from both the original and from our previous metamer, while the model outputs look very similar. Here, our synthesized model metamer looks like a blurry picture of Einstein with a high-frequency “shadow” of Curie added on top. Again, this is because the Gaussian model is insensitive to high frequencies, and thus a model metamer can include any high frequency information.

By generating model metamers, we’ve gained a better understanding of the information our model is invariant to, but what if we want a better understanding of what our model is sensitive to? We can use Eigendistortion for that.

Like Metamer, Eigendistortion accepts an image and a model as its inputs. By default, it synthesizes the top and bottom eigendistortion, that is, the changes to the input image that the model finds most and least noticeable.

[15]:
eig = po.synthesize.Eigendistortion(im, model)
eig.synthesize();

Initializing Eigendistortion -- Input dim: 65536 | Output dim: 65536

Let’s examine those distortions:

[16]:
po.imshow(eig.eigendistortions, title=['Maximum eigendistortion',
                                       'Minimum eigendistortion']);
_images/tutorials_00_quickstart_30_0.png

We can see they make sense: the most noticeable distortion is a very low-frequency modification to the image, with a period of about half the image. The least noticeable, on the other hand, is very high-frequency, which matches our understanding from the metamer example above.

This brief introduction hopefully demonstrates how you can use plenoptic to better understand your model representations! There’s much more that can be done with both these methods, as well as two additional methods, MADCompetition and Geodesic, to explore.

Citation Guide

If you use plenoptic in a published academic article or presentation, please cite both the code (via the DOI) and the paper [VSS2023]. You can use the following:

  • Code: zenodo

  • Paper:

    @article{duong2023plenoptic,
      title={Plenoptic: A platform for synthesizing model-optimized visual stimuli},
      author={Duong, Lyndon and Bonnen, Kathryn and Broderick, William and Fiquet, Pierre-{\'E}tienne and Parthasarathy, Nikhil and Yerxa, Thomas and Zhao, Xinyuan and Simoncelli, Eero},
      journal={Journal of Vision},
      volume={23},
      number={9},
      pages={5822--5822},
      year={2023},
      publisher={The Association for Research in Vision and Ophthalmology}
    }
    

Additionally, please cite the following paper(s) depending on which component you use:

Note that the citations given above describe the application of the relevant ideas (e.g., “metamers”) to computational models of the visual system, as instantiated in the algorithms found in plenoptic. For the most part, these general concepts were not developed by the developers of plenoptic or the Simoncelli lab and are much older – the idea of metamers goes all the way back to [Helmholtz1852]! The papers above generally provide some discussion of this history and can point you to further reading, if you are interested.

[VSS2023]

Lyndon Duong, Kathryn Bonnen, William Broderick, Pierre-Étienne Fiquet, Nikhil Parthasarathy, Thomas Yerxa, Xinyuan Zhao, Eero Simoncelli; Plenoptic: A platform for synthesizing model-optimized visual stimuli. Journal of Vision 2023;23(9):5822. https://doi.org/10.1167/jov.23.9.5822.

Eigendistortions

Run this notebook online with Binder.

In this tutorial we will cover:

  • theory behind eigendistortions

  • how to use the plenoptic.synthesize.eigendistortion.Eigendistortion object

  • computing eigendistortions using a simple input and linear model

  • computing extremal eigendistortions for different layers of ResNet18

Introduction

How can we assess whether a model sees like we do? One way is to test whether it “notices” image distortions the same way we do. For a model, a noticeable distortion would be an image perturbation that elicits a change in its response. If our goal is to create models with human-like vision, then an image distortion that is (not) noticeable to a human should also (not) be noticeable to our models. Eigendistortions provide a framework with which to compare models to human visual perception of distortions.

Berardino, A., Laparra, V., Ballé, J. and Simoncelli, E., 2017. Eigen-distortions of hierarchical representations. In Advances in neural information processing systems (pp. 3530-3539).

http://www.cns.nyu.edu/pub/lcv/berardino17c-final.pdf

http://www.cns.nyu.edu/~lcv/eigendistortions/

See the last section of this notebook for more mathematical detail.

[1]:
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
import torch
from plenoptic.synthesize.eigendistortion import Eigendistortion
from torch import nn
# this notebook uses torchvision, which is an optional dependency.
# if this fails, install torchvision in your plenoptic environment
# and restart the notebook kernel.
try:
    from torchvision import models
except ModuleNotFoundError:
    raise ModuleNotFoundError("optional dependency torchvision not found!"
                              " please install it in your plenoptic environment "
                              "and restart the notebook kernel")
import os.path as op
import plenoptic as po

Example 1: Linear model, small 1D input “image”

1.1) Creating the model

The fundamental goal of computing eigendistortions is to understand how small changes (distortions) in inputs affect model outputs. Any model can be thought of as a black box mapping an input to an output, \(f(x): x \in \mathbb{R}^n \mapsto y \in \mathbb{R}^m\), i.e. a function that takes as input an n-dimensional vector \(x\) and outputs an m-dimensional vector \(y\).

The simplest model that achieves this is linear,

\begin{align}
y &= f(x) = Mx, && M \in \mathbb{R}^{m \times n}.
\end{align}

In this linear case, the Jacobian is fixed \(J= \frac{\partial f}{\partial x}=M\) for all possible inputs \(x\). Can we synthesize a distortion \(\epsilon\) such that \(f(x+\epsilon)\) is maximally/minimally perturbed from the original \(f(x)\)? Yes! This would amount to finding the first and last eigenvectors of the Fisher information matrix, i.e. \(J^TJ v = \lambda v\).
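As a quick sanity check independent of plenoptic (not part of the notebook), the eigenvectors of \(J^TJ = M^TM\) for such a linear model are the right singular vectors of \(M\), and its eigenvalues the squared singular values:

import torch

torch.manual_seed(0)
m, n = 10, 25
M = torch.randn(m, n)

F = M.T @ M                        # Fisher matrix of the linear model y = Mx
evals = torch.linalg.eigvalsh(F)   # ascending order
S = torch.linalg.svdvals(M)        # descending order

print(torch.allclose(evals[-1], S[0] ** 2, atol=1e-4))               # largest eigenvalue
print(torch.allclose(evals[:n - m], torch.zeros(n - m), atol=1e-4))  # n - m zero eigenvalues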

A few things to note:

  • Input image should always be a 4D tensor whose dimensions are torch.Size([batch=1, channel, height, width]).

  • We don’t allow for batch synthesis of eigendistortions, so the batch dimension should always be 1.

We’ll be working with the Eigendistortion object and its instance method, synthesize().

Let’s make a linear PyTorch model and compute eigendistortions for a given input.

[2]:
class LinearModel(nn.Module):
    """The simplest model we can make.
    Its Jacobian should be the weight matrix M, and the eigenvectors of the Fisher matrix are therefore the
    eigenvectors of M.T @ M"""
    def __init__(self, n, m):
        super(LinearModel, self).__init__()
        torch.manual_seed(0)
        self.M = nn.Linear(n, m, bias=False)

    def forward(self, x):
        y = self.M(x)  # this computes y = x @ M.T
        return y

n = 25  # input vector dim (can you predict what the eigenvec/vals would be when n<m or n=m? Feel free to try!)
m = 10  # output vector dim

mdl_linear = LinearModel(n, m)
po.tools.remove_grad(mdl_linear)

x0 = torch.ones((1, 1, 1, n))  # input must be torch.Size([batch=1, n_chan, img_height, img_width])
y0 = mdl_linear(x0)

fig, ax = plt.subplots(2, 1, sharex='all', sharey='all')
ax[0].stem(x0.squeeze())
ax[0].set(title=f'{n:d}D Input')

ax[1].stem(y0.squeeze().detach(), markerfmt='C1o')
ax[1].set(title=f'{m:d}D Output')
fig.tight_layout()
_images/tutorials_intro_02_Eigendistortions_3_0.png
1.2 - Synthesizing eigendistortions of linear model

To compute the eigendistortions of this model, we can instantiate an Eigendistortion object with a 4D input image with dims torch.Size([batch=1, n_channels, img_height, img_width]), and any PyTorch model with valid forward and backward methods. After that, we simply call the instance method synthesize() and choose the appropriate synthesis method. Normally our input has thousands of entries, but our input in this case is small (only n=25 entries), so we can compute the full \(m \times n\) Jacobian, and all the eigenvectors of the \(n \times n\) Fisher matrix, \(F=J^TJ\). The synthesize method does this for us and stores the outputs of the synthesis in the eigendistortions, eigenvalues, and eigenindex attributes of the object.

[3]:
help(Eigendistortion.synthesize)  # fully documented

eig_jac = Eigendistortion(x0, mdl_linear)  # instantiate Eigendistortion object using an input and model
eig_jac.synthesize(method='exact')  # compute the entire Jacobian exactly
Help on function synthesize in module plenoptic.synthesize.eigendistortion:

synthesize(self, method: Literal['exact', 'power', 'randomized_svd'] = 'power', k: int = 1, max_iter: int = 1000, p: int = 5, q: int = 2, stop_criterion: float = 1e-07)
    Compute eigendistortions of Fisher Information Matrix with given input image.

    Parameters
    ----------
    method
        Eigensolver method. 'exact' tries to do eigendecomposition directly (
        not recommended for very large inputs). 'power' (default) uses the power method to compute first and
        last eigendistortions, with maximum number of iterations dictated by n_steps. 'randomized_svd' uses
        randomized SVD to approximate the top k eigendistortions and their corresponding eigenvalues.
    k
        How many vectors to return using block power method or svd.
    max_iter
        Maximum number of steps to run for ``method='power'`` in eigenvalue computation. Ignored
        for other methods.
    p
        Oversampling parameter for randomized SVD. k+p vectors will be sampled, and k will be returned. See
        docstring of ``_synthesize_randomized_svd`` for more details including algorithm reference.
    q
        Matrix power parameter for randomized SVD. This is an effective trick for the algorithm to converge to
        the correct eigenvectors when the eigenspectrum does not decay quickly. See
        ``_synthesize_randomized_svd`` for more details including algorithm reference.
    stop_criterion
        Used if ``method='power'`` to check for convergence. If the L2-norm
        of the eigenvalues has changed by less than this value from one
        iteration to the next, we terminate synthesis.


Initializing Eigendistortion -- Input dim: 25 | Output dim: 10
Computing all eigendistortions
/home/billbrod/Documents/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  warnings.warn(
1.3 - Comparing our synthesis to ground-truth

The Jacobian is in general a rectangular (not necessarily square) matrix \(J\in \mathbb{R}^{m\times n}\). Since this is a linear model, let’s check if the computed Jacobian (stored as an attribute in the Eigendistortion object) matches the weight matrix \(M\).

Since the eigendistortions are each 1D (vectors) in this example, we can display them all as an image where each column is an eigendistortion, each pixel is an entry of the eigendistortion, and the intensity is proportional to its value.

[4]:
fig, ax = plt.subplots(1, 2, sharex='all', sharey='all')
ax[0].imshow(eig_jac.jacobian)
ax[1].imshow(mdl_linear.M.weight.data, vmin=eig_jac.jacobian.min(), vmax=eig_jac.jacobian.max())
ax[0].set(xticks=[], yticks=[], title='Solved Jacobian')
ax[1].set(title='Linear model weight matrix')
fig.tight_layout()

print("Jacobian == weight matrix M?:", eig_jac.jacobian.allclose(mdl_linear.M.weight.data))

# Eigenvectors (aka eigendistortions) and their associated eigenvalues are stored as attributes of the object
fig, ax = plt.subplots(1, 2, sharex='all')
ax[0].imshow(eig_jac.eigendistortions.squeeze(), vmin=-1, vmax=1, cmap='coolwarm')
ax[0].set(title='Eigendistortions', xlabel='Eigenvector index', ylabel='Entry')
ax[1].plot(eig_jac.eigenvalues, '.')
ax[1].set(title='Eigenvalues', xlabel='Eigenvector index', ylabel='Eigenvalue')
fig.tight_layout()
Jacobian == weight matrix M?: True
_images/tutorials_intro_02_Eigendistortions_7_1.png
_images/tutorials_intro_02_Eigendistortions_7_2.png
1.4 - What do these eigendistortions mean?

The first eigenvector (with the largest eigenvalue) is the direction in which we can distort our input \(x\) and change the response of the model the most, i.e. its most noticeable distortion. For the last eigenvector, since its associated eigenvalue is 0, no change in response occurs when we distort the input in that direction, i.e. \(f(x+\epsilon)=f(x)\). So this distortion would be imperceptible to the model.

In most cases, our input would be much larger. An \(n\times n\) image has \(n^2\) entries, meaning the Fisher matrix is \(n^2 \times n^2\) – certainly too large to store in memory – with \(n^2\) possible eigendistortions. We instead need to resort to numerical methods to compute the eigendistortions. To do this, we can just set our synthesis method='power' to estimate the first eigenvector (most noticeable distortion) and last eigenvector (least noticeable distortion) for the image.
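For intuition, here is generic power iteration on an explicitly-constructed matrix (a sketch only; plenoptic’s method='power' never builds the Fisher matrix, instead applying it implicitly via autograd Jacobian-vector products):

import torch

def power_iteration(F, n_iter=1000, tol=1e-7):
    """Estimate the leading eigenvector/eigenvalue of a symmetric matrix F."""
    v = torch.randn(F.shape[0])
    v = v / v.norm()
    eigval = torch.tensor(0.0)
    for _ in range(n_iter):
        w = F @ v
        new_eigval = v @ w                   # Rayleigh quotient
        v = w / w.norm()
        if (new_eigval - eigval).abs() < tol:
            break
        eigval = new_eigval
    return v, eigval

M = torch.randn(10, 25)
v, lam = power_iteration(M.T @ M)
print(torch.allclose(lam, torch.linalg.eigvalsh(M.T @ M)[-1], atol=1e-4))   # True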

[5]:
eig_pow = Eigendistortion(x0, mdl_linear)
eig_pow.synthesize(method='power', max_iter=1000)
eigdist_pow = eig_pow.eigendistortions.squeeze()  # squeeze out singleton channel dimension (these are grayscale)
eigdist_jac = eig_jac.eigendistortions.squeeze()

print(f'Indices of computed eigenvectors: {eig_pow.eigenindex}\n')

fig, ax = plt.subplots(1,1)
ax.plot(eig_pow.eigenindex, eig_pow.eigenvalues, '.', markersize=15, label='Power')
ax.plot(eig_jac.eigenvalues, '.-', label='Jacobian')
ax.set(title='Power method vs Jacobian', xlabel='Eigenvector index', ylabel='Eigenvalue')
ax.legend(title='Synth. method')

fig, ax = plt.subplots(1, 2, sharex='all', sharey='all', figsize=(8,3))
ax[0].plot(eigdist_pow[0] - eigdist_jac[0])
ax[0].set(title='Difference in first eigendists')

ax[1].stem(eigdist_pow[-1] - eigdist_jac[-1])
ax[1].set(title='Difference in last eigendists')

fig, ax = plt.subplots(1,1)
ax.stem(eigdist_jac @ eigdist_pow[-1])
ax.set(title="Power method's last eigenvec projected on all Jacobian method's eigenvec",
       xlabel='Eigenvector index', ylabel='Projection')

print('Are the first eigendistortions the same?', eigdist_pow[0].allclose(eigdist_jac[0], atol=1e-3))
print('Are the last eigendistortions the same?', eigdist_pow[-1].allclose(eigdist_jac[-1], atol=1e-3))

# find eigendistortions of Jacobian-method whose eigenvalues are zero
ind_zero = eig_jac.eigenvalues.isclose(torch.zeros(1), atol=1e-4)

Initializing Eigendistortion -- Input dim: 25 | Output dim: 10
Top k=1 eigendists computed | Stop criterion 1.00E-07 reached.
Bottom k=1 eigendists computed | Stop criterion 1.00E-07 reached.
Indices of computed eigenvectors: tensor([ 0, 24])

Are the first eigendistortions the same? True
Are the last eigendistortions the same? False
_images/tutorials_intro_02_Eigendistortions_9_5.png
_images/tutorials_intro_02_Eigendistortions_9_6.png
_images/tutorials_intro_02_Eigendistortions_9_7.png

The power method's first eigendistortion matches the ground-truth first eigendistortion obtained via the Jacobian solve. And while the last eigendistortions don't match, the last power method eigendistortion lies in the span of all the eigendistortions whose eigenvalues are zero. These zero-eigenvalue eigendistortions are all equivalent: any distortion of \(x\) in their span results in no change in the model output, and is therefore imperceptible to the model.
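We can make this concrete with a quick check (a sketch that is not part of the original cells above, assuming the eigendistortions are stored as unit-norm rows, as in the projection plot): project the power method's last eigendistortion onto the zero-eigenvalue eigendistortions found by the Jacobian method and confirm that this subspace captures essentially all of its norm.

zero_eigvecs = eigdist_jac[ind_zero]   # Jacobian-method eigendistortions with eigenvalue ~0 (mask computed above)
proj = zero_eigvecs @ eigdist_pow[-1]  # coefficients of the power method's last eigendistortion in that subspace
print('Norm captured by zero-eigenvalue subspace:', torch.linalg.vector_norm(proj).item())  # should be close to 1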

1.5 - The Fisher information matrix is locally adaptive

Different inputs should in general have different sets of eigendistortions – a noticeable distortion in one image would not necessarily be noticeable in a different image. The only case where they should be the same regardless of input is when the model is fully linear, as in this simple example. So let's check whether the Jacobian at a different input still equals the weight matrix \(M\).

[6]:
x1 = torch.randn_like(x0)  # generate some random input
eig_jac2 = Eigendistortion(x1, model=mdl_linear)
eig_jac2.synthesize(method='exact')  # since the model is linear, the Jacobian should be the exact same as before

print(f'Does the jacobian at x1 still equal the model weight matrix?'
      f' {eig_jac2.jacobian.allclose(mdl_linear.M.weight.data)}')

Initializing Eigendistortion -- Input dim: 25 | Output dim: 10
Computing all eigendistortions
Does the jacobian at x1 still equal the model weight matrix? True

Example 2: Which layer of ResNet is a better model of human visual distortion perception?

Now that we understand what eigendistortions are and how the Eigendistortion class works, let's compute them for real images using a more complex model: ResNet18. The response vector \(y\) doesn't have to be the output of the model's final layer; we can also compute eigendistortions for intermediate layers. Let's synthesize distortions for an image using different layers of ResNet18 to see which layer produces extremal eigendistortions that align better with human perception.

2.1 - Load an example image
[10]:
n = 128  # this will be the img_height and width of the input, you can change this to accommodate your machine
img = po.data.color_wheel()
# center crop the image to nxn
img = po.tools.center_crop(img, n)
po.imshow(img, as_rgb=True, zoom=3);
_images/tutorials_intro_02_Eigendistortions_14_0.png
2.2 - Instantiate models and Eigendistortion objects

Let's make a wrapper class that returns the nth layer output of a given model. We're going to use this to compare eigendistortions synthesized using different layers of ResNet18 as models of distortion perception.

[11]:
# Create a class that takes the nth layer output of a given model
class NthLayer(torch.nn.Module):
    """Wrap any model to get the response of an intermediate layer

    Works for Resnet18 or VGG16.

    """
    def __init__(self, model, layer=None):
        """
        Parameters
        ----------
        model: PyTorch model
        layer: int
            Which model response layer to output
        """
        super().__init__()
        try:
            # then this is VGG16
            features = list(model.features)
        except AttributeError:
            # then it's resnet18
            features = ([model.conv1, model.bn1, model.relu, model.maxpool] + [l for l in model.layer1] +
                        [l for l in model.layer2] + [l for l in model.layer3] + [l for l in model.layer4] +
                        [model.avgpool, model.fc])
        self.features = nn.ModuleList(features).eval()

        if layer is None:
            layer = len(self.features)
        self.layer = layer

    def forward(self, x):
        for ii, mdl in enumerate(self.features):
            x = mdl(x)
            if ii == self.layer:
                return x

# different potential models of human visual perception of distortions
resnet18_a = NthLayer(models.resnet18(pretrained=True), layer=3)
po.tools.remove_grad(resnet18_a)
resnet18_b = NthLayer(models.resnet18(pretrained=True), layer=6)
po.tools.remove_grad(resnet18_b)

ed_resneta = Eigendistortion(img, resnet18_a)
ed_resnetb = Eigendistortion(img, resnet18_b)
/home/billbrod/micromamba/envs/plenoptic/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/billbrod/micromamba/envs/plenoptic/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)

Initializing Eigendistortion -- Input dim: 49152 | Output dim: 65536

Initializing Eigendistortion -- Input dim: 49152 | Output dim: 32768
2.3 - Synthesizing distortions

The input dimensionality in this example is huge compared to our linear model example: the input has \(\textrm{n\_chan} \times \textrm{img\_height} \times \textrm{img\_width}\) entries, so the Fisher matrix has \((\textrm{n\_chan} \times \textrm{img\_height} \times \textrm{img\_width})^2\) entries – far too massive to compute exactly. We must turn to iterative methods. Let's synthesize the extremal eigendistortions for this color wheel image using the two ResNet18 layers defined above.

[12]:
# Bump up n_steps if you wish
ed_resneta.synthesize(method='power', max_iter=400)
ed_resnetb.synthesize(method='power', max_iter=400);
Top k=1 eigendists computed | Stop criterion 1.00E-07 reached.
Top k=1 eigendists computed | Stop criterion 1.00E-07 reached.
2.4 - Visualizing eigendistortions

Let's display the eigendistortions using the display_eigendistortion function. For a given eigendistortion, it shows the original image, the synthesized eigendistortion, and the image with some constant \(\alpha\) times the eigendistortion added. Below, we display the maximal (maxdist, index 0) and minimal (mindist, index -1) eigendistortions.

[13]:
po.synth.eigendistortion.display_eigendistortion(ed_resneta, 0, as_rgb=True, zoom=3);
po.synth.eigendistortion.display_eigendistortion(ed_resneta, -1, as_rgb=True, zoom=3);
_images/tutorials_intro_02_Eigendistortions_20_0.png
_images/tutorials_intro_02_Eigendistortions_20_1.png
2.5 - Which synthesized extremal eigendistortions better characterize human perception?

Let’s compare eigendistortions within a model first. One thing we immediately notice is that the first eigendistortion (labeled maxdist) is indeed more noticeable than mindist. maxdist is localized to a single portion of the image, and has lower, more prominent spatial frequency content than mindist. mindist looks more like high frequency noise distributed across the image.

But how do the distortions compare between models – which model better characterizes human visual perception of distortions? The only way to truly answer this is to run an experiment and ask human observers which distortions are most/least noticeable to them. The best model should produce a maximally noticeable distortion that is more noticeable than other models' maximally noticeable distortions, and its minimally noticeable distortion should be less noticeable than other models' minimally noticeable distortions.

See Berardino et al. 2017 for more details.

2.6 - Synthesizing distortions for other images

Remember that the Fisher matrix is locally adaptive, meaning that a different image should have a different set of eigendistortions. Let's finish off this notebook with another set of extremal eigendistortions for these two ResNet18 layers on a different image.

[14]:
img = po.data.curie()

# center crop the image to nxn
img = po.tools.center_crop(img, n)
# because this is a grayscale image but ResNet expects a color image,
# need to duplicate along the color dimension
img3 = torch.repeat_interleave(img, 3, dim=1)

ed_resneta = Eigendistortion(img3, resnet18_a)
ed_resnetb = Eigendistortion(img3, resnet18_b)

ed_resneta.synthesize(method='power', max_iter=400)
ed_resnetb.synthesize(method='power', max_iter=400)

po.imshow(img, zoom=2, title="Original");

Initializing Eigendistortion -- Input dim: 49152 | Output dim: 65536

Initializing Eigendistortion -- Input dim: 49152 | Output dim: 32768
Top k=1 eigendists computed | Stop criterion 1.00E-07 reached.
Top k=1 eigendists computed | Stop criterion 1.00E-07 reached.
_images/tutorials_intro_02_Eigendistortions_22_7.png
[15]:
po.synth.eigendistortion.display_eigendistortion(ed_resneta, 0, as_rgb=True, zoom=2, title="top eigendist");
po.synth.eigendistortion.display_eigendistortion(ed_resneta, -1, as_rgb=True, zoom=2, title="bottom eigendist");

po.synth.eigendistortion.display_eigendistortion(ed_resnetb, 0, as_rgb=True, zoom=2, title="top eigendist");
po.synth.eigendistortion.display_eigendistortion(ed_resnetb, -1, as_rgb=True, zoom=2, title="bottom eigendist");
_images/tutorials_intro_02_Eigendistortions_23_0.png
_images/tutorials_intro_02_Eigendistortions_23_1.png
_images/tutorials_intro_02_Eigendistortions_23_2.png
_images/tutorials_intro_02_Eigendistortions_23_3.png

Appendix: More mathematical detail

If we have a model that takes an N-dimensional input and outputs an M-dimensional response, then its Jacobian, \(J=\frac{\partial f}{\partial x}\), is an \(M\times N\) matrix of partial derivatives that tells us how much a change in each entry of the input would change each entry of the output. Under the assumption of additive Gaussian noise in the output space, the Fisher Information Matrix, \(F\), is a symmetric positive semi-definite \(N\times N\) matrix computed from the Jacobian as \(F=J^TJ\). If you are familiar with linear algebra, you might notice that the eigenvectors of \(F\) are the right singular vectors of the Jacobian. Thus, an eigendecomposition \(F=V\Lambda V^T\) yields directions of the input space (the columns of \(V\)) along which changes in the output space are rank-ordered by the entries of the diagonal matrix \(\Lambda\).

Given some input image \(x_0\), an eigendistortion is an additive perturbation, \(\epsilon\), in the input domain that changes the response in a model's output domain of interest (e.g. an intermediate layer of a neural net, the output of a nonlinear model, etc.). These perturbations are named eigendistortions because they push \(x_0\) along eigenvectors of the Fisher Information Matrix. So we expect distortions of \(x_0\) along the direction of the eigenvector with the maximum eigenvalue to change the representation the most, and distortions along the eigenvector with the minimum eigenvalue to change the representation the least. (Pushing along intermediate eigenvectors changes the representation by an intermediate amount.)
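To make this relationship concrete, here is a small standalone sketch (with a random matrix standing in for a model's Jacobian, so none of these values come from the notebook) verifying that the eigenvalues of \(F=J^TJ\) are the squared singular values of \(J\):

import torch

torch.manual_seed(0)
J = torch.randn(10, 25)             # stand-in for an M x N Jacobian
F = J.T @ J                         # N x N Fisher information matrix
eigvals = torch.linalg.eigvalsh(F)  # eigenvalues of F, in ascending order
S = torch.linalg.svdvals(J)         # singular values of J, in descending order
# the largest eigenvalues of F should equal the squared singular values of J
print(torch.allclose(eigvals.flip(0)[:len(S)], S**2, atol=1e-3))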

Representational Geodesic

NOTE: This notebook and the geodesic method are still under construction and subject to change. They will run, but might not find the most informative geodesic.

[1]:
import numpy as np
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
%matplotlib inline

import pyrtools as pt
import plenoptic as po
from plenoptic.tools import to_numpy
%load_ext autoreload
%autoreload 2

import torch
import torch.nn as nn
# this notebook uses torchvision, which is an optional dependency.
# if this fails, install torchvision in your plenoptic environment
# and restart the notebook kernel.
try:
    import torchvision
except ModuleNotFoundError:
    raise ModuleNotFoundError("optional dependency torchvision not found!"
                              " please install it in your plenoptic environment "
                              "and restart the notebook kernel")
import torchvision.transforms as transforms
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype  = torch.float32
torch.__version__
/home/billbrod/miniconda3/envs/plen_3.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[1]:
'2.0.1+cu117'

Translation

[2]:
image_size = 64
einstein = po.data.einstein()
einstein = po.tools.conv.blur_downsample(einstein, n_scales=2)
vid = po.tools.translation_sequence(einstein, n_steps=20)
vid = po.tools.center_crop(vid, image_size // 2)
vid = po.tools.rescale(vid, 0, 1)

imgA = vid[0:1]
imgB = vid[-1:]

pt.image_stats(to_numpy(imgA))
pt.image_stats(to_numpy(imgB))
print(imgA.shape)
print(vid.shape)

# convention: full name for numpy arrays, short hands for torch tensors
video = to_numpy(vid).squeeze()
print(video.shape)
pt.imshow(list(video.squeeze()), zoom=4, col_wrap=6);
Image statistics:
  Range: [0.079997, 1.000000]
  Mean: 0.488417,  Stdev: 0.149090,  Kurtosis: 3.337172
Image statistics:
  Range: [0.000000, 0.741736]
  Mean: 0.354389,  Stdev: 0.212748,  Kurtosis: 1.725743
torch.Size([1, 1, 32, 32])
torch.Size([21, 1, 32, 32])
(21, 32, 32)
_images/tutorials_intro_05_Geodesics_3_1.png
Spectral models

Computing a geodesic to reveal excess invariance of the global Fourier magnitude representation.

[3]:
import torch.fft
class Fourier(nn.Module):
    def __init__(self, representation = 'amp'):
        super().__init__()
        self.representation = representation

    def spectrum(self, x):
        return torch.fft.rfftn(x, dim=(2, 3))

    def forward(self, x):
        if self.representation == 'amp':
            return torch.abs(self.spectrum(x))
        elif self.representation == 'phase':
            return torch.angle(self.spectrum(x))
        elif self.representation == 'rectangular':
            return self.spectrum(x)
        elif self.representation == 'polar':
            return torch.cat((torch.abs(self.spectrum(x)),
                              torch.angle(self.spectrum(x))),
                             dim=1)

model = Fourier('amp')
# model = Fourier('polar') # note: need pytorch>=1.8 to take gradients through torch.angle
[4]:
n_steps = len(video)-1
moog = po.synth.Geodesic(imgA, imgB, model, n_steps, initial_sequence='bridge')
optim = torch.optim.Adam([moog._geodesic], lr=.01, amsgrad=True)
moog.synthesize(max_iter=500, optimizer=optim, store_progress=True)
/home/billbrod/Documents/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  warnings.warn(

 Stop criterion for pixel_change_norm = 1.07149e-02
 24%|████▎             | 119/500 [00:00<00:01, 202.78it/s, loss=8.3254e+01, gradient norm=1.0443e-01, pixel change norm=1.08467e-03]/home/billbrod/Documents/plenoptic/src/plenoptic/synthesize/geodesic.py:193: UserWarning: Pixel change norm has converged, stopping synthesis
  warnings.warn("Pixel change norm has converged, stopping synthesis")
 26%|████▋             | 130/500 [00:00<00:01, 204.98it/s, loss=8.3254e+01, gradient norm=1.0443e-01, pixel change norm=1.08467e-03]
[5]:
fig, axes = plt.subplots(2, 1, figsize=(5, 8))
po.synth.geodesic.plot_loss(moog, ax=axes[0]);
po.synth.geodesic.plot_deviation_from_line(moog, vid, ax=axes[1]);
_images/tutorials_intro_05_Geodesics_7_0.png
[6]:
plt.plot(po.to_numpy(moog.step_energy), alpha=.2);
plt.plot(moog.step_energy.mean(1), 'r-', label='path energy')
plt.axhline(torch.linalg.vector_norm(moog.model(moog.image_a) - moog.model(moog.image_b), ord=2) ** 2 / moog.n_steps ** 2)
plt.legend()
plt.title('evolution of representation step energy')
plt.ylabel('step energy')
plt.xlabel('iteration')
plt.yscale('log')
plt.show()
_images/tutorials_intro_05_Geodesics_8_0.png
[7]:
plt.plot(moog.calculate_jerkiness().detach())
plt.title('final representation step jerkiness')
[7]:
Text(0.5, 1.0, 'final representation step jerkiness')
_images/tutorials_intro_05_Geodesics_9_1.png
[8]:
plt.plot(po.to_numpy(moog.dev_from_line[..., 1]));

plt.title('evolution of distance from representation line')
plt.ylabel('distance from representation line')
plt.xlabel('iteration step')
plt.show()
_images/tutorials_intro_05_Geodesics_10_0.png
[9]:
pixelfade = to_numpy(moog.pixelfade.squeeze())
geodesic = to_numpy(moog.geodesic.squeeze())
fig = pt.imshow([video[5], pixelfade[5], geodesic[5]],
          title=['video', 'pixelfade', 'geodesic'],
          col_wrap=3, zoom=4);

size = geodesic.shape[-1]
h, m , l = (size//2 + size//4, size//2, size//2 - size//4)

# for a in fig.get_axes()[0]:
a = fig.get_axes()[0]
for line in (h, m, l):
    a.axhline(line, lw=2)

pt.imshow([video[:,l], pixelfade[:,l], geodesic[:,l]],
          title=None, col_wrap=3, zoom=4);
pt.imshow([video[:,m], pixelfade[:,m], geodesic[:,m]],
          title=None, col_wrap=3, zoom=4);
pt.imshow([video[:,h], pixelfade[:,h], geodesic[:,h]],
          title=None, col_wrap=3, zoom=4);
_images/tutorials_intro_05_Geodesics_11_0.png
_images/tutorials_intro_05_Geodesics_11_1.png
_images/tutorials_intro_05_Geodesics_11_2.png
_images/tutorials_intro_05_Geodesics_11_3.png
Physiologically inspired models
[10]:
model = po.simul.OnOff(kernel_size=(31,31), pretrained=True)
po.tools.remove_grad(model)
po.imshow(model(imgA), zoom=8);
/home/billbrod/Documents/plenoptic/src/plenoptic/simulate/models/frontend.py:388: UserWarning: pretrained is True but cache_filt is False. Set cache_filt to True for efficiency unless you are fine-tuning.
  warn("pretrained is True but cache_filt is False. Set cache_filt to "
/home/billbrod/miniconda3/envs/plen_3.10/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
_images/tutorials_intro_05_Geodesics_13_1.png
[11]:
n_steps = 10
moog = po.synth.Geodesic(imgA, imgB, model, n_steps, initial_sequence='bridge')
[12]:
moog.synthesize(store_progress=True)

 Stop criterion for pixel_change_norm = 7.76675e-03
 18%|███▏              | 178/1000 [00:19<01:31,  9.00it/s, loss=6.3326e-03, gradient norm=3.1056e-05, pixel change norm=5.54276e-03]
[13]:
fig, axes = plt.subplots(2, 1, figsize=(5, 8))
po.synth.geodesic.plot_loss(moog, ax=axes[0]);
po.synth.geodesic.plot_deviation_from_line(moog, ax=axes[1]);
_images/tutorials_intro_05_Geodesics_16_0.png
[14]:
plt.plot(po.to_numpy(moog.dev_from_line[...,0]))

plt.title('evolution of distance from representation line')
plt.ylabel('distance from representation line')
plt.xlabel('iteration step')
plt.yscale('log')
plt.show()
_images/tutorials_intro_05_Geodesics_17_0.png
[15]:
plt.plot(po.to_numpy(moog.step_energy), alpha=.2);
plt.plot(moog.step_energy.mean(1), 'r-', label='path energy')
plt.axhline(torch.linalg.vector_norm(moog.model(moog.image_a) - moog.model(moog.image_b), ord=2) ** 2 / moog.n_steps ** 2)
plt.legend()
plt.title('evolution of representation step energy')
plt.ylabel('step energy')
plt.xlabel('iteration')
plt.yscale('log')
plt.show()
_images/tutorials_intro_05_Geodesics_18_0.png
[16]:
plt.plot(moog.calculate_jerkiness().detach())
plt.title('final representation step jerkiness')
[16]:
Text(0.5, 1.0, 'final representation step jerkiness')
_images/tutorials_intro_05_Geodesics_19_1.png
[17]:
geodesic  = po.to_numpy(moog.geodesic).squeeze()
pixelfade = po.to_numpy(moog.pixelfade).squeeze()
assert geodesic.shape == pixelfade.shape
geodesic.shape
[17]:
(11, 32, 32)
[18]:
print('geodesic')
pt.imshow(list(geodesic), vrange='auto1', title=None, zoom=4);
print('diff')
pt.imshow(list(geodesic - pixelfade), vrange='auto1', title=None, zoom=4);
print('pixelfade')
pt.imshow(list(pixelfade), vrange='auto1', title=None, zoom=4);
geodesic
diff
pixelfade
_images/tutorials_intro_05_Geodesics_21_1.png
_images/tutorials_intro_05_Geodesics_21_2.png
_images/tutorials_intro_05_Geodesics_21_3.png
[19]:
# checking that the range constraint is met
plt.hist(video.flatten(), histtype='step', density=True, label='video')
plt.hist(pixelfade.flatten(), histtype='step', density=True, label='pixelfade')
plt.hist(geodesic.flatten(), histtype='step', density=True, label='geodesic');
plt.title('signal value histogram')
plt.legend(loc=1)
plt.show()
_images/tutorials_intro_05_Geodesics_22_0.png

vgg16 translation / rotation / scaling

[20]:
# We have some optional example images that we'll download for this. In order to do so,
# we use an optional dependency, pooch. If the following raises an ImportError or ModuleNotFoundError for you,
# then install pooch in your plenoptic environment and restart your kernel.
sample_image_dir = po.data.fetch_data('sample_images.tar.gz')
imgA = po.load_images(sample_image_dir / 'frontwindow_affine.jpeg', as_gray=False)
imgB = po.load_images(sample_image_dir / 'frontwindow.jpeg', as_gray=False)
u = 300
l = 90
imgA = imgA[..., u:u+224, l:l+224]
imgB = imgB[..., u:u+224, l:l+224]
po.imshow([imgA, imgB], as_rgb=True);
diff = imgA - imgB
po.imshow(diff);
pt.image_compare(po.to_numpy(imgA, True), po.to_numpy(imgB, True));
Difference statistics:
  Range: [0, 0]
  Mean: -0.012635,  Stdev (rmse): 0.208685,  SNR (dB): 0.856129
_images/tutorials_intro_05_Geodesics_24_1.png
_images/tutorials_intro_05_Geodesics_24_2.png
[21]:
from torchvision import models
# Create a class that takes the nth layer output of a given model
class NthLayer(torch.nn.Module):
    """Wrap any model to get the response of an intermediate layer

    Works for Resnet18 or VGG16.

    """
    def __init__(self, model, layer=None):
        """
        Parameters
        ----------
        model: PyTorch model
        layer: int
            Which model response layer to output
        """
        super().__init__()

        # TODO: is this centering (ImageNet normalization) appropriate?
        self.normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                              std=[0.229, 0.224, 0.225])
        try:
            # then this is VGG16
            features = list(model.features)
        except AttributeError:
            # then it's resnet18
            features = ([model.conv1, model.bn1, model.relu, model.maxpool] + [l for l in model.layer1] +
                        [l for l in model.layer2] + [l for l in model.layer3] + [l for l in model.layer4] +
                        [model.avgpool, model.fc])
        self.features = nn.ModuleList(features).eval()

        if layer is None:
            layer = len(self.features)
        self.layer = layer

    def forward(self, x):

        x = self.normalize(x)
        for ii, mdl in enumerate(self.features):
            x = mdl(x)
            if ii == self.layer:
                return x

# different potential models of human visual perception of distortions
# resnet18 = NthLayer(models.resnet18(pretrained=True), layer=3)

# choosing what layer representation to study
# for l in range(len(models.vgg16().features)):
#     print(f'({l}) ', models.vgg16().features[l])
#     y = NthLayer(models.vgg16(pretrained=True), layer=l)(imgA)
#     print("dim", torch.numel(y), "shape ", y.shape,)

vgg_pool1 = NthLayer(models.vgg16(pretrained=True), layer=4)
po.tools.remove_grad(vgg_pool1)
vgg_pool2 = NthLayer(models.vgg16(pretrained=True), layer=9)
po.tools.remove_grad(vgg_pool2)
vgg_pool3 = NthLayer(models.vgg16(pretrained=True), layer=17)
po.tools.remove_grad(vgg_pool3)
/home/billbrod/miniconda3/envs/plen_3.10/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/billbrod/miniconda3/envs/plen_3.10/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
[22]:
predA = po.to_numpy(models.vgg16(pretrained=True)(imgA))[0]
predB = po.to_numpy(models.vgg16(pretrained=True)(imgB))[0]

plt.plot(predA);
plt.plot(predB);
_images/tutorials_intro_05_Geodesics_26_0.png

The following block runs curl (which should already be installed on your system) to download a txt file containing the ImageNet class labels. If it doesn't run for some reason, you can download the file yourself from here and place it at ../data/imagenet1000_clsidx_to_labels.txt.

[23]:
!curl https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt -o ../data/imagenet1000_clsidx_to_labels.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 30564  100 30564    0     0  81532      0 --:--:-- --:--:-- --:--:-- 81721
[24]:
with open("../data/imagenet1000_clsidx_to_labels.txt") as f:
    idx2label = eval(f.read())

for idx in np.argsort(predA)[-5:]:
    print(idx2label[idx])
for idx in np.argsort(predB)[-5:]:
    print(idx2label[idx])
African elephant, Loxodonta africana
dam, dike, dyke
lakeside, lakeshore
water buffalo, water ox, Asiatic buffalo, Bubalus bubalis
valley, vale
alp
American black bear, black bear, Ursus americanus, Euarctos americanus
water buffalo, water ox, Asiatic buffalo, Bubalus bubalis
valley, vale
lakeside, lakeshore
[25]:
moog = po.synth.Geodesic(imgA, imgB, vgg_pool3)
[26]:
# this should be run for longer on a GPU
moog.synthesize(max_iter=25)

 Stop criterion for pixel_change_norm = 1.23674e-01
100%|█████████████████████| 25/25 [01:29<00:00,  3.57s/it, loss=3.9520e+05, gradient norm=2.8781e+04, pixel change norm=3.15347e-01]
[27]:
fig, axes = plt.subplots(2, 1, figsize=(5, 8))
po.synth.geodesic.plot_loss(moog, ax=axes[0]);
po.synth.geodesic.plot_deviation_from_line(moog, ax=axes[1]);
_images/tutorials_intro_05_Geodesics_32_0.png
[28]:
plt.plot(moog.calculate_jerkiness().detach())
plt.title('final representation step jerkiness')
[28]:
Text(0.5, 1.0, 'final representation step jerkiness')
_images/tutorials_intro_05_Geodesics_33_1.png
[29]:
po.imshow(moog.geodesic, as_rgb=True, zoom=2, title=None, vrange='auto0');
po.imshow(moog.pixelfade, as_rgb=True, zoom=2, title=None, vrange='auto0');
# per channel difference
po.imshow([(moog.geodesic - moog.pixelfade)[1:-1, 0:1]], zoom=2, title=None, vrange='auto1');
po.imshow([(moog.geodesic - moog.pixelfade)[1:-1, 1:2]], zoom=2, title=None, vrange='auto1');
po.imshow([(moog.geodesic - moog.pixelfade)[1:-1, 2:]], zoom=2, title=None, vrange='auto1');
# exaggerated color difference
po.imshow([po.tools.rescale((moog.geodesic - moog.pixelfade)[1:-1])], as_rgb=True, zoom=2, title=None);
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
_images/tutorials_intro_05_Geodesics_34_1.png
_images/tutorials_intro_05_Geodesics_34_2.png
_images/tutorials_intro_05_Geodesics_34_3.png
_images/tutorials_intro_05_Geodesics_34_4.png
_images/tutorials_intro_05_Geodesics_34_5.png
_images/tutorials_intro_05_Geodesics_34_6.png

Metamers

Metamers are an old concept in the study of perception, dating back to the color-matching experiments in the 18th century that first provided support for the existence of three cone types (though it would be another two hundred years before anatomical evidence was found). These color-matching experiments demonstrated that, by combining three colored lights in different proportions, you could generate a color that humans perceived as identical to any other color, even though their physical spectra were different. Perceptual metamers, then, refer to two images that are physically different but perceived as identical.

For the purposes of plenoptic, wherever we say "metamers", we mean "model metamers": images that are physically different but have identical representations for a given model, i.e., that the model "perceives" as identical. Like all synthesis methods, this is model-specific, and one potential experiment is to determine whether model metamers can serve as human perceptual metamers, which would provide support for the model as an accurate representation of the human visual system.

In the Lab for Computational Vision, this goes back to Portilla and Simoncelli, 2001, where the authors created a parametric model of textures and synthesized novel images as a way of demonstrating the cases where the model succeeded and failed. In that paper, the model did not purport to have anything to do with human vision, and the authors did not refer to their images as "metamers"; that term did not appear until Freeman and Simoncelli, 2011, where the authors pooled the Portilla-Simoncelli texture statistics in windows laid out in a log-polar fashion to generate putative human perceptual metamers.

This notebook demonstrates how to use the Metamer class to generate model metamers.

[1]:
import plenoptic as po
from plenoptic.tools import to_numpy
import imageio
import torch
import pyrtools as pt
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
import numpy as np

%load_ext autoreload
%autoreload 2
/mnt/home/wbroderick/miniconda3/envs/plenoptic/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Basic usage

As with all our synthesis methods, we start by grabbing a target image and initializing our model.

[2]:
img = po.data.curie()
po.imshow(img);
_images/tutorials_intro_06_Metamer_3_0.png

For the model, we’ll use a simple On-Off model of visual neurons

[3]:
model = po.simul.OnOff((7, 7))
po.tools.remove_grad(model)

Like all of our models, when this is called on the image, it returns a 3d or 4d tensor (in this case, 4d). This representation is what the Metamer class will try to match.

[4]:
print(model(img))
tensor([[[[0.6907, 0.6915, 0.6930,  ..., 0.6935, 0.6935, 0.6937],
          [0.6917, 0.6923, 0.6934,  ..., 0.6934, 0.6936, 0.6939],
          [0.6935, 0.6937, 0.6939,  ..., 0.6933, 0.6939, 0.6942],
          ...,
          [0.6927, 0.6928, 0.6934,  ..., 0.6928, 0.6939, 0.6943],
          [0.6944, 0.6943, 0.6941,  ..., 0.6930, 0.6936, 0.6938],
          [0.6950, 0.6948, 0.6942,  ..., 0.6929, 0.6932, 0.6933]],

         [[0.6951, 0.6944, 0.6933,  ..., 0.6928, 0.6928, 0.6926],
          [0.6943, 0.6938, 0.6929,  ..., 0.6929, 0.6927, 0.6925],
          [0.6929, 0.6927, 0.6926,  ..., 0.6930, 0.6924, 0.6922],
          ...,
          [0.6935, 0.6934, 0.6930,  ..., 0.6934, 0.6926, 0.6923],
          [0.6921, 0.6922, 0.6924,  ..., 0.6933, 0.6928, 0.6926],
          [0.6917, 0.6918, 0.6923,  ..., 0.6933, 0.6931, 0.6930]]]])
/mnt/home/wbroderick/miniconda3/envs/plenoptic/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

In order to visualize this, we can use the helper function plot_representation (see the Display notebook for more details). In this case, the representation looks like two images, and so we plot it as such:

[5]:
po.tools.display.plot_representation(data=model(img), figsize=(11, 5));
_images/tutorials_intro_06_Metamer_9_0.png

At its simplest, initialize Metamer with the target image and the model, then call .synthesize(). By setting store_progress=True, we update a variety of attributes (all of which start with saved_) on each iteration so we can later examine, for example, the synthesized image over time.

[6]:
met = po.synth.Metamer(img, model)
met.synthesize(store_progress=True, max_iter=50)
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  warnings.warn(
100%|██████████| 50/50 [00:00<00:00, 50.57it/s, loss=9.4442e-06, learning_rate=0.01, gradient_norm=7.7735e-04, pixel_change_norm=3.8657e-01]

We then call the plot_synthesis_status function to see how things are going. The image on the left shows the metamer at this point, the center plot shows the loss over time (with the red dot marking the current loss), and the rightmost plot shows the representation error. If a model has a plot_representation method, the representation error plot can be more informative, but it can always be created.

[7]:
# model response error plot has two subplots, so we increase its relative width
po.synth.metamer.plot_synthesis_status(met, width_ratios={'plot_representation_error': 2});
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/display.py:950: UserWarning: ax is not None, so we're ignoring figsize...
  warnings.warn("ax is not None, so we're ignoring figsize...")
_images/tutorials_intro_06_Metamer_13_1.png

plot_synthesis_status() is a helper function to show all of this at once, but the individual components can be created separately:

[8]:
fig, axes = plt.subplots(1, 3, figsize=(25, 5), gridspec_kw={'width_ratios': [1, 1, 2]})
po.synth.metamer.display_metamer(met, ax=axes[0])
po.synth.metamer.plot_loss(met, ax=axes[1])
po.synth.metamer.plot_representation_error(met, ax=axes[2]);
_images/tutorials_intro_06_Metamer_15_0.png

The loss is decreasing, but clearly there’s much more to go. So let’s continue.

You can resume synthesis as long as you pass the same argument to store_progress on each run (several other arguments, such as optimizer and scheduler, must be None on any run except the first).

Everything that stores the progress of the optimization (loss, saved_model_response, saved_signal) will persist between calls and so potentially get very large.

[9]:
met.synthesize(store_progress=True, max_iter=100)
 12%|█▏        | 12/100 [00:00<00:01, 52.64it/s, loss=7.5078e-06, learning_rate=0.01, gradient_norm=8.2988e-04, pixel_change_norm=2.7297e-01]/mnt/home/wbroderick/plenoptic/src/plenoptic/synthesize/metamer.py:195: UserWarning: Loss has converged, stopping synthesis
  warnings.warn("Loss has converged, stopping synthesis")
 14%|█▍        | 14/100 [00:00<00:01, 49.28it/s, loss=7.5078e-06, learning_rate=0.01, gradient_norm=8.2988e-04, pixel_change_norm=2.7297e-01]

Let's examine the status again. But instead of looking at the most recent iteration, let's look at the one 10 from the end:

[10]:
po.synth.metamer.plot_synthesis_status(met, iteration=-10, width_ratios={'plot_representation_error': 2});
_images/tutorials_intro_06_Metamer_19_0.png

Since we have the ability to select which iteration to plot (as long as we've been storing the information), we can create an animation showing the synthesis over time. The matplotlib.animation object that gets returned can't be viewed directly; it either has to be converted to html for display in the notebook (using the convert_anim_to_html function we provide) or saved in some video format (e.g., anim.save('test.mp4'), which requires ffmpeg to be installed and on your path).

[11]:
anim = po.synth.metamer.animate(met, width_ratios={'plot_representation_error': 2})
po.tools.convert_anim_to_html(anim)
/mnt/home/wbroderick/plenoptic/src/plenoptic/synthesize/metamer.py:1645: UserWarning: Looks like representation is image-like, haven't fully thought out how to best handle rescaling color ranges yet!
  warnings.warn("Looks like representation is image-like, haven't fully thought out how"
[11]:

Generally speaking, synthesis will run until you hit max_iter iterations. However, synthesis can also stop if it looks like the loss has stopped changing. This behavior is controlled with the loss_thresh and loss_change_iter arguments: if the loss has changed by less than loss_thresh over the past loss_change_iter iterations, we stop synthesis.

Moving between devices

Metamer has a .to() method for moving the object between devices or dtypes. Call it as you would call any tensor.to and it will move over the necessary attributes.
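For example, a minimal sketch (assuming a CUDA device is available, and that, as the description above implies, the method moves the relevant attributes in place):

if torch.cuda.is_available():
    met.to('cuda')  # move the Metamer object's tensors to the GPU before further synthesis
# a dtype works the same way, e.g. met.to(torch.float64); make sure the model matches the new device/dtype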

Saving and loading

Finally, you probably want to save the results of your synthesis. As mentioned above, you can save the synthesis animation, and all of the plots return regular matplotlib Figures and can be manipulated as expected. The synthesized image itself is a tensor and can be detached, converted to a numpy array, and saved (either as an image or array) as you’d expect. po.to_numpy is a convenience function we provide for stuff like this, which detaches the tensor, sends it to the CPU, and converts it to a numpy array with dtype float32. Note that it doesn’t squeeze the tensor, so you may want to do that yourself.

[12]:
met_image = po.to_numpy(met.metamer).squeeze()
# convert from float to 8-bit integer for saving as an image
print(f'Metamer range: ({met_image.min()}, {met_image.max()})')
met_image = po.tools.convert_float_to_int(np.clip(met_image, 0, 1))
imageio.imwrite('test.png', met_image)
Metamer range: (-0.00023865862749516964, 1.0005061626434326)

The metamer lies slightly outside the range [0, 1], so we clip before saving as an image. Metamer’s objective function has a quadratic penalty on the synthesized image’s range, and the weight on this penalty can be adjusted by changing the value of range_penalty_lambda at initialization.
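For example, a sketch using the range_penalty_lambda keyword described above (the value here is arbitrary; larger values penalize out-of-range pixels more heavily):

met_strong_penalty = po.synth.Metamer(img, model, range_penalty_lambda=1.0)
met_strong_penalty.synthesize(max_iter=50)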

You can also save the entire Metamer object. This can be fairly large (depending on how many iterations you ran it for and how frequently you stored progress), but stores all information:

[13]:
met.save('test.pt')

You can then load it back in using the method .load(). Note that you need to first instantiate the Metamer object and then call .load() – it must be instantiated with the same image, model, and loss function in order to load it in!

[14]:
met_copy = po.synth.Metamer(img, model)
# it's modified in place, so this method doesn't return anything
met_copy.load('test.pt')
(met_copy.saved_metamer == met.saved_metamer).all()
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  warnings.warn(
[14]:
tensor(True)

Because the model itself can be quite large, we do not save it along with the Metamer object. This is why you must initialize it before loading from disk.

Reproducibility

You can set the seed before you call synthesize() for reproducibility by using po.tools.set_seed(). This will set both the pytorch and numpy seeds, but note that we can't guarantee complete reproducibility: see the pytorch docs for some caveats (we currently do not do the stuff described under CuDNN), as well as this issue about resuming state after saving.

Also note that pytorch does not guarantee identical results between CPU and GPU, even with the same seed.
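For example, a minimal sketch of seeding right before synthesis so that a re-run produces the same metamer (on the same hardware):

po.tools.set_seed(42)
met = po.synth.Metamer(img, model)
met.synthesize(max_iter=50)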

More Advanced Options

The solution found by the end of the Basic usage section is only one possible metamer, and finding a good one is not always easy. Optimization in a high-dimensional space with non-linear models is inherently challenging, so we can't guarantee you'll find a model metamer, but we do provide some tools / extra functionality to help.

Initialization

By default, the initial_image arg when initializing Metamer is None, in which case we initialize with uniformly-distributed random noise between 0 and 1. If you wish to use some other image for initialization, you can initialize it yourself (it must be the same shape as target_signal) and pass it as the initial_image arg.
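For example, a sketch using the initial_image argument described above (starting from a mid-gray image is just an illustrative choice):

init = 0.5 * torch.ones_like(img)  # same shape as the target image
met = po.synth.Metamer(img, model, initial_image=init)
met.synthesize(max_iter=50)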

Optimization basics

You can set all the various optimization parameters you’d expect. synthesize() has an optimizer arg, which accepts a pytorch optimizer. You can therefore initialize your own optimizer after initializing Metamer like so

[15]:
met = po.synth.Metamer(img, model)
opt = torch.optim.Adam([met.metamer], lr=.001, amsgrad=True)
met.synthesize(optimizer=opt)
 48%|████▊     | 48/100 [00:00<00:00, 55.27it/s, loss=8.6771e-05, learning_rate=0.001, gradient_norm=2.4843e-04, pixel_change_norm=1.3648e-01]/mnt/home/wbroderick/plenoptic/src/plenoptic/synthesize/metamer.py:195: UserWarning: Loss has converged, stopping synthesis
  warnings.warn("Loss has converged, stopping synthesis")
 50%|█████     | 50/100 [00:00<00:00, 54.20it/s, loss=8.6771e-05, learning_rate=0.001, gradient_norm=2.4843e-04, pixel_change_norm=1.3648e-01]

synthesize() also accepts a scheduler argument, so that you can pass a pytorch scheduler, which modifies the learning rate during optimization.
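For example, a sketch combining a custom optimizer with a standard pytorch scheduler (StepLR here is just one possible choice):

met = po.synth.Metamer(img, model)
opt = torch.optim.Adam([met.metamer], lr=.01, amsgrad=True)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=20, gamma=0.5)  # halve the learning rate every 20 iterations
met.synthesize(max_iter=100, optimizer=opt, scheduler=sched)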

Coarse-to-fine optimization

Some models, such as the Portilla-Simoncelli texture statistics, have a multiscale representation of the image, which can complicate the optimization. It's generally recommended that you normalize the representation (or use a specific loss function) so that the different scales all contribute equally to the representation, but that's outside the scope of this notebook.

We provide the option to use coarse-to-fine optimization, such that you optimize the different scales separately (starting with the coarsest and then moving progressively finer) and then, at the end, optimizing all of them simultaneously. This was first used in Portilla and Simoncelli, 2000, and can help avoid local optima in image space. Unlike everything else described in this notebook, it will not work for all models. There are two specifications the model must meet:

  1. It must have a scales attribute that gives the scales in the order they should be optimized.

  2. Its forward() method must accept a scales keyword argument, which accepts a list and causes the model to return only the scale(s) included. See PortillaSimoncelli.forward() for an example.

We can see that po.simul.PortillaSimoncelli satisfies these constraints, and that the model returns a subset of its output when the scales argument is passed to forward()

[16]:
# we change images to a texture, which the PS model can do a good job capturing
img = po.data.reptile_skin()
ps = po.simul.PortillaSimoncelli(img.shape[-2:])
print(ps.scales)
print(ps.forward(img).shape)
print(ps.forward(img, scales=[0]).shape)
['pixel_statistics', 'residual_lowpass', 3, 2, 1, 0, 'residual_highpass']
torch.Size([1, 1, 1046])
torch.Size([1, 1, 261])

There are two choices for how to handle coarse-to-fine optimization: 'together' or 'separate'. In 'together' (recommended), we start with the coarsest scale and then gradually add each finer scale (this is like blurring the objective function and then gradually adding details). In 'separate', we compute the gradient with respect to each scale separately (ignoring the others), then with respect to all of them at the end.

If our model meets the above requirements, then we can use the MetamerCTF class, which implements this coarse-to-fine procedure. We specify which of the two options above is used at initialization, and it will work through the scales as described (and will resume correctly if you resume synthesis). Note that this will take a while, as it has to go through each scale. Also note that the progress bar now specifies which scale we're on.

[17]:
met = po.synth.MetamerCTF(img, ps, loss_function=po.tools.optim.l2_norm, coarse_to_fine='together')
met.synthesize(store_progress=True, max_iter=100)
# we don't show our synthesized image here, because it hasn't gone through all the scales, and so hasn't finished synthesizing
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:211: UserWarning: Validating whether model can work with coarse-to-fine synthesis -- this can take a while!
  warnings.warn("Validating whether model can work with coarse-to-fine synthesis -- this can take a while!")
100%|██████████| 100/100 [00:13<00:00,  7.42it/s, loss=7.7466e+00, learning_rate=0.01, gradient_norm=4.9135e-01, pixel_change_norm=2.1053e+00, current_scale=residual_lowpass, current_scale_loss=1.3752e+00]

In order to control when synthesis considers a scale to be "done" and moves on to the next one, you can set two arguments: change_scale_criterion and ctf_iters_to_check. If the scale-specific loss (current_scale_loss in the progress bar above) has changed by less than change_scale_criterion over the past ctf_iters_to_check iterations, we consider that scale to have reached a local optimum and move on to the next. You can also set change_scale_criterion=None, in which case we always shift scales after ctf_iters_to_check iterations.

[18]:
# initialize with some noise that is approximately mean-matched and with low variance
im_init = torch.rand_like(img) * .1 + img.mean()
met = po.synth.MetamerCTF(img, ps, loss_function=po.tools.optim.l2_norm, initial_image=im_init, coarse_to_fine='together', )
met.synthesize(store_progress=10, max_iter=500,
               change_scale_criterion=None, ctf_iters_to_check=7)
po.imshow([met.image, met.metamer], title=['Target image', 'Synthesized metamer'], vrange='auto1');
100%|██████████| 500/500 [00:47<00:00, 10.60it/s, loss=1.5794e-01, learning_rate=0.01, gradient_norm=1.2401e+00, pixel_change_norm=1.7646e-01, current_scale=all, current_scale_loss=1.5794e-01]
_images/tutorials_intro_06_Metamer_37_1.png

And we can see these shifts happening in the animation of synthesis:

[19]:
po.tools.convert_anim_to_html(po.synth.metamer.animate(met))
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/display.py:950: UserWarning: ax is not None, so we're ignoring figsize...
  warnings.warn("ax is not None, so we're ignoring figsize...")
[19]:

MetamerCTF has several attributes which are used in the course of coarse-to-fine synthesis:

  • scales_loss: this list contains the scale-specific loss at each iteration (that is, the loss computed on just the scale(s) we’re optimizing on that iteration; which we use to determine when to switch scales).

  • scales: this is a list of the scales in optimization order (i.e., from coarse to fine). The last entry will be 'all' (since after we’ve optimized each individual scale, we move on to optimizing all at once). This attribute will be modified by the synthesize() method and is used to track which scale we’re currently optimizing (the first one). When we’ve gone through all the scales present, this will just contain a single value: 'all'.

  • scales_timing: this is a dictionary whose keys are the values of scales. The values are lists, with 0 through 2 entries: the first entry is the iteration where we started optimizing this scale, the second is when we stopped (thus if it's an empty list, we haven't started optimizing it yet).

  • scales_finished: this is a list of the scales that we’ve finished optimizing (in the order we’ve finished). The union of this and scales will be the same as metamer.model.scales.

A small wrinkle: if coarse_to_fine=='together', then none of these will ever contain the final, finest scale, since that is equivalent to 'all'.
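For example, you can inspect this bookkeeping directly after the coarse-to-fine synthesis above (a small sketch using the met object from the previous cells):

print(met.scales)           # remaining scales, ending with 'all'
print(met.scales_finished)  # scales we've finished optimizing, in order
print(met.scales_timing)    # start/stop iteration for each scale
print(met.scales_loss[-5:]) # scale-specific loss for the last few iterations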

MAD Competition Conceptual Introduction

This notebook shows the simplest possible MAD: a two pixel image, where our models are L2-norm and L1-norm. It will not explain the basics of MAD Competition or how to use it. Instead, since we’re dealing with a simple and low-dimensional example, we can plot the image in pixel space and draw out the model contours, which we can use to explicitly check whether we’ve found the correct results.

[1]:
import plenoptic as po
from plenoptic.tools import to_numpy
import torch
import pyrtools as pt
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
import numpy as np
import itertools

%load_ext autoreload
%autoreload 2
/home/billbrod/miniconda3/envs/plenoptic/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

First we pick our metrics and run our synthesis. We create four different MADCompetition instances, in order to create the full set of images.

[2]:
img = torch.tensor([.5, .5], dtype=torch.float32).reshape((1, 1, 1, 2))
def l1_norm(x, y):
    return torch.norm(x-y, 1)
metrics = [po.tools.optim.l2_norm, l1_norm]
all_mad = {}

# this gets us all four possibilities
for t, (m1, m2) in itertools.product(['min', 'max'], zip(metrics, metrics[::-1])):
    name = f'{m1.__name__}_{t}'
    # we set the seed like this to ensure that all four MADCompetition instances have the same initial image. Try different seed values!
    po.tools.set_seed(10)
    all_mad[name] = po.synth.MADCompetition(img, m1, m2, t, metric_tradeoff_lambda=1e4)
    optim = torch.optim.Adam([all_mad[name].mad_image], lr=.0001)
    print(f"Synthesizing {name}")
    all_mad[name].synthesize(store_progress=True, max_iter=2000, optimizer=optim, stop_criterion=1e-10)

# double-check that these are all equal.
assert all([torch.allclose(all_mad['l2_norm_min'].initial_image, v.initial_image) for v in all_mad.values()])
Synthesizing l2_norm_min
 95%|███████████████████████████████████████▉  | 1903/2000 [00:05<00:00, 259.90it/s, loss=1.0817e-01, learning_rate=0.0001, gradient_norm=7.4119e-04, pixel_change_norm=3.6378e-07, reference_metric=1.5296e-01, optimized_metric=1.0816e-01]/home/billbrod/Documents/plenoptic/plenoptic/synthesize/mad_competition.py:445: UserWarning: Loss has converged, stopping synthesis
  warnings.warn("Loss has converged, stopping synthesis")
 95%|███████████████████████████████████████▉  | 1904/2000 [00:05<00:00, 339.19it/s, loss=1.0817e-01, learning_rate=0.0001, gradient_norm=7.4119e-04, pixel_change_norm=3.6378e-07, reference_metric=1.5296e-01, optimized_metric=1.0816e-01]
Synthesizing l1_norm_min
100%|██████████████████████████████████████████| 2000/2000 [00:06<00:00, 312.77it/s, loss=1.2641e-01, learning_rate=0.0001, gradient_norm=1.0004e+00, pixel_change_norm=1.5457e-05, reference_metric=1.2638e-01, optimized_metric=1.2639e-01]
Synthesizing l2_norm_max
 64%|██████████████████████████▎              | 1282/2000 [00:04<00:02, 289.04it/s, loss=-1.5302e-01, learning_rate=0.0001, gradient_norm=9.9836e-01, pixel_change_norm=5.5730e-06, reference_metric=1.5305e-01, optimized_metric=1.5304e-01]
Synthesizing l1_norm_max
 79%|████████████████████████████████▌        | 1587/2000 [00:03<00:00, 413.92it/s, loss=-1.7886e-01, learning_rate=0.0001, gradient_norm=1.1160e-03, pixel_change_norm=3.8166e-07, reference_metric=1.2651e-01, optimized_metric=1.7891e-01]

(The red progress bars show that we hit our stop criterion and broke out of the loop early, not that anything went wrong.)

Now let’s visualize our metrics for these four instances:

[3]:
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
pal = {'l1_norm': 'C0', 'l2_norm': 'C1'}
for ax, (k, mad) in zip(axes.flatten(), all_mad.items()):
    ax.plot(mad.optimized_metric_loss, pal[mad.optimized_metric.__name__], label=mad.optimized_metric.__name__)
    ax.plot(mad.reference_metric_loss, pal[mad.reference_metric.__name__], label=mad.reference_metric.__name__)
    ax.set(title=k.capitalize().replace('_', ' '), xlabel='Iteration', ylabel='Loss')
ax.legend(loc='center left', bbox_to_anchor=(1.1, 1.1))
[3]:
<matplotlib.legend.Legend at 0x7f42cfacbb20>
_images/tutorials_intro_07_Simple_MAD_5_1.png

This looks pretty good – the L1 norm line is flat in the left column, while the L2 norm is flat in the right column, and the other line is either rising (in the bottom row) or falling (in the top).

Since our images only have two pixels, we can get a better sense of what’s going on by plotting them in pixel space: first pixel value on the x-axis, second on the y-axis. We can use this to visualize the points and how far they are from each other. We also know what the level curves look like for the \(L_1\) and \(L_2\) norms (a diamond and a circle centered on our reference image, respectively), so we can add them as well.

[4]:
l1 = to_numpy(torch.norm(all_mad['l2_norm_max'].image - all_mad['l2_norm_max'].initial_image, 1))
l2 = to_numpy(torch.norm(all_mad['l2_norm_max'].image - all_mad['l2_norm_max'].initial_image, 2))
ref = to_numpy(all_mad['l2_norm_max'].image.squeeze())
init = to_numpy(all_mad['l2_norm_max'].initial_image.squeeze())

def circle(origin, r, n=1000):
    theta = 2*np.pi/n*np.arange(0, n+1)
    return np.array([origin[1]+r*np.cos(theta), origin[0]+r*np.sin(theta)])
def diamond(origin, r, n=1000):
    theta = 2*np.pi/n*np.arange(0, n+1)
    rotation = np.pi/4
    square_correction = (np.abs(np.cos(theta-rotation)-np.sin(theta-rotation)) + np.abs(np.cos(theta-rotation)+np.sin(theta-rotation)))
    square_correction /= square_correction[0]
    r = r / square_correction
    return np.array([origin[1]+r*np.cos(theta), origin[0]+r*np.sin(theta)])
l2_level_set = circle(ref, l2,)
l1_level_set = diamond(ref, l1)

We can see in the following plot that it is doing the right thing, but it’s very hard to separate these two metrics. We’ve styled the points below so that their color matches the contour level they’re supposed to lie on (i.e., the fixed metric), and so that a hollow point shows the target was to minimize, and a solid one to maximize.

[5]:
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.scatter(*ref, label='reference', c='r', s=100)
ax.scatter(*init, label='initial', c='k', s=100)
ax.plot(*l1_level_set, pal['l1_norm']+'--', label='L1 norm level set')
ax.plot(*l2_level_set, pal['l2_norm']+'--', label='L2 norm level set')
for k, v in all_mad.items():
    ec = pal[v.reference_metric.__name__]
    fc = 'none' if 'min' in k else ec
    ax.scatter(*v.mad_image.squeeze().detach(), fc=fc, ec=ec, label=k)
plt.legend(bbox_to_anchor=(1.04,1), loc="upper left")
[5]:
<matplotlib.legend.Legend at 0x7f42cebfe640>
_images/tutorials_intro_07_Simple_MAD_9_1.png

In the above plot, the red dot in the middle is our reference signal and the black dot shows the initial signal. The new points we synthesized will either have the same L1 or L2 norm distance with the reference signal as that initial signal (the other distance will be minimized or maximized). Let’s look at the solid blue dot first: this point has had its L2-norm maximized, while holding the L1 norm constant. We can see, therefore, that it lies along the L1-norm level set (the diamond) while moving as far away from the red dot as possible, which puts it in the corner of the diamond. This means that one of the pixels has the same value as the reference, while the other is as different as possible. Note that the other corners of the diamond would work equally well, but our initial point put us closest to this one.

Conversely, the solid orange dot is maximizing L1 norm (and holding L2 norm constant), so it lies along the L2 norm level set while moving as far away from the red dot as possible, which puts it along a diagonal away from the red dot. This means that neither pixel has the same value as the reference, they’re both an intermediate value that has the same absolute difference from the reference point, as we can verify below:

[6]:
all_mad['l1_norm_max'].mad_image - all_mad['l1_norm_max'].image
[6]:
tensor([[[[-0.0894, -0.0895]]]], grad_fn=<SubBackward0>)

Now, if we look at the hollow orange dot, which is minimizing L1 and holding L2 constant, we can see that it has similarly moved along the L1 level set but gotten as close to the reference as possible, which puts it “along the axis” with the solid blue dot, just closer. Therefore, it has one pixel whose value matches that of the reference, and the other that is as close to the reference value as possible. Analogous logic holds for the hollow blue dot.

Generally, you’re working with metrics and signals where you can’t make the above plot to double-check the performance of MAD. Unfortunately, you’ll have to spend time playing with the various parameters in order to find what works best. The most important of these parameters is metric_tradeoff_lambda; you can see above that we set it to the very high value of 1e4 (if you try reducing this yourself, you’ll see the fixed metric doesn’t stay constant and the points in the bottom plot move towards the reference point in the center). In this case, all four values took the same value of metric_tradeoff_lambda, but in general that might not be true (for example, if one of your metrics returns much larger values than the other).
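
Before moving on, here's a quick numerical sanity check of that point (a small sketch, assuming the two-pixel all_mad dictionary from above): for each synthesis, the fixed ("reference") metric evaluated at the MAD image should be very close to its value at the initial image; if it has drifted noticeably, metric_tradeoff_lambda is too small.

# for each synthesis, compare the fixed metric's value at the initial image
# with its value at the synthesized MAD image
for name, mad in all_mad.items():
    start = float(mad.reference_metric(mad.image, mad.initial_image))
    end = float(mad.reference_metric(mad.image, mad.mad_image))
    print(f"{name}: fixed metric went from {start:.4f} to {end:.4f}")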

Full images!

We can, however, extend this L1 and L2 example to full images. Let’s give that a try on a checkerboard:

[7]:
def create_checkerboard(image_size, period, values=[0, 1]):
    image = pt.synthetic_images.square_wave(image_size, period=period)
    image += pt.synthetic_images.square_wave(image_size, period=period, direction=np.pi/2)
    image += np.abs(image.min())
    image /= image.max()
    return torch.from_numpy(np.where((image < .75) & (image > .25), *values[::-1])).unsqueeze(0).unsqueeze(0).to(torch.float32)

# by setting the image to lie between 0 and 255 and to be slightly within the max possible range, we make the optimization a bit easier.
img = 255 * create_checkerboard((64, 64), 16, [.1, .9])
po.imshow(img, vrange=(0, 255), zoom=4);
# you could also do this with another natural image, give it a try!
_images/tutorials_intro_07_Simple_MAD_13_0.png

Now we’ll do the same process of running synthesis and checking our loss as above:

[8]:
def l1_norm(x, y):
    return torch.norm(x-y, 1)
metrics = [po.tools.optim.l2_norm, l1_norm]
tradeoffs = {'l2_norm_max': 1e-4, 'l2_norm_min': 1e-4,
             'l1_norm_max': 1e2, 'l1_norm_min': 1e3}

all_mad = {}

# this gets us all four possibilities
for t, (m1, m2) in itertools.product(['min', 'max'], zip(metrics, metrics[::-1])):
    name = f'{m1.__name__}_{t}'
    # we set the seed like this to ensure that all four MADCompetition instances have the same initial_signal. Try different seed values!
    po.tools.set_seed(0)
    all_mad[name] = po.synth.MADCompetition(img, m1, m2, t, metric_tradeoff_lambda=tradeoffs[name], initial_noise=20, allowed_range=(0, 255), range_penalty_lambda=1)
    optim = torch.optim.Adam([all_mad[name].mad_image], lr=.1)
    print(f"Synthesizing {name}")
    all_mad[name].synthesize(store_progress=True, max_iter=30000, optimizer=optim, stop_criterion=1e-10)

# double-check that these are all equal.
assert all([torch.allclose(all_mad['l2_norm_min'].initial_image, v.initial_image) for v in all_mad.values()])
Synthesizing l2_norm_min
  3%|█▌                                          | 1049/30000 [00:03<01:25, 339.55it/s, loss=9.6436e+02, learning_rate=0.1, gradient_norm=1.6492e-01, pixel_change_norm=3.3186e-01, reference_metric=6.1687e+04, optimized_metric=9.6386e+02]
Synthesizing l1_norm_min
100%|███████████████████████████████████████████| 30000/30000 [03:17<00:00, 152.15it/s, loss=6.5361e+03, learning_rate=0.1, gradient_norm=6.3860e+01, pixel_change_norm=6.7973e-01, reference_metric=1.1773e+03, optimized_metric=6.5345e+03]
Synthesizing l2_norm_max
 15%|▏| 4526/30000 [00:14<01:23, 306.11it/s, loss=-3.7644e+03, learning_rate=0.1, gradient_norm=4.7183e-01, pixel_ch
Synthesizing l1_norm_max
  8%| | 2536/30000 [00:08<01:28, 308.63it/s, loss=-7.5356e+04, learning_rate=0.1, gradient_norm=5.3381e-01, pixel_ch

We're going to visualize these slightly differently than above, since they have such different scales. The left axis shows the L1 norm loss, while the right one shows the L2 norm loss. Each of the four lines is a separate synthesis target, with the colors the same as above (note that l1_norm_min looks like it hasn't quite converged yet – you can decrease the stop_criterion value and increase max_iter above to let it run longer, but the above is sufficient for demonstrative purposes).

[9]:
po.synth.mad_competition.plot_loss_all(*all_mad.values());
_images/tutorials_intro_07_Simple_MAD_17_0.png

Now we’ll show all the synthesized MAD images. In the following, the top row shows the reference and initial images, then the MAD images:

[10]:
po.synth.mad_competition.display_mad_image_all(*all_mad.values(), zoom=4, vrange=(0, 255));
_images/tutorials_intro_07_Simple_MAD_19_0.png

If we go through them following the same logic as on the two-pixel case, we can see that our conclusions still hold. The following plots the difference between each of the above images and the reference image, to make the following points explicit:

  • Max L2 and min L1 mainly have pixels that have the same value as the reference image, and the rest are all extremal values, as different from the reference as possible. Max L2 has more of these pixels, and they have more extremal values.

  • Max L1 and min L2 pixels are all intermediate values, all the same absolute difference from the reference image. Max L1’s absolute difference is larger than min L2’s.

[11]:
keys = ['l2_norm_min', 'l2_norm_max', 'l1_norm_min', 'l1_norm_max']
po.imshow([all_mad[k].mad_image - all_mad[k].image for k in keys], title=keys,
          zoom=4, vrange='indep0', col_wrap=2);
_images/tutorials_intro_07_Simple_MAD_21_0.png

Finally, to connect this to perception: this result implies that L2 is a better perceptual metric than L1. L2's two images are more perceptually distinct than L1's, and the salt-and-pepper noise found in the L2-max image is perceptually worse than the mid-level gray values found in the L2-min image (L1 makes the opposite prediction, that the mid-level gray values are perceptually worse than the salt-and-pepper noise). To validate this, you'd want to run a psychophysics experiment, but hopefully this simple example has helped show how MAD Competition can be used!

MAD Competition Usage

Maximum differentiation (MAD) competition comes from a paper published in 2008 by Zhou Wang and Eero Simoncelli (reprint from LCV website). In MAD Competition, the goal is to efficiently compare two competing perceptual metrics. Like the models used by plenoptic's other synthesis methods, metrics operate on images and produce predictions related to perception. As originally conceived, the metrics in MAD Competition are either similarity (e.g., SSIM) or distance (e.g., MSE) metrics: they take two images and return a scalar value that gives a perceptual similarity or distance. For distance metrics, the smaller this number is, the more perceptually similar the metric predicts the images will be; for similarity metrics, the larger the number, the more perceptually similar.

In plenoptic, a single instantiation of MADCompetition synthesizes a single image, holding the reference_metric constant while either maximizing or minimizing the optimized_metric, depending on the value of minmax. A full set of MAD Competition images consists of four images, maximizing and minimizing each of the two metrics. For each pair of images, one metric predicts they are perceptually identical, while the other metric predicts they are as dissimilar as possible. This set therefore allows us to efficiently compare the two models.

In the paper, these images were generated by manually computing the gradients, projecting one gradient out of the other, and repeating until convergence. This approach doesn't work as well in the general case, so we instead optimize using the following objective function:

\[t L_1(x, \hat{x}) + \lambda_1 [L_2(x, x+\epsilon) - L_2(x, \hat{x})]^2 + \lambda_2 \mathcal{B}(\hat{x})\]

where \(t\) is 1 if the synthesis target (the minmax argument) is 'min' and -1 if it's 'max', \(L_1\) is the optimized_metric, \(L_2\) is the reference_metric (the one held fixed), \(x\) is the reference image (mad.image), \(\hat{x}\) is the synthesized image (mad.mad_image), \(\epsilon\) is the initial noise, \(\mathcal{B}\) is the quadratic bound penalty, \(\lambda_1\) is mad.metric_tradeoff_lambda and \(\lambda_2\) is mad.range_penalty_lambda.
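
To make this objective concrete, here is a minimal sketch of how such a loss could be written by hand. This is purely illustrative (it is not plenoptic's internal implementation), and the quadratic range penalty \(\mathcal{B}\) is approximated here by penalizing pixel values outside allowed_range.

import torch

def mad_objective(mad_img, ref_img, init_img, optimized_metric, fixed_metric,
                  minmax='min', metric_tradeoff_lambda=1.0, range_penalty_lambda=0.1,
                  allowed_range=(0, 1)):
    t = 1 if minmax == 'min' else -1
    # quadratic penalty on pixel values outside allowed_range (a stand-in for B)
    below = (allowed_range[0] - mad_img).clamp(min=0)
    above = (mad_img - allowed_range[1]).clamp(min=0)
    range_penalty = below.pow(2).sum() + above.pow(2).sum()
    # L2(x, x + eps): the value the fixed metric should stay pinned to
    fixed_target = fixed_metric(ref_img, init_img)
    return (t * optimized_metric(ref_img, mad_img)
            + metric_tradeoff_lambda * (fixed_target - fixed_metric(ref_img, mad_img)) ** 2
            + range_penalty_lambda * range_penalty)

Minimizing this with respect to mad_img pushes the optimized metric down (or up, for 'max') while the squared term keeps the fixed metric pinned to its value at the initial image.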

That's the general idea; now let's explore how to use the MADCompetition class to generate these images.

[1]:
import plenoptic as po
import imageio
import torch
import pyrtools as pt
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
import numpy as np
import warnings

%load_ext autoreload
%autoreload 2

Basic usage

As with all our synthesis methods, we start by grabbing a target image and initializing our models.

[2]:
img = po.data.curie()
po.imshow(img)
/home/billbrod/Documents/plenoptic/plenoptic/tools/data.py:126: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
  images = torch.tensor(images, dtype=torch.float32)
_images/tutorials_intro_08_MAD_Competition_3_1.png

To start, we'll demonstrate MAD competition as described in the paper, using two metrics: SSIM (structural similarity index, described here) and MSE (mean-squared error), implementations of both of which are found in plenoptic.metric. We use the weighted version of SSIM described in the MAD Competition paper, hence the keyword argument passed to ssim below. Note also that we use 1-SSIM: SSIM measures similarity (so that 0 means completely different and 1 means identical), but MADCompetition expects metrics, which return 0 if and only if the two inputs are identical.

[3]:
model1 = lambda *args: 1-po.metric.ssim(*args, weighted=True, pad='reflect')
model2 = po.metric.mse

To initialize the method, we only need to specify the target image, the two metrics, and the target. To start, we will hold MSE constant, while minimizing SSIM.

Note that, as described in the first block, we synthesize these images by optimizing a tradeoff between the loss of these two metrics, weighted by the metric_tradeoff_lambda. If that argument is unset, we default to something we think is reasonable, but in practice, we often need to experiment and find the appropriate value, trying out different values until the fixed metric stays constant while the synthesis metric decreases or increases as desired.
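
As a rough illustration of that kind of search, the sketch below tries a few candidate values (arbitrary ones, chosen for illustration) with a short synthesis run and reports how much the fixed metric drifted for each; in practice you would also inspect the loss plots for each candidate.

candidate_lambdas = [1e2, 1e3, 1e4, 1e5]
drifts = {}
for lam in candidate_lambdas:
    mad_try = po.synth.MADCompetition(img, optimized_metric=model1, reference_metric=model2,
                                      minmax='min', initial_noise=.04,
                                      metric_tradeoff_lambda=lam)
    mad_try.synthesize(max_iter=50)
    # how much did the metric we wanted to hold fixed actually change?
    drifts[lam] = abs(float(model2(img, mad_try.mad_image))
                      - float(model2(img, mad_try.initial_image)))
print(drifts)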

[4]:
mad = po.synth.MADCompetition(img, optimized_metric=model1, reference_metric=model2, minmax='min', initial_noise=.04,
                              metric_tradeoff_lambda=10000)
/home/billbrod/miniconda3/envs/plenoptic/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

At the most basic, all we need to do is call mad.synthesize(). Let’s do that and then view the outcome. There are several additional arguments to synthesize() but none are required.

[5]:
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    mad.synthesize(max_iter=200)
fig = po.synth.mad_competition.plot_synthesis_status(mad)
 95%|████████████████████████████████████████████▋  | 190/200 [00:19<00:01,  9.61it/s, loss=1.5136e-03, learning_rate=0.01, gradient_norm=7.4030e-05, pixel_change_norm=2.9363e-02, reference_metric=1.4757e-03, optimized_metric=1.4892e-03]
_images/tutorials_intro_08_MAD_Competition_9_1.png

We can see from the loss plot that SSIM’s loss has decreased, while MSE’s, other than a brief dip in the beginning, is staying roughly constant.

As described in the opening paragraph, a full set of MAD competition synthesized images consists of four images. In order to create the other images, we must create a new instance of MADCompetition. Let’s do that for the other images now:

[6]:
mad_ssim_max = po.synth.MADCompetition(img, optimized_metric=model1, reference_metric=model2, minmax='max', initial_noise=.04,
                                      metric_tradeoff_lambda=1e6)
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    mad_ssim_max.synthesize(max_iter=200)
fig = po.synth.mad_competition.plot_synthesis_status(mad_ssim_max)
100%|██████████████████████████████████████████████| 200/200 [00:20<00:00,  9.83it/s, loss=-3.4307e-01, learning_rate=0.01, gradient_norm=1.9942e-03, pixel_change_norm=6.0971e-02, reference_metric=1.6174e-03, optimized_metric=3.5030e-01]
_images/tutorials_intro_08_MAD_Competition_11_1.png

We’re making progress, but it doesn’t look like SSIM has quite saturated. Let’s see if we can make more progress!

To continue synthesis, we can simply call mad.synthesize() again (optimizer and scheduler will both need to be None, the default, so we reuse the ones from the initial call), and we then pick up right where we left off.

[7]:
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    mad_ssim_max.synthesize(max_iter=300)
fig = po.synth.mad_competition.plot_synthesis_status(mad_ssim_max)
100%|██████████████████████████████████████████████| 300/300 [00:37<00:00,  8.07it/s, loss=-3.4890e-01, learning_rate=0.01, gradient_norm=3.4509e-04, pixel_change_norm=1.5451e-02, reference_metric=1.6184e-03, optimized_metric=3.5621e-01]
_images/tutorials_intro_08_MAD_Competition_13_1.png

Next, let’s hold SSIM constant while changing MSE. This will require changing the metric_tradeoff_lambda. We also set stop_criterion explicitly, to a smaller value, to allow the synthesis to continue longer.

[8]:
mad_mse_min = po.synth.MADCompetition(img, optimized_metric=model2, reference_metric=model1, minmax='min', initial_noise=.04,
                                      metric_tradeoff_lambda=1)
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    mad_mse_min.synthesize(max_iter=400, stop_criterion=1e-6)
fig = po.synth.mad_competition.plot_synthesis_status(mad_mse_min)
100%|███████████████████████████████████████████████| 400/400 [01:03<00:00,  6.31it/s, loss=9.9894e-04, learning_rate=0.01, gradient_norm=7.8280e-05, pixel_change_norm=4.3131e-02, reference_metric=2.3662e-01, optimized_metric=9.9130e-04]
_images/tutorials_intro_08_MAD_Competition_15_1.png

Maximizing MSE has the same issue; after playing around with it, we use a slightly larger metric_tradeoff_lambda than above.

In general, finding an appropriate hyperparameter here will require some consideration on the part of the user and some testing of different values.

[9]:
mad_mse_max = po.synth.MADCompetition(img, optimized_metric=model2, reference_metric=model1, minmax='max', initial_noise=.04,
                                      metric_tradeoff_lambda=10)
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    mad_mse_max.synthesize(max_iter=200, stop_criterion=1e-6)
fig = po.synth.mad_competition.plot_synthesis_status(mad_mse_max)
100%|██████████████████████████████████████████████| 200/200 [00:41<00:00,  4.86it/s, loss=-5.7400e-03, learning_rate=0.01, gradient_norm=4.9788e-03, pixel_change_norm=2.5293e-01, reference_metric=2.4108e-01, optimized_metric=5.8745e-03]
_images/tutorials_intro_08_MAD_Competition_17_1.png

The image above has increased the local contrast in different parts of the image, which SSIM generally doesn’t care about but MSE does. For example, the collar, which in the original image is two different shades of gray, here is black and white. Similarly with the eyes, hair, and lips.

While above we displayed the synthesized image and the loss together, these are actually handled by two helper functions that can be called separately and operate on individual axes. They have additional arguments that may be worth playing around with:

[18]:
fig, axes = plt.subplots(1, 2, figsize=(15, 5), gridspec_kw={'width_ratios': [1, 2]})
po.synth.mad_competition.display_mad_image(mad, ax=axes[0], zoom=.5)
po.synth.mad_competition.plot_loss(mad, axes=axes[1], iteration=-100)
[18]:
<AxesSubplot: xlabel='Synthesis iteration', ylabel='Optimized metric loss'>
_images/tutorials_intro_08_MAD_Competition_19_1.png

We also provide helper functions to plot a full set of MAD images together, either displaying all their synthesized images or their losses (note that we’re calling our metric SDSIM because it’s now the structural dis-similarity):

[11]:
po.synth.mad_competition.display_mad_image_all(mad, mad_mse_min, mad_ssim_max, mad_mse_max, 'SDSIM');
_images/tutorials_intro_08_MAD_Competition_21_0.png

The top row shows the reference and initial images, our picture of Marie Curie and that same image plus some normally-distributed noise. The next row of images has the same MSE as the right image in the top row (when compared against the reference image), but different SDSIM values. The left image has the lowest SDSIM and is thus considered the best image, while the right image has the highest SDSIM and is thus considered the worst. The next row of images has the same SDSIM as the right image in the top, but different MSE values. The left has the lowest MSE and is thus considered the best, while the right has highest MSE and is thus considered the worst.

So MSE considers the first three images to be approximately equivalent in quality, while SDSIM considers the first image and the last two to be equivalent.

From the following plot, we can see that we generally manage to hold the fixed metric constant (dashed line for SDSIM in the right plot, solid line for MSE in the left) while increasing the target metric.

[12]:
po.synth.mad_competition.plot_loss_all(mad, mad_mse_min, mad_ssim_max, mad_mse_max, 'SDSIM');
_images/tutorials_intro_08_MAD_Competition_23_0.png

Steerable Pyramid

This tutorial walks through the basic features of the torch implementation of the Steerable Pyramid included in plenoptic, and as such describes some basic signal processing that may be useful when building models that process images. We use the steerable pyramid construction in the frequency domain, which provides perfect reconstruction (as long as the input has an even height and width, e.g., 256x256 rather than 255x255) and allows for any number of orientation bands. For more details on steerable pyramids and how they are built, see the pyrtools tutorial at https://pyrtools.readthedocs.io/en/latest/.

Here we focus on the specifics of the torch version and how it can be used in concert with other differentiable torch models.

[1]:
import numpy as np
import torch
# this notebook uses torchvision, which is an optional dependency.
# if this fails, install torchvision in your plenoptic environment
# and restart the notebook kernel.
try:
    import torchvision
except ModuleNotFoundError:
    raise ModuleNotFoundError("optional dependency torchvision not found!"
                              " please install it in your plenoptic environment "
                              "and restart the notebook kernel")
import torchvision.transforms as transforms
import torch.nn.functional as F
from torch import nn
import matplotlib.pyplot as plt

import pyrtools as pt
import plenoptic as po
%matplotlib inline
from plenoptic.simulate import SteerablePyramidFreq
from plenoptic.synthesize import Eigendistortion
from plenoptic.tools.data import to_numpy
dtype = torch.float32
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
import os
from tqdm.auto import tqdm
%load_ext autoreload

%autoreload 2

Introduction: Steerable Pyramid Wavelets

In this section we will:

  1. Visualize the wavelets that produce steerable pyramid coefficients

  2. Visualize the steerable pyramid decomposition for two example images

  3. Provide some technical details about this steerable pyramid implementation and how it may differ from others

Visualizing Wavelets

Many modern computer vision algorithms employ convolution: a kernel slides across an image, taking inner products along the way. The output of a convolution at a given location is thus the similarity between the kernel and the image content at that location; if your kernel contains only low spatial frequencies, for example, it will function as a low-pass filter. Low-pass filtering has a very simple interpretation in the Fourier domain (attenuate the high spatial frequencies and leave the low spatial frequencies unchanged). This particular scenario reflects a more general fact: convolution in the spatial domain is mathematically equivalent to multiplication in the Fourier domain (a result known as the convolution theorem). Though the two are equivalent, there may be advantages to carrying out the operation in one domain or the other. This implementation of the steerable pyramid operates in the Fourier domain. For those that are interested, the first benefit of this implementation is that we have access to a perfectly invertible representation (i.e., the representation can be inverted to reconstruct the input exactly, so one can analyze how perturbations to the representation appear in the input space). Second, working in the Fourier domain allows us to work with a complex-valued representation, which provides natural benefits such as the ability to easily construct quadrature-pair filters.
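
The convolution theorem is easy to verify numerically; here's a small self-contained sketch (using numpy directly, independent of plenoptic) comparing a circular convolution computed from its definition with one computed by pointwise multiplication in the Fourier domain.

import numpy as np

N = 128
rng = np.random.default_rng(0)
x = rng.standard_normal(N)
h = rng.standard_normal(N)
# circular convolution computed directly from its definition: y[n] = sum_m x[m] h[(n-m) mod N]
direct = np.array([np.sum(x * h[(n - np.arange(N)) % N]) for n in range(N)])
# ... and via pointwise multiplication in the Fourier domain
via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))
print(np.allclose(direct, via_fft))  # True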

Because of this, we don't have direct access to a set of spatial filters for visualization. However, we can generate the equivalent spatial filters by inverting a set of coefficients (i.e., pyramid outputs) constructed to be zero everywhere except at the center of a single band. Below we do this to visualize the filters of a 3-scale, 4-orientation steerable pyramid.

[2]:
order = 3
imsize = 64
pyr = SteerablePyramidFreq(height=3, image_shape=[imsize, imsize], order=order).to(device)
empty_image = torch.zeros((1, 1, imsize, imsize), dtype=dtype).to(device)
pyr_coeffs = pyr.forward(empty_image)

# insert a 1 in the center of each coefficient...
for k,v in pyr.pyr_size.items():
    mid = (v[0]//2, v[1]//2)
    pyr_coeffs[k][0, 0, mid[0], mid[1]] = 1

# ... and then reconstruct this dummy image to visualize the filter.
reconList = []
for k in pyr_coeffs.keys():
    # we ignore the residual_highpass and residual_lowpass, since we're focusing on the filters here
    if isinstance(k, tuple):
        reconList.append(pyr.recon_pyr(pyr_coeffs, [k[0]], [k[1]]))

po.imshow(reconList, col_wrap=order+1, vrange='indep1', zoom=2);
_images/tutorials_models_03_Steerable_Pyramid_4_0.png

We can see that this pyramid is representing a 3 scale 4 orientation decomposition: each row represents a single scale, which we can see because each filter is the same size. As the filter increases in size, we describe the scale as getting “coarser”, and its spatial frequency selectivity moves to lower and lower frequencies (conversely, smaller filters are operating at finer scales, with selectivity to higher spatial frequencies). In a given column, all filters have the same orientation: the first column is vertical, the third horizontal, and the other two diagonals.

Visualizing Filter Responses (Wavelet Coefficients)

Now let’s see what the steerable pyramid representation for images look like.

Like all models included in and compatible with plenoptic, the included steerable pyramid operates on 4-dimensional tensors of shape (batch, channel, height, width). We are able to perform batch computations with the steerable pyramid implementation, analyzing each element of the batch separately. Similarly, the pyramid is meant to operate on grayscale images, so channel > 1 will cause the pyramid to run independently on each channel (meaning the first two dimensions are both effectively treated as batch dimensions).

[3]:
im_batch = torch.cat([po.data.curie(), po.data.reptile_skin()], axis=0)
print(im_batch.shape)
po.imshow(im_batch)
order = 3
dim_im = 256
pyr = SteerablePyramidFreq(height=4, image_shape=[dim_im, dim_im], order=order).to(device)
pyr_coeffs = pyr(im_batch)
torch.Size([2, 1, 256, 256])
_images/tutorials_models_03_Steerable_Pyramid_7_1.png

By default, the output of the pyramid is stored as a dictionary whose keys are either a string for the 'residual_lowpass' and 'residual_highpass' bands or a tuple of (scale_index, orientation_index). plenoptic provides a convenience function, pyrshow, to visualize the pyramid’s coefficients for each image and channel.

[4]:
print(pyr_coeffs.keys())
po.pyrshow(pyr_coeffs, zoom=0.5, batch_idx=0);
po.pyrshow(pyr_coeffs, zoom=0.5, batch_idx=1);
odict_keys(['residual_highpass', (0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3), (3, 0), (3, 1), (3, 2), (3, 3), 'residual_lowpass'])
_images/tutorials_models_03_Steerable_Pyramid_9_1.png
_images/tutorials_models_03_Steerable_Pyramid_9_2.png

For some applications, such as coarse-to-fine optimization procedures, it may be convenient to output a subset of the representation, including coefficients from only some scales. We do this by passing a scales argument to the forward method (a list containing a subset of the values found in pyr.scales):

[5]:
#get the 3rd scale
print(pyr.scales)
pyr_coeffs_scale0 = pyr(im_batch, scales=[2])
po.pyrshow(pyr_coeffs_scale0, zoom=2, batch_idx=0);
po.pyrshow(pyr_coeffs_scale0, zoom=2, batch_idx=1);
['residual_lowpass', 3, 2, 1, 0, 'residual_highpass']
_images/tutorials_models_03_Steerable_Pyramid_11_1.png
_images/tutorials_models_03_Steerable_Pyramid_11_2.png

The above pyramid was the real pyramid, but in many applications we might want the full complex pyramid output. This can be set using the is_complex argument. When this is True, the pyramid uses complex-valued filters, resulting in a complex-valued output. The real and imaginary components can be understood as the outputs of filters with identical scale and orientation, but different phases: the imaginary component is phase-shifted 90 degrees from the real (we refer to this matched pair of filters as a “quadrature pair”). This can be useful if you wish to construct a representation that is phase-insensitive (as complex cells in the primary visual cortex are believed to be), which can be done by computing the amplitude / complex modulus (e.g., by calling torch.abs(x)). po.simul.rectangular_to_polar and po.simul.rectangular_to_polar_dict provide convenience wrappers for this functionality.

See A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients for more technical details.

[6]:
order = 3
height = 3

pyr_complex = SteerablePyramidFreq(height=height, image_shape=[256,256], order=order, is_complex=True)
pyr_complex.to(device)
pyr_coeffs_complex = pyr_complex(im_batch)
[7]:
# the same visualization machinery works for complex pyramids; what is shown is the magnitude of the coefficients
po.pyrshow(pyr_coeffs_complex, zoom=0.5, batch_idx=0);
po.pyrshow(pyr_coeffs_complex, zoom=0.5, batch_idx=1);
_images/tutorials_models_03_Steerable_Pyramid_14_0.png
_images/tutorials_models_03_Steerable_Pyramid_14_1.png
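
To make the phase-insensitivity point from above concrete, here is a one-line sketch computing an "energy" (complex modulus) response for each band of the complex pyramid defined above; this is the kind of quantity a phase-insensitive (complex-cell-like) model would use.

# the complex modulus of each band; the residual bands are real-valued, so abs
# leaves their magnitudes unchanged
energy = {k: torch.abs(v) for k, v in pyr_coeffs_complex.items()}
print(energy[(0, 0)].shape, energy[(0, 0)].dtype)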

Now that we have seen the basics of using the pyramid, it's worth noting the following: an important property of the steerable pyramid is that it should respect the generalized Parseval theorem (i.e., the energy of the pyramid coefficients should equal the energy of the original image). The matlabPyrTools and pyrtools versions of the steerable pyramid DO NOT respect this, so in our version we have provided a fix that normalizes the FFTs such that energy is preserved. This is controlled by setting tight_frame=True when instantiating the pyramid; however, if you need to match the outputs of matlabPyrTools or pyrtools, note that you will need to set this argument to False.

Putting the “Steer” in Steerable Pyramid

As we have seen, steerable pyramids decompose images into a fixed set of orientation bands (at several spatial scales). However, given the responses at this fixed set of orientation bands, the pyramid coefficients for any arbitrary intermediate orientation can be calculated as a linear interpolation of the original bands. This property is known as “steerability.” Below we steer a set of coefficients through a series of angles and visualize how the represented features rotate.

[8]:
# note that steering is currently only implemented for real pyramids, so the `is_complex` argument must be False (as it is by default)
pyr = SteerablePyramidFreq(height=3, image_shape=[256,256], order=3, twidth=1).to(device)
coeffs = pyr(im_batch)

# play around with different scales! Coarser scales tend to make the steering a bit more obvious.
target_scale = 2
N_steer = 64
M = torch.zeros(1, 1, N_steer, 256//2**target_scale, 256//2**target_scale)
for i, steering_offset in enumerate(np.linspace(0, 1, N_steer)):
    steer_angle = steering_offset * 2 * np.pi
    steered_coeffs, steering_weights = pyr.steer_coeffs(coeffs, [steer_angle])  # steer_coeffs also returns the steering weights used to interpolate the original bands
    M[0, 0, i] = steered_coeffs[(target_scale, 4)][0, 0] # we are always looking at the same band, but the steering angle changes

po.tools.convert_anim_to_html(po.animshow(M, framerate=6, repeat=True, zoom=2**target_scale))
[8]:

Example Application: Frontend for Convolutional Neural Network

Until now we have just seen how to use the steerable pyramid as a stand-alone fixed feature extractor, but what if we wanted to use it in a larger model, as a front-end for a deep CNN or other model? The steerable pyramid decomposition is qualitatively similar to the computations in primary visual cortex, so it stands to reason that a steerable pyramid frontend might serve as an inductive bias that encourages subsequent layers to have more biological structure. Indeed, it has been demonstrated that attaching a V1-like front end to a CNN trained on classification can improve adversarial robustness (Dapello et al., 2020).

In this section we will demonstrate how the plenoptic steerable pyramid can be made compatible with standard deep learning architectures and use it as a frontend for a standard CNN.

Preliminaries

Most standard model architectures only accept channels with fixed shape, but each scale of the pyramid coefficients has a different shape (because each scale is downsampled by a factor of 2). In order to obtain an output amenable to downstream processing by standard torch nn modules, we have created an argument to the pyramid (downsample=False) that does not downsample the frequency masks at each scale and thus maintains output feature maps that all have a fixed size. Once you have done this, you can then convert the dictionary into a tensor of size (batch, channel, height, width) so that it can easily be passed to a downstream nn.Module. The details of how to do this are provided in the convert_pyr_to_tensor function within the SteerablePyramidFreq class. Let's try this and look at the first image in both the downsampled and not-downsampled versions:

[9]:
height = 3
order = 3
pyr_fixed  = SteerablePyramidFreq(height=height, image_shape=[256,256], order=order, is_complex=True,
                                    downsample=False, tight_frame=True).to(device)
pyr_coeffs_fixed, pyr_info = pyr_fixed.convert_pyr_to_tensor(pyr_fixed(im_batch), split_complex=False)
 # we can also split the complex coefficients into real and imaginary parts as separate channels.
pyr_coeffs_split, _ = pyr_fixed.convert_pyr_to_tensor(pyr_fixed(im_batch), split_complex=True)
print(pyr_coeffs_split.shape, pyr_coeffs_split.dtype)
print(pyr_coeffs_fixed.shape, pyr_coeffs_fixed.dtype)
torch.Size([2, 26, 256, 256]) torch.float32
torch.Size([2, 14, 256, 256]) torch.complex64

We can see that in this complex pyramid with 3 scales and 4 orientations there will be 26 channels: 3 scales x 4 orientations x 2 (for the real and imaginary feature maps) + 2 (for the residual bands). NOTE: you can change which scales/residuals get included in this output tensor, again using the scales argument to the forward method.
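
As a quick arithmetic check of those channel counts (a small sketch using the height and order variables from the cell above):

n_orientations = order + 1                          # order=3 -> 4 orientation bands
n_complex_channels = height * n_orientations + 2    # 3*4 + 2 = 14 (complex64 tensor)
n_split_channels = 2 * height * n_orientations + 2  # 2*3*4 + 2 = 26 (real/imag split)
print(n_complex_channels, n_split_channels)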

In order to display the coefficients, we need to convert the tensor coefficients back to a dictionary. We can do this either by directly accessing the dictionary version (through the pyr_coeffs attribute in the pyramid object) or by using the internal convert_tensor_to_pyr function. We can check that these are equal.

[10]:
pyr_coeffs_fixed_1 = pyr_fixed(im_batch)
pyr_coeffs_fixed_2 = pyr_fixed.convert_tensor_to_pyr(pyr_coeffs_fixed, *pyr_info)
for k in pyr_coeffs_fixed_1.keys():
    print(torch.allclose(pyr_coeffs_fixed_2[k], pyr_coeffs_fixed_1[k]))
True
True
True
True
True
True
True
True
True
True
True
True
True
True
/home/billbrod/Documents/plenoptic/src/plenoptic/simulate/canonical_computations/steerable_pyramid_freq.py:476: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at ../aten/src/ATen/native/Copy.cpp:299.)
  band = pyr_tensor[:,i,...].unsqueeze(1).type(torch.float)

We can now plot the coefficients of the downsampled version (pyr_coeffs_complex, from the last section) and the not-downsampled version (pyr_coeffs_fixed_1, from above) and see how they compare visually.

[11]:
po.pyrshow(pyr_coeffs_complex, zoom=0.5);
po.pyrshow(pyr_coeffs_fixed_1, zoom=0.5);
_images/tutorials_models_03_Steerable_Pyramid_24_0.png
_images/tutorials_models_03_Steerable_Pyramid_24_1.png

We can see that the not-downsampled version maintains the same features as the original pyramid, but with fixed feature maps whose spatial dimensions equal those of the original image (256x256). However, the pixel magnitudes in the bands differ, because we are no longer downsampling in the frequency domain. This can equivalently be thought of as the inverse of the blur-and-downsample operation: the upsampled versions of each scale are not simply zero-interpolated versions of the downsampled ones, so the pixel values change non-trivially. The energy in each band, however, should be preserved between the two pyramids, which we can check by computing the energy in each band of both pyramids and comparing.

[12]:
# the following passes with tight_frame=True or tight_frame=False, either way.
pyr_not_downsample = SteerablePyramidFreq(height=height, image_shape=[256, 256], order=order,
                                          is_complex=False, twidth=1, downsample=False,
                                          tight_frame=False)
pyr_not_downsample.to(device)

pyr_downsample = SteerablePyramidFreq(height=height, image_shape=[256, 256], order=order,
                                      is_complex=False, twidth=1, downsample=True,
                                      tight_frame=False)
pyr_downsample.to(device)
pyr_coeffs_downsample = pyr_downsample(im_batch.to(device))
pyr_coeffs_not_downsample = pyr_not_downsample(im_batch.to(device))
for k in pyr_coeffs_downsample.keys():
    v1 = to_numpy(pyr_coeffs_downsample[k]).squeeze()
    v2 = to_numpy(pyr_coeffs_not_downsample[k]).squeeze()
    # check that the energy in each band matches between the downsampled and
    # fixed-size pyramid responses
    print(np.allclose(np.sum(np.abs(v1)**2), np.sum(np.abs(v2)**2), rtol=1e-4, atol=1e-4))

def check_parseval(im, coeff, rtol=1e-4, atol=0):
    '''Check that the pyramid is Parseval, i.e., that the total energy of the
    coefficients equals the energy of the original image.

    Args:
        im: image stimulus, as a torch.Tensor
        coeff: dictionary of torch.Tensors, one per pyramid band
    '''
    total_band_energy = 0
    im_energy = im.abs().square().sum().numpy()
    for k, band in coeff.items():
        print(band.abs().square().sum().numpy())
        total_band_energy += band.abs().square().sum().numpy()

    np.testing.assert_allclose(total_band_energy, im_energy, rtol=rtol, atol=atol)
True
True
True
True
True
True
True
True
True
True
True
True
True
True

Model Training

We are now ready to demonstrate how the steerable pyramid can be used as a fixed frontend for further stages of (learnable) processing!

[13]:
# First we define/download the dataset
train_set = torchvision.datasets.FashionMNIST(
    # change this line to wherever you'd like to download the FashionMNIST dataset
    root = '../data',
    train = True,
    download = True,
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
)
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/train-images-idx3-ubyte.gz
100%|█████████████████████████████████████████████████████████████████| 26421880/26421880 [00:25<00:00, 1021172.29it/s]
Extracting /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/train-images-idx3-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/train-labels-idx1-ubyte.gz
100%|████████████████████████████████████████████████████████████████████████| 29515/29515 [00:00<00:00, 316479.83it/s]
Extracting /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/train-labels-idx1-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
100%|███████████████████████████████████████████████████████████████████| 4422102/4422102 [00:01<00:00, 3325433.77it/s]
Extracting /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
100%|█████████████████████████████████████████████████████████████████████████| 5148/5148 [00:00<00:00, 6730759.66it/s]
Extracting /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw


[14]:
# Define a simple model: SteerPyr --> ConvLayer --> Fully Connected
class PyrConvFull(nn.Module):
    def __init__(self, imshape, order, scales, exclude=[], is_complex=True):
        super().__init__()

        self.imshape = imshape
        self.order = order
        self.scales = scales
        self.output_dim = 20 # number of channels in the convolutional block
        self.kernel_size = 6
        self.is_complex = is_complex

        self.rect = nn.ReLU()
        self.pyr = SteerablePyramidFreq(height=self.scales,image_shape=self.imshape,
                                          order=self.order,is_complex = self.is_complex,twidth=1, downsample=False)

        # num_channels = num_scales * num_orientations (+ 2  residual bands) (* 2 if complex)
        channels_per = 2 if self.is_complex else 1
        self.pyr_channels = ((self.order + 1) * self.scales + 2) * channels_per

        self.conv = nn.Conv2d(in_channels=self.pyr_channels, kernel_size=self.kernel_size,
                              out_channels=self.output_dim, stride=2)
        # the input ndim here has to do with the dimensionality of self.conv's output, so will have to change
        # if kernel_size or output_dim do
        self.fc = nn.Linear(self.output_dim * 12**2, 10)

    def forward(self, x):
        out = self.pyr(x)
        out, _ = self.pyr.convert_pyr_to_tensor(out)

        # case handling for real v. complex forward passes
        if self.is_complex:
            # split to real and imaginary so nonlinearities make sense
            out_re = self.rect(out.real)
            out_im = self.rect(out.imag)

            # concatenate
            out = torch.cat([out_re, out_im], dim=1)
        else:
            out = self.rect(out)


        out = self.conv(out)
        out = self.rect(out)
        out = out.view(out.shape[0], -1) # reshape for linear layer
        out = self.fc(out)

        return out
[15]:
# Training Pyramid Model
model_pyr = PyrConvFull([28, 28], order=4, scales=2, is_complex=False)
loader = torch.utils.data.DataLoader(train_set, batch_size = 50)
optimizer = torch.optim.Adam(model_pyr.parameters(), lr=1e-3)


epoch = 2
losses = []
fracts_correct = []
for e in range(epoch):
    for batch in tqdm(loader):
        images = batch[0]
        labels = batch[1]
        preds = model_pyr(images)
        loss = F.cross_entropy(preds, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        losses.append(loss.item())

        n_correct = preds.argmax(dim=1).eq(labels).sum().item()
        fracts_correct.append(n_correct / 50)

fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].plot(losses)
axs[0].set_xlabel('Iteration')
axs[0].set_ylabel('Cross Entropy Loss')
axs[1].plot(fracts_correct)
axs[1].set_xlabel('Iteration')
axs[1].set_ylabel('Classification Performance')
[15]:
Text(0, 0.5, 'Classification Performance')
_images/tutorials_models_03_Steerable_Pyramid_30_3.png

The steerable pyramid integrates smoothly with standard torch modules and autograd, so the impact of including such a frontend could be probed using the synthesis techniques provided by plenoptic.
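
As a pointer in that direction, here is a hedged sketch of how one might probe the trained model with Eigendistortion (imported at the top of this notebook). This is illustrative only: it assumes the Eigendistortion class accepts an (image, model) pair as in the other synthesis tutorials, and that po.tools.remove_grad is available to freeze the model's parameters.

# grab a single FashionMNIST image, shape (1, 1, 28, 28)
img0 = train_set[0][0].unsqueeze(0)
# synthesis methods expect a model whose parameters don't require gradients
po.tools.remove_grad(model_pyr)
eig = Eigendistortion(img0, model_pyr)
eig.synthesize()
po.imshow(eig.eigendistortions, zoom=4);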

Perceptual distance

Run notebook online with Binder: Binder

The easiest way to measure the difference between two images is to compute the mean squared error (MSE), but MSE does not match the perceptual distance judged by humans. Several perceptual distance functions have been developed to better match human perception. This tutorial introduces three perceptual distance functions available in the plenoptic package: SSIM (structural similarity), MS-SSIM (multiscale structural similarity) and NLPD (normalized Laplacian pyramid distance).

References

SSIM: Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4), 600-612.

MS-SSIM: Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003, November). Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003 (Vol. 2, pp. 1398-1402). IEEE.

NLPD: Laparra, V., Ballé, J., Berardino, A., & Simoncelli, E. P. (2016). Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16), 1-6.

[1]:
import os
import io
import imageio
import plenoptic as po
import numpy as np
from scipy.stats import pearsonr, spearmanr
import matplotlib.pyplot as plt
import torch
from PIL import Image

SSIM (structural similarity)

The idea of SSIM index is to decompose the difference between two images into three components: luminance, contrast and structure. For two small image patches \(\mathbf{x}\) and \(\mathbf{y}\), these three components of difference are defined as:

\[ l(\mathbf{x}, \mathbf{y}) = \frac{2 \mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad c(\mathbf{x}, \mathbf{y}) = \frac{2 \sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad s(\mathbf{x}, \mathbf{y}) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}\]

where \(\mu_x\) and \(\mu_y\) are the mean of \(\mathbf{x}\) and \(\mathbf{y}\), \(\sigma_x\) and \(\sigma_y\) are the standard deviation of \(\mathbf{x}\) and \(\mathbf{y}\), and \(\sigma_{xy}\) is the covariance between \(\mathbf{x}\) and \(\mathbf{y}\). And \(C_1, C_2, C_3\) are small constants. If we ignore the small constants, we can see that the luminance term \(l(\mathbf{x}, \mathbf{y})\) is a scale-invariant similarity measurement between \(\mu_x\) and \(\mu_y\), and the contrast term \(c(\mathbf{x}, \mathbf{y})\) is such a measurement between \(\sigma_x\) and \(\sigma_y\). The structural term \(s(\mathbf{x}, \mathbf{y})\) is the correlation coefficient between \(\mathbf{x}\) and \(\mathbf{y}\), which is invariant to addition and multiplication of constants on \(\mathbf{x}\) or \(\mathbf{y}\).

Local SSIM between two small image patches \(\mathbf{x}\) and \(\mathbf{y}\) is defined as (let \(C_3 = C_2 / 2\)):

\[d(\mathbf{x}, \mathbf{y}) = l(\mathbf{x}, \mathbf{y}) c(\mathbf{x}, \mathbf{y}) s(\mathbf{x}, \mathbf{y}) = \frac{\left( 2 \mu_x \mu_y + C_1 \right) \left( 2\sigma_{xy} + C_2 \right)} {\left( \mu_x^2 + \mu_y^2 + C_1 \right) \left( \sigma_x^2 + \sigma_y^2 + C_2 \right)}\]

The local SSIM value \(d(\mathbf{x}, \mathbf{y}) = 1\) means the two patches are identical and \(d(\mathbf{x}, \mathbf{y}) = 0\) means they’re very different. When the two patches are negatively correlated, \(d(\mathbf{x}, \mathbf{y})\) can be negative. The local SSIM value is bounded between -1 and 1.

For two full images \(\mathbf{X}, \mathbf{Y}\), an SSIM map is obtained by computing the local SSIM value \(d\) across the whole image. For each position on the images, instead of using a square patch centered on it, a circularly symmetric Gaussian kernel is used to compute the local mean, standard deviation and covariance terms \(\mu_{X,i}, \mu_{Y,i}, \sigma_{X,i}, \sigma_{Y,i}, \sigma_{XY,i}\), where \(i\) is the pixel index. In this way we obtain an SSIM map \(d_i(\mathbf{X}, \mathbf{Y})\). The values in the SSIM map are averaged to generate a single number, which is the SSIM index:

\[\text{SSIM}(\mathbf{X}, \mathbf{Y}) = \frac{1}{N} \sum_{i=1}^N d_i(\mathbf{X}, \mathbf{Y}) = \frac{1}{N} \sum_{i=1}^N l_i(\mathbf{X}, \mathbf{Y}) c_i(\mathbf{X}, \mathbf{Y}) s_i(\mathbf{X}, \mathbf{Y}) = \frac{1}{N} \sum_{i=1}^N \frac{\left( 2 \mu_{X,i} \mu_{Y,i} + C_1 \right) \left( 2\sigma_{XY,i} + C_2 \right)} {\left( {\mu_{X,i}}^2 + {\mu_{Y,i}}^2 + C_1 \right) \left( {\sigma_{X,i}}^2 + {\sigma_{Y,i}}^2 + C_2 \right)}\]

where \(N\) is the number of pixels of the image. The SSIM index is also bounded between -1 and 1. In plenoptic, the SSIM map is computed by the function po.metric.ssim_map, and the SSIM index itself is computed by the function po.metric.ssim. For more information, see the original paper:

Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4), 600-612.
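
Before looking at examples, here is a quick sketch checking the relationship stated above, namely that the SSIM index is the average of the SSIM map; the two printed numbers should agree up to numerical precision (two of plenoptic's built-in images are used purely for illustration).

img_a = po.data.einstein()
img_b = po.data.curie()
ssim_map = po.metric.ssim_map(img_a, img_b)
print(float(po.metric.ssim(img_a, img_b)), float(ssim_map.mean()))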

Understanding SSIM

We demonstrate the effectiveness of SSIM by generating five different types of distortions (contrast stretching, mean shifting, JPEG compression, blurring, and salt-pepper noise) with the same MSE, and computing their SSIM values.

[2]:
import tempfile
def add_jpeg_artifact(img, quality):
    # need to convert this back to 2d 8-bit int for writing out as jpg
    img = po.to_numpy(img.squeeze() * 255).astype(np.uint8)
    # write to a temporary file
    with tempfile.NamedTemporaryFile(suffix='.jpg') as tmp:
        imageio.imwrite(tmp.name, img, quality=quality)
        img = po.load_images(tmp.name)
    return img

def add_saltpepper_noise(img, threshold):
    po.tools.set_seed(0)
    img_saltpepper = img.clone()
    for i in range(img.shape[-2]):
        for j in range(img.shape[-1]):
            x = np.random.rand()
            if x < threshold:
                img_saltpepper[..., i, j] = 0
            elif x > 1 - threshold:
                img_saltpepper[..., i, j] = 1
    np.random.seed(None)
    return img_saltpepper

def get_distorted_images():
    img = po.data.einstein()
    img_contrast = torch.clip(img + 0.20515 * (2 * img - 1), min=0, max=1)
    img_mean = torch.clip(img + 0.05983, min=0, max=1)
    img_jpeg = add_jpeg_artifact(img, quality=4)
    img_blur = po.simul.Gaussian(5, std=2.68)(img)
    img_saltpepper = add_saltpepper_noise(img, threshold=0.00651)
    img_distorted = torch.cat([img, img_contrast, img_mean, img_jpeg, img_blur, img_saltpepper], axis=0)
    return img_distorted
[3]:
img_distorted = get_distorted_images()
mse_values = torch.square(img_distorted - img_distorted[0]).mean(dim=(1, 2, 3))
ssim_values = po.metric.ssim(img_distorted, img_distorted[[0]])[:, 0]
names = ["Original image", "Contrast change", "Mean shift", "JPEG artifact", "Gaussian blur", "Salt-and-pepper noise"]
titles = [f"{names[i]}\nMSE={mse_values[i]:.3e}, SSIM={ssim_values[i]:.4f}" for i in range(6)]
po.imshow(img_distorted, vrange="auto", title=titles, col_wrap=3);
/home/billbrod/micromamba/envs/plenoptic/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
_images/tutorials_models_04_Perceptual_distance_5_1.png

We can see that the SSIM index matches human perception better than MSE.

While the scalar SSIM index is a concise summary, the SSIM map offers richer information about where perceptual discrepancy is located in the image. Here, we visualize the SSIM map of a JPEG compressed image, and also show the absolute error (absolute value of the difference) for comparison. In both maps, darker means more different.

[4]:
def get_demo_images():
    img = po.data.parrot(as_gray=True)
    img_jpeg = add_jpeg_artifact(img, quality=6)
    ssim_map_small = po.metric.ssim_map(img, img_jpeg)
    ssim_map = torch.ones_like(img)
    ssim_map[:, :, 5:-5, 5:-5] = ssim_map_small
    abs_map = 1 - torch.abs(img - img_jpeg)
    img_demo = torch.cat([img, img_jpeg, ssim_map, abs_map], dim=0).cpu()
    return img_demo
[5]:
img_demo = get_demo_images()
titles = ["Original", "JPEG artifact", "SSIM map", "Absolute error"]
po.imshow(img_demo, title=titles);
_images/tutorials_models_04_Perceptual_distance_8_0.png

You can judge whether the SSIM map captures the location of perceptual discrepancy better than absolute error.

MS-SSIM (multiscale structural similarity)

MS-SSIM computes SSIM on multiple scales of the images. To do this, the two images \(\mathbf{X}\) and \(\mathbf{Y}\) are recursively blurred and downsampled by a factor of 2 to produce two sequences of images: \(\mathbf{X}_1, \cdots, \mathbf{X}_M\) and \(\mathbf{Y}_1, \cdots, \mathbf{Y}_M\), where \(\mathbf{X}_1 = \mathbf{X}\), and \(\mathbf{X}_{i+1}\) is obtained by blurring and downsampling \(\mathbf{X}_{i}\) (same for \(\mathbf{Y}\)). Such a sequence is called a Gaussian pyramid. We define a contrast-structural index that does not include luminance component:

\[\text{CS}(\mathbf{X}, \mathbf{Y}) = \frac{1}{N} \sum_{i=1}^N c_i(\mathbf{X}, \mathbf{Y}) s_i(\mathbf{X}, \mathbf{Y}) = \frac{1}{N} \sum_{i=1}^N \frac{2\sigma_{XY,i} + C_2} {{\sigma_{X,i}}^2 + {\sigma_{Y,i}}^2 + C_2}\]

The MS-SSIM index is defined as:

\[\text{MS-SSIM}(\mathbf{X}, \mathbf{Y}) = \text{SSIM}(\mathbf{X}_M, \mathbf{Y}_M)^{\gamma_M} \prod_{k=1}^{M-1} \text{CS}(\mathbf{X}_k, \mathbf{Y}_k)^{\gamma_k}\]

where \(\gamma_1, \cdots, \gamma_M\) are exponents that determine the relative importance of different scales. They are determined by a human psychophysics experiment and are constrained to sum to 1. When \(M=1\), the MS-SSIM index is the same as the SSIM index. In the standard implementation of MS-SSIM, \(M = 5\). In plenoptic, the MS-SSIM index is computed by the function po.metric.ms_ssim. For more information, see the original paper:

Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003, November). Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003 (Vol. 2, pp. 1398-1402). IEEE.

Here we use the same distortions of the Einstein image to demonstrate MS-SSIM:

[6]:
msssim_values = po.metric.ms_ssim(img_distorted, img_distorted[[0]])[:, 0]
names = ["Original image", "Contrast change", "Mean shift", "JPEG artifact", "Gaussian blur", "Salt-and-pepper noise"]
titles = [f"{names[i]}\nMSE={mse_values[i]:.3e}, MS-SSIM={msssim_values[i]:.3f}" for i in range(6)]
po.imshow(img_distorted, vrange="auto", title=titles, col_wrap=3);
_images/tutorials_models_04_Perceptual_distance_12_0.png

NLPD (normalized Laplacian pyramid distance)

Like MS-SSIM, the NLPD is based on a multiscale representation of the images, and its idea is likewise to separate out the effects of luminance and contrast differences. Unlike MS-SSIM, the NLPD directly performs luminance subtraction and contrast normalization at each scale, and then computes a simple squared difference. The NLPD uses the Laplacian pyramid for luminance subtraction. Given a Gaussian pyramid \(\mathbf{X}_1, \cdots, \mathbf{X}_M\), for \(k=1, \cdots, M - 1\), we upsample and blur \(\mathbf{X}_{k+1}\) to produce \(\mathbf{\hat{X}}_k\), which is a blurry version of \(\mathbf{X}_k\), and let \(\mathbf{X}'_k = \mathbf{X}_k - \mathbf{\hat{X}}_k\). Define \(\mathbf{X}'_M = \mathbf{X}_M\), and we get the Laplacian pyramid \(\mathbf{X}'_1, \cdots, \mathbf{X}'_M\). In plenoptic, the Laplacian pyramid is implemented by po.simul.LaplacianPyramid.
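
To make that construction concrete, here is a minimal sketch of a Gaussian/Laplacian pyramid using simple average pooling as the "blur and downsample" step. This is only illustrative: plenoptic's po.simul.LaplacianPyramid uses a proper filter, so the values will differ, but the structure (the recursion defining \(\mathbf{X}_k\), \(\mathbf{\hat{X}}_k\) and \(\mathbf{X}'_k\)) is the same.

import torch
import torch.nn.functional as F

def laplacian_pyramid_sketch(img, n_scales=5):
    gauss = [img]
    for _ in range(n_scales - 1):
        # X_{k+1}: blur and downsample (here approximated by 2x2 average pooling)
        gauss.append(F.avg_pool2d(gauss[-1], kernel_size=2))
    laplacian = []
    for k in range(n_scales - 1):
        # \hat{X}_k: upsample and blur X_{k+1} back to X_k's size
        upsampled = F.interpolate(gauss[k + 1], size=gauss[k].shape[-2:],
                                  mode='bilinear', align_corners=False)
        laplacian.append(gauss[k] - upsampled)   # X'_k = X_k - \hat{X}_k
    laplacian.append(gauss[-1])                  # X'_M = X_M
    return laplacian

pyr_sketch = laplacian_pyramid_sketch(po.data.einstein())
print([tuple(p.shape) for p in pyr_sketch])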

The contrast normalization is achieved by dividing by a local estimation of amplitude:

\[\mathbf{X}''_{k,i} = \frac{\mathbf{X}'_{k,i}} {f(\mathbf{X}'_{k,N(i)}; \sigma_k, \mathbf{p}_k)}, \qquad f(\mathbf{X}'_{k,N(i)}; \sigma_k, \mathbf{p}_k) = \sigma_k + \sum_{j\in N(i)} p_{k,j-i} |\mathbf{X}'_{k,j}|\]

where \(N(i)\) is the neighborhood of pixel \(i\) which does not include \(i\) itself, and the parameters \(\sigma_k\) and \(\mathbf{p}_k\) are learned from an image dataset:

\[\sigma_k = \mathbb{E}_{\mathbf{X},i} \left( |\mathbf{X}'_{k,i}| \right) \qquad \mathbf{p}_k = \arg\min_{\mathbf{p}_k} \mathbb{E}_{\mathbf{X},i} \left( \mathbf{X}'_{k,i} - f(\mathbf{X}'_{k,N(i)}; \sigma_k, \mathbf{p}_k) \right)^2\]

Note that this learning is performed on clean images only, without access to the corruption types or human psychophysics data. The sequence \(\mathbf{X}''_1, \cdots, \mathbf{X}''_M\) is the normalized Laplacian pyramid. The same procedure is applied to \(\mathbf{Y}\). The NLPD is defined as:

\[\text{NLPD}(\mathbf{X}, \mathbf{Y}) = \frac{1}{M} \sum_{k=1}^M \sqrt{\frac{1}{N_k} \sum_{i=1}^{N_k} (\mathbf{X}''_{k,i} - \mathbf{Y}''_{k,i})^2}\]

where \(N_k\) is the number of pixels of \(\mathbf{X}''_k\). In plenoptic, the NLPD is computed by the function po.metric.nlpd. For more information, see the original paper:

Laparra, V., Ballé, J., Berardino, A., & Simoncelli, E. P. (2016). Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16), 1-6.

Here we use the same distortions of the Einstein image to demonstrate NLPD:

[7]:
nlpd_values = po.metric.nlpd(img_distorted, img_distorted[[0]])[:, 0]
names = ["Original image", "Contrast change", "Mean shift", "JPEG artifact", "Gaussian blur", "Salt-and-pepper noise"]
titles = [f"{names[i]}\nMSE={mse_values[i]:.3e}, NLPD={nlpd_values[i]:.4f}" for i in range(6)]
po.imshow(img_distorted, vrange="auto", title=titles, col_wrap=3);
_images/tutorials_models_04_Perceptual_distance_15_0.png

Usage

The basic usage of ssim, ms_ssim and nlpd in the po.metric module is the same: they take two arguments that are images to be compared, whose shapes should be in the format (batch, channel, height, width). All these functions are designed for grayscale images, so the channel dimension is treated as another batch dimension. The height and width of the two arguments should be the same, and the batch and channel sizes of the two arguments should be broadcastable. The broadcasting is already demonstrated in the examples of SSIM, MS-SSIM and NLPD that use the Einstein image.

SSIM, MS-SSIM and NLPD are not scale-invariant. The input images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).

[8]:
# Take SSIM as an example here. The images in img_demo have a range of [0, 1].
val1 = po.metric.ssim(img_demo[[0]], img_demo[[1]])
val2 = po.metric.ssim(img_demo[[0]] * 255, img_demo[[1]] * 255)  # This produces a wrong result and triggers a warning: Image range falls outside [0, 1].
print(f"True SSIM: {float(val1):.4f}, rescaled image SSIM: {float(val2):.4f}")
True SSIM: 0.7703, rescaled image SSIM: 0.4048
/home/billbrod/Documents/plenoptic/src/plenoptic/metric/perceptual_distance.py:42: UserWarning: Image range falls outside [0, 1]. img1: tensor([ 14., 255.]), img2: tensor([  0., 255.]). Continuing anyway...
  warnings.warn("Image range falls outside [0, 1]."
Comparison of performance

The performance of these perceptual distance metrics can be measured by the correlation with human psychophysics data: the TID2013 dataset consists of 3000 different distorted images (25 clean images x 24 types of distortions x 5 levels of distortions), each with its own mean opinion score (MOS; the perceived quality of the distorted image). Higher MOS means a smaller distance from its corresponding clean image. The TID2013 dataset is described in the following paper:

Ponomarenko, N., Jin, L., Ieremeiev, O., Lukin, V., Egiazarian, K., Astola, J., … & Kuo, C. C. J. (2015). Image database TID2013: Peculiarities, results and perspectives. Signal processing: Image communication, 30, 57-77.

Since both SSIM and MS-SSIM take higher values for more similar image pairs, and are maximized at 1 for identical images, we need to convert them to distances as 1-SSIM and 1-(MS-SSIM). Then we will plot MOS against the three metrics: 1-SSIM, 1-(MS-SSIM) and NLPD, as well as the baseline RMSE (root mean squared error). We will also measure the correlations.

To execute this part of the notebook, the TID2013 dataset needs to be downloaded. In order to do so, we use an optional dependency, pooch. If the following raises an ImportError or ModuleNotFoundError then install pooch in your plenoptic environment and restart your kernel. Note that the dataset is fairly large, about 1GB.

[11]:
def get_tid2013_data():
    folder = po.data.fetch_data('tid2013.tar.gz')
    reference_images = torch.zeros([25, 1, 384, 512])
    distorted_images = torch.zeros([25, 24, 5, 1, 384, 512])
    reference_filemap = {s.lower(): s for s in os.listdir(folder / "reference_images")}
    distorted_filemap = {s.lower(): s for s in os.listdir(folder / "distorted_images")}
    for i in range(25):
        reference_filename = reference_filemap[f"i{i+1:02d}.bmp"]
        reference_images[i] = torch.tensor(np.asarray(Image.open(
            folder / "reference_images" / reference_filename).convert("L"))) / 255
        for j in range(24):
            for k in range(5):
                distorted_filename = distorted_filemap[f"i{i+1:02d}_{j+1:02d}_{k+1}.bmp"]
                distorted_images[i, j, k] = torch.tensor(np.asarray(Image.open(
                    folder / "distorted_images" / distorted_filename).convert("L"))) / 255
    distorted_images = distorted_images[:, [0] + list(range(2, 17)) + list(range(18, 24))]  # Remove color distortions

    with open(folder/ "mos.txt", "r", encoding="utf-8") as g:
        mos_values = list(map(float, g.readlines()))
    mos_values = np.array(mos_values).reshape([25, 24, 5])
    mos_values = mos_values[:, [0] + list(range(2, 17)) + list(range(18, 24))]  # Remove color distortions
    return reference_images, distorted_images, mos_values

def correlate_with_tid(func_list, name_list):
    reference_images, distorted_images, mos_values = get_tid2013_data()
    distance = torch.zeros([len(func_list), 25, 22, 5])
    for i, func in enumerate(func_list):
        for j in range(25):
            distance[i, j] = func(reference_images[[j]], distorted_images[j].flatten(0, 1)).reshape(22, 5)

    plot_size = int(np.ceil(np.sqrt(len(func_list))))
    fig, axs = plt.subplots(plot_size, plot_size, squeeze=False, figsize=(plot_size * 6, plot_size * 6))
    axs = axs.flatten()
    edgecolor_list = ["m", "c", "k", "g", "r"]
    facecolor_list = [None, "none", "none", None, "none"]
    shape_list = ["x", "s", "o", "*", "^"]
    distortion_names = ["Additive Gaussian noise",
                        "Spatially correlated noise",
                        "Masked noise",
                        "High frequency noise",
                        "Impulse noise",
                        "Quantization noise",
                        "Gaussian blur",
                        "Image denoising",
                        "JPEG compression",
                        "JPEG2000 compression",
                        "JPEG transmission errors",
                        "JPEG2000 transmission errors",
                        "Non eccentricity pattern noise",
                        "Local block-wise distortions of different intensity",
                        "Mean shift (intensity shift)",
                        "Contrast change",
                        "Multiplicative Gaussian noise",
                        "Comfort noise",
                        "Lossy compression of noisy images",
                        "Image color quantization with dither",
                        "Chromatic aberrations",
                        "Sparse sampling and reconstruction"]

    for i, name in enumerate(name_list):
        for j in range(22):
            edgecolor = edgecolor_list[j % 5]
            facecolor = facecolor_list[j // 5]
            if facecolor is None:
                facecolor = edgecolor
                edgecolor = None
            axs[i].scatter(distance[i, :, j].flatten(), mos_values[:, j].flatten(), s=20,
                           edgecolors=edgecolor, facecolors=facecolor,
                           marker=shape_list[j // 5], label=distortion_names[j])
        pearsonr_value = pearsonr(-mos_values.flatten(), distance[i].flatten())[0]
        spearmanr_value = spearmanr(-mos_values.flatten(), distance[i].flatten())[0]
        axs[i].set_title(
            f"pearson {pearsonr_value:.4f}, spearman {spearmanr_value:.4f}")
        axs[i].set_xlabel(name)
        axs[i].set_ylabel("MOS")
    lines, labels = axs[0].get_legend_handles_labels()
    fig.legend(lines, labels, loc="lower center", bbox_to_anchor=(0.5, 1.0))
    plt.tight_layout()
    plt.show()
[12]:
def rmse(img1, img2):
    return torch.sqrt(torch.square(img1 - img2).mean(dim=(-2, -1)))

def one_minus_ssim(img1, img2):
    return 1 - po.metric.ssim(img1, img2)

def one_minus_msssim(img1, img2):
    return 1 - po.metric.ms_ssim(img1, img2)

# This takes some minutes to run
correlate_with_tid(func_list=[rmse, one_minus_ssim, one_minus_msssim, po.metric.nlpd], name_list=["RMSE", "1 - SSIM", "1 - (MS-SSIM)", "NLPD"])
_images/tutorials_models_04_Perceptual_distance_21_0.png

Each point in the figures is a distorted image, and the color/shape indicates the distortion type. The goodness of the perceptual distance metrics can be qualitatively assessed by looking at how well the points follow a monotonic function, and how straight this monotonic function is. We can see that the points for RMSE and 1-SSIM are more scattered than those for 1-(MS-SSIM) and NLPD. The points for NLPD follow a much straighter line than the other methods. The points for RMSE have outliers belonging to certain distortion types, notably mean shift and contrast change, while all three perceptual distance metrics handle these distortions better.

For a quantitative comparison, we calculate the Pearson’s and Spearman’s correlation coefficient between MOS and each perceptual distance metric (shown above the figures). Pearson’s correlation measures linear relationship, while Spearman’s correlation allows a nonlinear relationship since it only depends on ranking. We can see that the performance of the metrics, as measured by the correlation coefficients, is: NLPD > MS-SSIM > SSIM > RMSE.
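
A quick illustration of that distinction (a standalone sketch using scipy, unrelated to the TID2013 data): a monotonic but nonlinear relationship has a Spearman correlation of 1, while its Pearson correlation is lower because the relationship is not linear.

import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(0, 1, 50)
y = x ** 4  # monotonic in x, but far from linear
print(f"pearson {pearsonr(x, y)[0]:.4f}, spearman {spearmanr(x, y)[0]:.4f}")
# the Spearman correlation is exactly 1; the Pearson correlation is noticeably lower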

[1]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import torch
import plenoptic as po
import scipy.io as sio
import os
import os.path as op
import einops
import glob
import math
import pyrtools as pt
from tqdm import tqdm
from PIL import Image
%load_ext autoreload
%autoreload

# We need to download some additional images for this notebook. In order to do so,
# we use an optional dependency, pooch. If the following raises an ImportError or ModuleNotFoundError
# then install pooch in your plenoptic environment and restart your kernel.
DATA_PATH = po.data.fetch_data('portilla_simoncelli_images.tar.gz')
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72

# set seed for reproducibility
po.tools.set_seed(1)
/mnt/home/wbroderick/miniconda3/envs/plenoptic/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[2]:
# These variables control how long metamer synthesis runs for. The values present here will result in completed synthesis,
# but you may want to decrease these numbers if you're on a machine with limited resources.
short_synth_max_iter = 1000
long_synth_max_iter = 3000
longest_synth_max_iter = 4000

Portilla-Simoncelli Texture Metamer

In this tutorial we will aim to replicate Portilla & Simoncelli (2000). The tutorial is broken into the following parts:

  1. Introduce the concept of a Visual Texture.

  2. How to synthesize metamers for the Portilla & Simoncelli texture model.

  3. Demonstrate the importance of different classes of statistics.

  4. Example syntheses from different classes of textures (e.g., artificial, Julesz, pseudoperiodic, etc.)

  5. Extrapolation and Mixtures: Applying texture synthesis to more complex texture problems.

  6. Some model limitations.

  7. List of notable differences between the MATLAB and python implementations of the Portilla Simoncelli texture model and texture synthesis.

Note that this notebook takes a long time to run (roughly an hour with a GPU, several hours without), because of all the metamers that are synthesized.

1. What is a visual texture?

The simplest definition is a repeating visual pattern. Textures encompass a wide variety of images, including natural patterns such as bark or fur, artificial ones such as brick, and computer-generated ones such as the Julesz patterns (Julesz 1978, Yellott 1993). Below we load some examples.

The Portilla-Simoncelli model was developed to measure the statistical properties of visual textures. Metamer synthesis was used (and can be used) in conjunction with the Portilla-Simoncelli texture model to demonstrate the necessity of different properties of the visual texture. We will use some of these example textures to demonstrate aspects of the Portilla Simoncelli model.

[3]:
# Load and display a set of visual textures

def display_images(im_files, title=None):
    images = po.tools.load_images(im_files)
    fig = po.imshow(images, col_wrap=4, title=None)
    if title is not None:
        fig.suptitle(title, y=1.05)

natural = ['3a','6a','8a','14b','15c','15d','15e','15f','16c','16b','16a']
artificial = ['4a','4b','14a','16e','14e','14c','5a']
hand_drawn = ['5b','13a','13b','13c','13d']

im_files = [DATA_PATH / f'fig{num}.jpg' for num in natural]
display_images(im_files, "Natural textures")
_images/tutorials_models_Metamer-Portilla-Simoncelli_4_0.png
[4]:
im_files = [DATA_PATH / f'fig{num}.jpg' for num in artificial]
display_images(im_files, 'Artificial textures')
_images/tutorials_models_Metamer-Portilla-Simoncelli_5_0.png
[5]:
im_files = [DATA_PATH / f'fig{num}.jpg' for num in hand_drawn]
display_images(im_files, 'Hand-drawn / computer-generated textures')
_images/tutorials_models_Metamer-Portilla-Simoncelli_6_0.png

2. How to generate Portilla-Simoncelli Metamers

2.1 A quick reminder of what metamers are and why we are calculating them.

The primary reason that the original Portilla-Simoncelli paper developed the metamer procedure was to assess whether the model’s understanding of textures matches that of humans. While developing the model, the authors originally evaluated it by performing texture classification on a then-standard dataset (i.e., “is this a piece of fur or a patch of grass?”). The model aced the test, with 100% accuracy. After an initial moment of elation, the authors decided to double-check and performed the same evaluation with a far simpler model, which used the steerable pyramid to compute oriented energy (the first stage of the model described here). That model also classified the textures with 100% accuracy. The authors interpreted this as a sign that their evaluation was too easy, and sought a method that would allow them to determine whether their model better matched human texture perception.

In the metamer paradigm they eventually arrived at, the authors generated model metamers: images with different pixel values but (near-)identical texture model outputs. They then evaluated whether these images belonged to the same texture class: does this model metamer of a basket also look like a basket, or does it look like something else? Importantly, they were not evaluating whether the images were indistinguishable, but whether they belonged to the same texture family. This paradigm thus tests whether the model is capturing important information about how humans understand and group textures.

2.2 How do we use the plenoptic package to generate Portilla-Simoncelli Texture Metamers?

Generating a metamer starts with a target image:

[6]:
img = po.tools.load_images(DATA_PATH / 'fig4a.jpg')
po.imshow(img);
_images/tutorials_models_Metamer-Portilla-Simoncelli_8_0.png

Below we have an instance of the PortillaSimoncelli model with default parameters:

  • n_scales=4, The number of scales in the steerable pyramid underlying the model.

  • n_orientations=4, The number of orientations in the steerable pyramid.

  • spatial_corr_width=9, The size of the window used to calculate the correlations across steerable pyramid bands.

Running the model on an image will return a tensor of numbers summarizing the “texturiness” of that image, which we refer to as the model’s representation. These statistics are measurements of different properties that the authors considered relevant to a texture’s appearance (where a texture is defined above), and capture some of the repeating properties of these types of images. Section 3 of this notebook explores those statistics and how they relate to texture properties.

When the model representation of two images match, the model considers the two images identical and we say that those two images are model metamers. Synthesizing a novel image that matches the representation of some arbitrary input is the goal of the Metamer class.

[7]:
n=img.shape[-1]
model = po.simul.PortillaSimoncelli([n,n])
stats = model(img)
print(stats)
tensor([[[ 0.4350,  0.0407,  0.1622,  ..., -0.0078, -0.2282,  0.0023]]])

To use Metamer, simply initialize it with the target image and the model, then call .synthesize(). By setting store_progress=True, we update a variety of attributes (all of which start with saved_) on each iteration so we can later examine, for example, the synthesized image over time. Let’s quickly run it for just 10 iterations to see how it works.

[8]:
met = po.synth.Metamer(img, model)
met.synthesize(store_progress=True, max_iter=10)
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  warnings.warn(
100%|██████████| 10/10 [00:01<00:00,  9.83it/s, loss=4.5063e-02, learning_rate=0.01, gradient_norm=1.6559e-02, pixel_change_norm=1.2805e+00]

We can then call the plot_synthesis_status method to see how things are doing. The image on the left shows the metamer at this moment in synthesis, the center plot shows the loss over time, with the red dot marking the current loss, and the rightmost plot shows the representation error. For the texture model, we plot the difference in representations split up across the different categories of statistics (which we’ll describe in more detail later).

[9]:
# representation_error plot has three subplots, so we increase its relative width
po.synth.metamer.plot_synthesis_status(met, width_ratios={'plot_representation_error': 3.1});
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/display.py:950: UserWarning: ax is not None, so we're ignoring figsize...
  warnings.warn("ax is not None, so we're ignoring figsize...")
_images/tutorials_models_Metamer-Portilla-Simoncelli_14_1.png
2.3 Portilla-Simoncelli Texture Model Metamers

This section will show a successful texture synthesis for this wicker basket texture:

[10]:
po.imshow(img);
_images/tutorials_models_Metamer-Portilla-Simoncelli_16_0.png

In the next block we will actually generate a metamer using the PortillaSimoncelli model, setting the following parameters for synthesis: max_iter, store_progress, coarse_to_fine, and the coarse-to-fine keyword arguments.

  • max_iter=1000 puts an upper bound (of 1000) on the number of iterations that the optimization will run.

  • store_progress=True tells the metamer class to store the progress of the metamer synthesis process

  • coarse_to_fine='together' activates the coarse_to_fine functionality. With this mode turned on the metamer synthesis optimizes the image for the statistics associated with the low spatial frequency bands first, adding subsequent bands after ctf_iters_to_check iterations.

It takes about 50 seconds to run 100 iterations on my laptop, and it takes hundreds of iterations to converge, so you’ll have to wait a few minutes to generate the texture metamer.

Note: we initialize synthesis with im_init, a uniform noise image whose values lie in the range mean(img) + [-.05, .05]. Initial images with uniform random noise covering the full pixel domain [0, 1] (the default choice for Metamer) don’t result in the very best metamers: with a full-range initial image, the optimization tends to get stuck.

[11]:
# send image and PS model to GPU, if available. then im_init and Metamer will also use GPU
img = img.to(DEVICE)
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
im_init = (torch.rand_like(img)-.5) * .1 + img.mean();

met = po.synth.MetamerCTF(img, model, loss_function=po.tools.optim.l2_norm, initial_image=im_init,
                          coarse_to_fine='together')

o=met.synthesize(
    max_iter=short_synth_max_iter,
    store_progress=True,
    # setting change_scale_criterion=None means that we change scales every ctf_iters_to_check,
    # see the metamer notebook for details.
    change_scale_criterion=None,
    ctf_iters_to_check=7
    )
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  warnings.warn(
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:211: UserWarning: Validating whether model can work with coarse-to-fine synthesis -- this can take a while!
  warnings.warn("Validating whether model can work with coarse-to-fine synthesis -- this can take a while!")
 73%|███████▎  | 734/1000 [00:31<00:11, 23.05it/s, loss=8.7390e-02, learning_rate=0.01, gradient_norm=7.7326e-01, pixel_change_norm=1.6338e-01, current_scale=all, current_scale_loss=8.7390e-02]            /mnt/home/wbroderick/plenoptic/src/plenoptic/synthesize/metamer.py:661: UserWarning: Loss has converged, stopping synthesis
  warnings.warn("Loss has converged, stopping synthesis")
 73%|███████▎  | 734/1000 [00:31<00:11, 23.14it/s, loss=8.7390e-02, learning_rate=0.01, gradient_norm=7.7326e-01, pixel_change_norm=1.6338e-01, current_scale=all, current_scale_loss=8.7390e-02]

Now we can visualize the output of the synthesis optimization. First we compare the Target image and the Synthesized image side-by-side. We can see that they appear perceptually similar — that is, for this texture image, matching the Portilla-Simoncelli texture stats gives you an image that the human visual system also considers similar.

[12]:
po.imshow([met.image, met.metamer], title=['Target image', 'Synthesized metamer'], vrange='auto1');
_images/tutorials_models_Metamer-Portilla-Simoncelli_20_0.png

And to further visualize the result we can plot: the synthesized image, the synthesis loss over time, and the final model output error: model(target image) - model(synthesized image).

We can see the synthesized texture on the leftmost plot. The overall synthesis error decreases over the synthesis iterations (subplot 2). The remaining plots show us the error broken out by the different texture statistics that we will go over in the next section.

[13]:
po.synth.metamer.plot_synthesis_status(met, width_ratios={'plot_representation_error': 3.1});
_images/tutorials_models_Metamer-Portilla-Simoncelli_22_0.png
[14]:
# For the remainder of the notebook we will use this helper function to
# run synthesis so that the cells are a bit less busy.

# Be sure to run this cell.

def run_synthesis(img, model, im_init=None):
    r""" Performs synthesis with the full Portilla-Simoncelli model.

        Parameters
        ----------
        img : Tensor
            A tensor containing an img.
        model :
            A model to constrain synthesis.
        im_init: Tensor
            A tensor to start image synthesis.

        Returns
        -------
        met: Metamer
            Metamer from the full Portilla-Simoncelli Model

        """
    if im_init is None:
        im_init = torch.rand_like(img) * .01 + img.mean()
    met = po.synth.MetamerCTF(img, model, loss_function=po.tools.optim.l2_norm, initial_image=im_init,
                              coarse_to_fine='together')
    met.synthesize(
        max_iter=long_synth_max_iter,
        store_progress=True,
        change_scale_criterion=None,
        ctf_iters_to_check=3,
        )
    return met

3. The importance of different classes of texture statistics

The Portilla-Simoncelli model consists of a few different classes of statistics:

  • Marginal Statistics. These include pixel statistics (mean, variance, skew, kurtosis, and range of the pixel values), as well as the skewness and kurtosis of the lowpass images computed at each level of the recursive pyramid decomposition.

  • Auto-Correlation Statistics. These include the auto-correlation of the real-valued pyramid bands, as well as the auto-correlation of the magnitude of the pyramid bands, and the mean of the magnitude of the pyramid bands.

  • Cross-Correlation Statistics. These include correlations across scale and across orientation bands of the pyramid (both for the real values of the pyramid bands and for their magnitudes).

The original paper uses synthesis to demonstrate the role of these different types of statistics. They show that the statistics can be used to constrain a synthesis optimization to generate new examples of textures. They also show that the absence of subsets of statistics results in synthesis failures. Here we replicate those results.

The first step is to create a version of the Portilla Simoncelli model where certain statistics can be turned off.

There are two important implementation details here, which you might be interested in if you’d like to write a similar extension of this model, and they both relate to coarse-to-fine synthesis. When removing statistics from the model, the most natural implementation would be to remove them from the model’s representation, changing the shape of the returned tensor. However, in order for coarse-to-fine synthesis to work, we need to know which scale each statistic belongs to, and changing the shape destroys that mapping. Therefore, the proper way to remove statistics (in order to remain compatible with coarse-to-fine optimization) is to zero them out instead: multiplying those statistics by zero means their gradients vanish, so they have no impact on the synthesis procedure. The second detail is that, during coarse-to-fine optimization, we must remove some subset of the statistics based on the scales argument, which we do by calling the remove_scales method at the end of the forward call. See the forward method below for an example of this.

[15]:
#  The following class extends the PortillaSimoncelli model so that you can specify which
#  statistics you would like to remove.  We have created this model so that we can examine
#  the consequences of the absence of specific statistics.
#
#  Be sure to run this cell.

from collections import OrderedDict
class PortillaSimoncelliRemove(po.simul.PortillaSimoncelli):
    r"""Model for measuring a subset of texture statistics reported by PortillaSimoncelli

    Parameters
    ----------
    im_shape: int
        the size of the images being processed by the model
    remove_keys: list
        The dictionary keys for the statistics we will "remove".  In practice we set them to zero.
        Possible keys: ["pixel_statistics", "auto_correlation_magnitude",
        "skew_reconstructed", "kurtosis_reconstructed", "auto_correlation_reconstructed",
        "std_reconstructed", "magnitude_std", "cross_orientation_correlation_magnitude",
        "cross_scale_correlation_magnitude" "cross_scale_correlation_real", "var_highpass_residual"]
    """
    def __init__(
        self,
        im_shape,
        remove_keys,
    ):
        super().__init__(im_shape, n_scales=4, n_orientations=4, spatial_corr_width=9)
        self.remove_keys = remove_keys

    def forward(self, image, scales=None):
        r"""Generate Texture Statistics representation of an image with `remove_keys` removed.

        Parameters
        ----------
        image : torch.Tensor
            A tensor containing the image to analyze.
        scales : list, optional
            Which scales to include in the returned representation. If None
            (the default), we include all scales. Otherwise, can contain a
            subset of values present in this model's ``scales`` attribute.

        Returns
        -------
        representation: torch.Tensor
            3d tensor of shape (batch, channel, stats) containing the measured texture stats.

        """
        # create the representation tensor (with all scales)
        stats_vec = super().forward(image)
        # convert to dict so it's easy to zero out the keys we don't care about
        stats_dict = self.convert_to_dict(stats_vec)
        for kk in self.remove_keys:
            # we zero out the stats (instead of removing them) because removing them
            # makes it difficult to keep track of which stats belong to which scale
            # (which is necessary for coarse-to-fine synthesis) -- see discussion above.
            if isinstance(stats_dict[kk],OrderedDict):
                for (key,val) in stats_dict[kk].items():
                    stats_dict[kk][key] *= 0
            else:
                stats_dict[kk] *= 0
        # then convert back to tensor and remove any scales we don't want (for coarse-to-fine)
        # -- see discussion above.
        stats_vec = self.convert_to_tensor(stats_dict)
        if scales is not None:
            stats_vec = self.remove_scales(stats_vec, scales)
        return stats_vec

Pixel Statistics + Marginal statistics

Beginning with some of the pixel and marginal statistics, we’ll demonstrate synthesis both with and without combinations of statistics.

The cell below replicates examples of synthesis failures with the following statistics removed:

  • the pixel statistics: mean, variance, skew, kurtosis, minimum, and maximum; and

  • marginal statistics on the lowpass images computed at each level of the recursive pyramid (skew, kurtosis)

These statistics play an important role in constraining the histogram of pixel intensities to match across the original and synthesized image.

(see figure 3 of Portilla & Simoncelli 2000)

[16]:
# which statistics to remove
remove_statistics = ['pixel_statistics','skew_reconstructed','kurtosis_reconstructed']

# run on fig3a or fig3b to replicate paper
img = po.tools.load_images(DATA_PATH / 'fig3b.jpg').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)

# synthesis with pixel and marginal statistics absent
model_remove = PortillaSimoncelliRemove(img.shape[-2:], remove_keys=remove_statistics).to(DEVICE)
metamer_remove = run_synthesis(img, model_remove)
 12%|█▏        | 374/3000 [00:16<01:52, 23.32it/s, loss=2.0057e-01, learning_rate=0.01, gradient_norm=6.1037e-01, pixel_change_norm=2.7060e-01, current_scale=all, current_scale_loss=2.0057e-01]
 53%|█████▎    | 1577/3000 [01:11<01:04, 22.00it/s, loss=6.3014e-02, learning_rate=0.01, gradient_norm=8.9518e-01, pixel_change_norm=1.4009e-01, current_scale=all, current_scale_loss=6.3014e-02]

In the following figure, we can see that not only does the metamer created with all statistics look more like the target image than the one created without the marginal statistics, but its pixel intensity histogram is also much more similar to that of the target image.

[17]:
# visualize results
fig = po.imshow([metamer.image, metamer.metamer, metamer_remove.metamer],
                title=['Target image', 'Full Statistics', 'Without Marginal Statistics'], vrange='auto1');
# add plots showing the different pixel intensity histograms
fig.add_axes([.33, -1, .33, .9])
fig.add_axes([.67, -1, .33, .9])
# this helper function expects a metamer object. see the metamer notebook for details.
po.synth.metamer.plot_pixel_values(metamer, ax=fig.axes[3])
fig.axes[3].set_title('Full statistics')
po.synth.metamer.plot_pixel_values(metamer_remove, ax=fig.axes[4])
fig.axes[4].set_title('Without marginal statistics')
[17]:
Text(0.5, 1.0, 'Without marginal statistics')
_images/tutorials_models_Metamer-Portilla-Simoncelli_29_1.png
Coefficient Correlations

The cell below replicates examples of synthesis failures with the following statistics removed:

  • local auto-correlations of the lowpass images computed at each level of the recursive pyramid

These statistics play a role in representing periodic structures and long-range correlations. For example, in the image named fig4b.jpg (the tile pattern), the absence of these statistics results in more difficulty synthesizing the long, continuous lines that stretch from one end of the image to the other.

(see figure 4 of Portilla & Simoncelli 2000)

[18]:
# which statistics to remove. note that, in the original paper, std_reconstructed is implicitly contained within
# auto_correlation_reconstructed; see the section on differences between the plenoptic and matlab implementations
# for details
remove_statistics = ['auto_correlation_reconstructed', 'std_reconstructed']

# run on fig4a or fig4b to replicate paper
img = po.tools.load_images(DATA_PATH / 'fig4b.jpg').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)

# synthesis with coefficient correlations  absent
model_remove = PortillaSimoncelliRemove(img.shape[-2:], remove_keys=remove_statistics).to(DEVICE)
metamer_remove = run_synthesis(img, model_remove)
100%|██████████| 3000/3000 [02:09<00:00, 23.22it/s, loss=1.0762e-01, learning_rate=0.01, gradient_norm=6.9003e-01, pixel_change_norm=1.5595e-01, current_scale=all, current_scale_loss=1.0762e-01]
100%|██████████| 3000/3000 [02:18<00:00, 21.69it/s, loss=9.2050e-01, learning_rate=0.01, gradient_norm=9.2850e-03, pixel_change_norm=1.7451e-02, current_scale=all, current_scale_loss=9.2050e-01]
[19]:
# visualize results
po.imshow([metamer.image, metamer.metamer, metamer_remove.metamer],
          title=['Target image', 'Full Statistics', 'Without Correlation Statistics'], vrange='auto1');
_images/tutorials_models_Metamer-Portilla-Simoncelli_32_0.png

And we can double check the error plots to see the difference in their representations. The first figure shows the error for the metamer created without the correlation statistics (at right above), while the second shows the error for the metamer created with all statistics (center). We can see larger error in the middle row of the first figure, especially the center plot, auto_correlation_reconstructed, since these statistics are unconstrained for the synthesis done by metamer_remove. (Note we have to use model, not model_remove, to create these plots, since model_remove always zeroes out those statistics.)

[20]:
fig, _ = model.plot_representation(model(metamer_remove.metamer) - model(metamer.image),
                                   figsize=(15, 5), ylim=(-4, 4))
fig.suptitle('Without Correlation Statistics')

fig, _ = model.plot_representation(model(metamer.metamer) - model(metamer.image),
                                   figsize=(15, 5), ylim=(-4, 4))
fig.suptitle('Full statistics');
_images/tutorials_models_Metamer-Portilla-Simoncelli_34_0.png
_images/tutorials_models_Metamer-Portilla-Simoncelli_34_1.png
Magnitude Correlation

The cell below replicates examples of synthesis failures with the following statistics removed:

  • correlation of the complex magnitude of pairs of coefficients at adjacent positions, orientations and scales.

These statistics play a role in constraining high contrast locations to be organized along lines and edges across all scales. For example, in the image named fig6a.jpg, the absence of these statistics results in a completely different organization of the orientation content in the edges.

(see figure 6 of Portilla & Simoncelli 2000)

[21]:
# which statistics to remove. note that, in the original paper, magnitude_std is implicitly contained within
# auto_correlation_magnitude; see the section on differences between the plenoptic and matlab implementations
# for details
remove_statistics = ['magnitude_std', 'cross_orientation_correlation_magnitude',
                     'cross_scale_correlation_magnitude', 'auto_correlation_magnitude']

# run on fig6a or fig6b to replicate paper
img = po.tools.load_images(DATA_PATH / 'fig6a.jpg').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)

# synthesis with pixel and marginal statistics absent
model_remove = PortillaSimoncelliRemove(img.shape[-2:],remove_keys=remove_statistics).to(DEVICE)
metamer_remove = run_synthesis(img, model_remove)
 17%|█▋        | 522/3000 [00:22<01:47, 22.97it/s, loss=9.1164e-02, learning_rate=0.01, gradient_norm=8.2437e-01, pixel_change_norm=1.5844e-01, current_scale=all, current_scale_loss=9.1164e-02]
 16%|█▌        | 479/3000 [00:22<01:56, 21.60it/s, loss=7.1354e-02, learning_rate=0.01, gradient_norm=9.4536e-01, pixel_change_norm=1.4267e-01, current_scale=all, current_scale_loss=7.1354e-02]
[22]:
# visualize results
po.imshow([metamer.image, metamer.metamer, metamer_remove.metamer],
          title=['Target image', 'Full Statistics','Without Magnitude Statistics'], vrange='auto1');
_images/tutorials_models_Metamer-Portilla-Simoncelli_37_0.png

And again, let’s look at the error plots. The first figure shows the error for the metamer created without the magnitude statistics (at right above), while the second shows the error for the metamer created with all statistics (center). We can see larger error in the plots corresponding to auto_correlation_magnitude, cross_orientation_correlation_magnitude, and cross_scale_correlation_magnitude, since these statistics are unconstrained for the synthesis done by metamer_remove. (Note we have to use model, not model_remove, to create these plots, since model_remove always zeroes out those statistics.)

[23]:
fig, _ = model.plot_representation(model(metamer_remove.metamer) - model(metamer.image),
                                   figsize=(15, 5), ylim=(-2, 2))
fig.suptitle('Without Magnitude Statistics')

fig, _ = model.plot_representation(model(metamer.metamer) - model(metamer.image),
                                   figsize=(15, 5), ylim=(-2, 2))
fig.suptitle('Full statistics');
_images/tutorials_models_Metamer-Portilla-Simoncelli_39_0.png
_images/tutorials_models_Metamer-Portilla-Simoncelli_39_1.png
Cross-scale Phase Statistics

The cell below replicates examples of synthesis failures with the following statistics removed:

  • relative phase of coefficients of bands at adjacent scales

These statistics play a role in constraining high contrast locations to be organized along lines and edges across all scales. These phase statistics are important in representing textures with strong illumination effects. When they are removed, the synthesized images appear much less three dimensional and lose the detailed structure of shadows.

(see figure 8 of Portilla & Simoncelli 2000)

[24]:
# which statistics to remove
remove_statistics = ['cross_scale_correlation_real']

# run on fig8a and fig8b to replicate paper
img = po.tools.load_images(DATA_PATH / 'fig8b.jpg').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)

# synthesis with pixel and marginal statistics absent
model_remove = PortillaSimoncelliRemove(img.shape[-2:], remove_keys=remove_statistics).to(DEVICE)
metamer_remove = run_synthesis(img, model_remove)
 16%|█▌        | 482/3000 [00:20<01:48, 23.24it/s, loss=7.3351e-02, learning_rate=0.01, gradient_norm=8.6994e-01, pixel_change_norm=1.5538e-01, current_scale=all, current_scale_loss=7.3351e-02]
 17%|█▋        | 512/3000 [00:23<01:53, 21.87it/s, loss=7.2080e-02, learning_rate=0.01, gradient_norm=8.8912e-01, pixel_change_norm=1.5535e-01, current_scale=all, current_scale_loss=7.2080e-02]
[25]:
# visualize results
po.imshow([metamer.image, metamer.metamer, metamer_remove.metamer],
          title=['Target image', 'Full Statistics','Without Cross-Scale Phase Statistics'], vrange='auto1');
_images/tutorials_models_Metamer-Portilla-Simoncelli_42_0.png

And again, let’s look at the error plots. The first figure shows the error for the metamer created without the cross-scale phase statistics (at right above), while the second shows the error for the metamer created with all statistics (center). We can see larger error in the final plot of the first figure, cross_scale_correlation_real, since these statistics are unconstrained for the synthesis done by metamer_remove. (Note we have to use model, not model_remove, to create these plots, since model_remove always zeroes out those statistics.)

[26]:
fig, _ = model.plot_representation(model(metamer_remove.metamer) - model(metamer.image),
                                   figsize=(15, 5), ylim=(-1.2, 1.2))
fig.suptitle('Without Cross-Scale Phase Statistics')

fig, _ = model.plot_representation(model(metamer.metamer) - model(metamer.image),
                                   figsize=(15, 5), ylim=(-1.2, 1.2))
fig.suptitle('Full statistics');
_images/tutorials_models_Metamer-Portilla-Simoncelli_44_0.png
_images/tutorials_models_Metamer-Portilla-Simoncelli_44_1.png

4. Examples from different texture classes

Hand-drawn / computer-generated textures

(see figure 12 of Portilla & Simoncelli 2000)

The following cell can be used to reproduce texture synthesis on the hand-drawn / computer-generated texture examples in the original paper, showing that the model can handle these simpler images as well.

Examples

  • (12a) solid black squares

  • (12b) tilted gray columns

  • (12c) curvy lines

  • (12d) dashes

  • (12e) solid black circles

  • (12f) pluses

[27]:
img = po.tools.load_images(DATA_PATH / 'fig12a.jpg').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img,model)
100%|██████████| 3000/3000 [02:09<00:00, 23.15it/s, loss=2.9268e+00, learning_rate=0.01, gradient_norm=4.8896e-01, pixel_change_norm=1.2103e-01, current_scale=all, current_scale_loss=2.9268e+00]
[28]:
po.imshow([metamer.image, metamer.metamer],
          title=['Target image', 'Synthesized Metamer'], vrange='auto1');
_images/tutorials_models_Metamer-Portilla-Simoncelli_47_0.png
Counterexample to the Julesz Conjecture

The Julesz conjecture, originally from Julesz 1962, states that “humans cannot distinguish between textures with identical second-order statistics” (second-order statistics include cross- and auto-correlations, see the paper for details). Following up on this initial paper, Julesz et al., 1978 and then Yellott, 1993 created images that served as counterexamples to this conjecture: pairs of images that had identical second-order statistics (they differed in their third- and higher-order statistics) but were readily distinguishable by humans. In figure 13 of Portilla & Simoncelli, 2000, the authors show that the model is able to synthesize novel images based on these counterexamples that are also distinguishable by humans, so the model does not confuse them either.

(see figure 13 of Portilla & Simoncelli 2000)

Excerpt from paper: “Figure 13 shows two pairs of counterexamples that have been used to refute the Julesz conjecture. [13a and 13b were] originally created by Julesz et al. (1978): they have identical third-order pixel statistics, but are easily discriminated by human observers. Our model succeeds, in that it can reproduce the visual appearance of either of these textures. In particular, we have seen that the strongest statistical difference arises in the magnitude correlation statistics. The rightmost pair were constructed by Yellott (1993), to have identical sample autocorrelation. Again, our model does not confuse these, and can reproduce the visual appearance of either one.”

[29]:
# Run on fig13a, fig13b, fig13c, fig13d to replicate examples in paper
img = po.tools.load_images(DATA_PATH / 'fig13a.jpg').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer_left = run_synthesis(img,model)
100%|██████████| 3000/3000 [02:10<00:00, 23.02it/s, loss=3.9404e-01, learning_rate=0.01, gradient_norm=2.6782e-02, pixel_change_norm=4.4524e-02, current_scale=all, current_scale_loss=3.9404e-01]
[30]:
# Run on fig13a, fig13b, fig13c, fig13d to replicate examples in paper
img = po.tools.load_images(DATA_PATH / 'fig13b.jpg').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer_right = run_synthesis(img,model)
 62%|██████▏   | 1860/3000 [01:20<00:49, 23.07it/s, loss=3.2113e-01, learning_rate=0.01, gradient_norm=1.8246e-01, pixel_change_norm=1.2679e-01, current_scale=all, current_scale_loss=3.2113e-01]

And note that the two synthesized images (right column) are as distinguishable from each other as the two hand-crafted counterexamples (left column):

[31]:
po.imshow([metamer_left.image, metamer_left.metamer,
           metamer_right.image, metamer_right.metamer],
          title=['Target image 1', 'Synthesized Metamer 1', 'Target Image 2', 'Synthesized Metamer 2'],
          vrange='auto1', col_wrap=2);
_images/tutorials_models_Metamer-Portilla-Simoncelli_52_0.png
Pseudo-periodic Textures

(see figure 14 of Portilla & Simoncelli 2000)

Excerpt from paper: “Figure 14 shows synthesis results for photographic textures that are pseudo-periodic, such as a brick wall and various types of woven fabric.”

[32]:
# Run on fig14a, fig14b, fig14c, fig14d, fig14e, fig14f to replicate examples in paper
img = po.tools.load_images(DATA_PATH / 'fig14a.jpg').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img,model)
 18%|█▊        | 550/3000 [00:23<01:45, 23.13it/s, loss=2.3135e-01, learning_rate=0.01, gradient_norm=5.0994e-01, pixel_change_norm=2.7653e-01, current_scale=all, current_scale_loss=2.3135e-01]
[33]:
po.imshow([metamer.image, metamer.metamer],
          title=['Target image', 'Synthesized Metamer'], vrange='auto1');
_images/tutorials_models_Metamer-Portilla-Simoncelli_55_0.png
Aperiodic Textures

(see figure 15 of Portilla & Simoncelli 2000)

Excerpt from paper: “Figure 15 shows synthesis results for a set of photographic textures that are aperiodic, such as the animal fur or wood grain”

[34]:
# Run on fig15a, fig15b, fig15c, fig15d to replicate examples in paper
img = po.tools.load_images(DATA_PATH / 'fig15a.jpg').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img,model)
 14%|█▍        | 425/3000 [00:18<01:51, 23.09it/s, loss=9.6799e-02, learning_rate=0.01, gradient_norm=8.5685e-01, pixel_change_norm=1.7662e-01, current_scale=all, current_scale_loss=9.6799e-02]
[35]:
po.imshow([metamer.image, metamer.metamer],
          title=['Target image', 'Synthesized Metamer'], vrange='auto1');
_images/tutorials_models_Metamer-Portilla-Simoncelli_58_0.png
Complex Structured Photographic Textures

(see figure 16 of Portilla & Simoncelli 2000)

Excerpt from paper: “Figure 16 shows several examples of textures with complex structures. Although the synthesis quality is not as good as in previous examples, we find the ability of our model to capture salient visual features of these textures quite remarkable. Especially notable are those examples in all three figures for which shading produces a strong impression of three-dimensionality.”

[36]:
# Run on fig16a, fig16b, fig16c, fig16d to replicate examples in paper
img = po.tools.load_images(DATA_PATH / 'fig16e.jpg').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)
 14%|█▎        | 412/3000 [00:17<01:52, 22.97it/s, loss=7.4121e-02, learning_rate=0.01, gradient_norm=1.2208e+00, pixel_change_norm=1.4139e-01, current_scale=all, current_scale_loss=7.4121e-02]
[37]:
po.imshow([metamer.image, metamer.metamer],
          title=['Target image', 'Synthesized metamer'], vrange='auto1');
_images/tutorials_models_Metamer-Portilla-Simoncelli_61_0.png

5. Extrapolation

(see figure 19 of Portilla & Simoncelli 2000)

Here we explore using the texture synthesis model for extrapolating beyond its spatial boundaries.

Excerpt from paper: “…[C]onsider the problem of extending a texture image beyond its spatial boundaries (spatial extrapolation). We want to synthesize an image in which the central pixels contain a copy of the original image, and the surrounding pixels are synthesized based on the statistical measurements of the original image. The set of all images with the same central subset of pixels is convex, and the projection onto such a convex set is easily inserted into the iterative loop of the synthesis algorithm. Specifically, we need only re-set the central pixels to the desired values on each iteration of the synthesis loop. In practice, this substitution is done by multiplying the desired pixels by a smooth mask (a raised cosine) and adding this to the current synthesized image multiplied by the complement of this mask. The smooth mask prevents artifacts at the boundary between original and synthesized pixels, whereas convergence to the desired pixels within the mask support region is achieved almost perfectly. This technique is applicable to the restoration of pictures which have been destroyed in some subregion (“filling holes”) (e.g., Hirani and Totsuka, 1996), although the estimation of parameters from the defective image is not straightforward. Figure 19 shows a set of examples that have been spatially extrapolated using this method. Observe that the border between real and synthetic data is barely noticeable. An additional potential benefit is that the synthetic images are seamlessly periodic (due to circular boundary-handling within our algorithm), and thus may be used to tile a larger image.”

In the following, we mask out the boundaries of an image and use the texture model to extend it.
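
The class below uses a hard boolean mask, which is simpler. As a sketch of the raised-cosine blending described in the excerpt (the function name, the 256x256 size, and the taper width here are illustrative assumptions, not part of plenoptic), a smooth mask could be built and applied like this:

import torch

def raised_cosine_mask(size=256, half_width=64, taper=16):
    # 1 inside the central region, 0 outside, with a half-cosine ramp in between
    coord = torch.arange(size).float()
    ramp = torch.clamp((coord - (size // 2 - half_width - taper)) / taper, 0, 1)
    ramp = ramp * torch.clamp((size // 2 + half_width + taper - coord) / taper, 0, 1)
    ramp = 0.5 * (1 - torch.cos(torch.pi * ramp))
    return ramp[None, :] * ramp[:, None]

smooth_mask = raised_cosine_mask()  # shape (256, 256), values in [0, 1]
# on each iteration: blended = smooth_mask * target + (1 - smooth_mask) * synthesized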

[38]:
# The following class inherits from the PortillaSimoncelli model for
# the purpose of extrapolating (filling in) a chunk of an image defined
# by a mask.

class PortillaSimoncelliMask(po.simul.PortillaSimoncelli):
    r"""Extend the PortillaSimoncelli model to operate on masked images.

    Additional Parameters
    ---------------------
    mask: Tensor
        boolean mask with True in the part of the image that will be filled in during synthesis
    target: Tensor
        image target for synthesis

    """
    def __init__(
        self,
        im_shape,
        n_scales=4,
        n_orientations=4,
        spatial_corr_width=9,
        mask=None,
        target=None
    ):
        super().__init__(im_shape, n_scales=n_scales, n_orientations=n_orientations,
                         spatial_corr_width=spatial_corr_width)
        self.mask = mask
        self.target = target

    def forward(self, image, scales=None):
        r"""Generate Texture Statistics representation of an image using the target for the masked portion

        Parameters
        ----------
        image : torch.Tensor
            A 4d tensor containing the image to analyze, with shape (1,
            channel, height, width).
        scales : list, optional
            Which scales to include in the returned representation. If None
            (the default), we include all scales. Otherwise, can contain a
            subset of values present in this model's ``scales`` attribute.

        Returns
        -------
        representation_tensor: torch.Tensor
            3d tensor of shape (batch, channel, stats) containing the measured
            texture statistics.

        """
        if self.mask is not None and self.target is not None:
            image = self.texture_masked_image(image)

        return super().forward(image,scales=scales)

    def texture_masked_image(self,image):
        r""" Fill in part of the image (designated by the mask) with the saved target image

        Parameters
        ------------
        image : torch.Tensor
            A tensor containing a single image

        Returns
        -------
        texture_masked_image: torch.Tensor
            An image that is a combination of the input image and the saved target.
            Combination is specified by self.mask

        """
        return self.target*self.mask + image*(~self.mask)
[39]:
img_file = DATA_PATH / 'fig14b.jpg'
img = po.tools.load_images(img_file).to(DEVICE)
im_init = (torch.rand_like(img)-.5) * .1 + img.mean();

mask = torch.zeros(1,1,256,256).bool().to(DEVICE)
ctr_dim = (img.shape[-2]//4, img.shape[-1]//4)
mask[...,ctr_dim[0]:3*ctr_dim[0],ctr_dim[1]:3*ctr_dim[1]] = True

model = PortillaSimoncelliMask(img.shape[-2:], target=img, mask=mask).to(DEVICE)
met = po.synth.MetamerCTF(img, model, loss_function=po.tools.optim.l2_norm, initial_image=im_init,
                          coarse_to_fine='together')

optimizer = torch.optim.Adam([met.metamer],lr=.02, amsgrad=True)

met.synthesize(
    optimizer=optimizer,
    max_iter=short_synth_max_iter,
    store_progress=True,
    change_scale_criterion=None,
    ctf_iters_to_check=3
    )
 83%|████████▎ | 830/1000 [00:35<00:07, 23.10it/s, loss=1.5536e-01, learning_rate=0.02, gradient_norm=1.0073e+00, pixel_change_norm=3.0407e-01, current_scale=all, current_scale_loss=1.5536e-01]
[40]:
po.imshow([met.image, mask*met.image, model.texture_masked_image(met.metamer)], vrange='auto1',
          title=['Full target image', 'Masked target', 'Synthesized image']);
_images/tutorials_models_Metamer-Portilla-Simoncelli_65_0.png
5.2 Mixtures

Here we explore creating a texture that is “in between” two textures by averaging their texture statistics and synthesizing an image that matches those average statistics.

Note that we do this differently than what is described in the paper. In the original paper, mixed statistics were computed by calculating the statistics on a single input image that consisted of half of each of two texture images pasted together. This led to an “oil and water” appearance in the resulting texture metamer, which appeared to have patches from each image.

In the following, we compute the texture statistics on two texture images separately and then average the resulting statistics, which appears to perform better. Note that, in all the other examples in this notebook, we knew there exists at least one image whose output matches our optimization target: the image we started with. For these mixtures, that is no longer the case.

[41]:
# The following classes are designed to extend the PortillaSimoncelli model
# and the Metamer synthesis method for the purpose of mixing two target textures.

class PortillaSimoncelliMixture(po.simul.PortillaSimoncelli):
    r"""Extend the PortillaSimoncelli model to mix two different images

        Parameters
        ----------
        im_shape: int
            the size of the images being processed by the model

    """
    def __init__(
        self,
        im_shape,
    ):
        super().__init__(im_shape, n_scales=4, n_orientations=4, spatial_corr_width=9)


    def forward(self, images, scales=None):
        r"""Average Texture Statistics representations of two image

        Parameters
        ----------
        images : torch.Tensor
            A 4d tensor containing one or two images to analyze, with shape (i,
            channel, height, width), i in {1,2}.
        scales : list, optional
            Which scales to include in the returned representation. If None
            (the default), we include all scales. Otherwise, can contain a
            subset of values present in this model's ``scales`` attribute.

        Returns
        -------
        representation_tensor: torch.Tensor
            3d tensor of shape (batch, channel, stats) containing the measured
            texture statistics.

        """
        if images.shape[0] == 2:
            # need the images to be 4d, so we use the "1 element slice"
            stats0 = super().forward(images[:1], scales=scales)
            stats1 = super().forward(images[1:2], scales=scales)
            return (stats0+stats1)/2
        else:
            return super().forward(images, scales=scales)

class MetamerMixture(po.synth.MetamerCTF):
    r""" Extending metamer synthesis based on image-computable
    differentiable models, for mixing two images.
    """
    def _initialize(self, initial_image):
        """Initialize the metamer.

        Set the ``self.metamer`` attribute to be a parameter with
        the user-supplied data, making sure it's the right shape.

        Parameters
        ----------
        initial_image :
            The tensor we use to initialize the metamer. If None (the
            default), we initialize with uniformly-distributed random
            noise lying between 0 and 1.

        """
        if initial_image.ndimension() < 4:
            raise Exception("initial_image must be torch.Size([n_batch"
                            ", n_channels, im_height, im_width]) but got "
                            f"{initial_image.size()}")
        # the difference between this and the regular version of Metamer is that
        # the regular version requires the metamer and the target image to have
        # the same shape, and here the target image is (2, 1, 256, 256), not (1, 1, 256, 256)
        metamer = initial_image.clone().detach()
        metamer = metamer.to(dtype=self.image.dtype,
                             device=self.image.device)
        metamer.requires_grad_()
        self._metamer = metamer
[42]:
# Figure 20. Examples of “mixture” textures.
# To replicate paper use the following combinations:
# (Fig. 15a, Fig. 15b); (Fig. 14b, Fig. 4a); (Fig. 15e, Fig. 14e).

img_files = [DATA_PATH / 'fig15e.jpg', DATA_PATH / 'fig14e.jpg']
imgs = po.tools.load_images(img_files).to(DEVICE)
im_init = torch.rand_like(imgs[0,:,:,:].unsqueeze(0)) * .01 + imgs.mean()
n=imgs.shape[-1]

model = PortillaSimoncelliMixture([n,n]).to(DEVICE)
met = MetamerMixture(imgs, model, loss_function=po.tools.optim.l2_norm, initial_image=im_init,
                     coarse_to_fine='together')

optimizer = torch.optim.Adam([met.metamer],lr=.02, amsgrad=True)

met.synthesize(
    optimizer=optimizer,
    max_iter=longest_synth_max_iter,
    store_progress=True,
    change_scale_criterion=None,
    ctf_iters_to_check=3
    )
 21%|██        | 829/4000 [00:35<02:17, 23.05it/s, loss=3.0252e-01, learning_rate=0.02, gradient_norm=4.1979e-01, pixel_change_norm=2.6349e-01, current_scale=all, current_scale_loss=3.0252e-01]
[43]:
po.imshow([met.image, met.metamer], vrange='auto1',title=['Target image 1', 'Target image 2', 'Synthesized Mixture Metamer']);
_images/tutorials_models_Metamer-Portilla-Simoncelli_69_0.png

6. Model Limitations

Not all texture model metamers look perceptually similar to their target images. The paper’s figures 17 and 18 present two classes of failures: “inhomogeneous texture images not usually considered to be ‘texture’” (such as human faces, fig. 17) and simple hand-drawn textures (fig. 18), many of which are geometric line drawings.

Note that for these examples, we were unable to locate the original images, so we present examples that serve the same purpose.

[44]:
img = po.data.einstein().to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)
  8%|▊         | 249/3000 [00:10<02:01, 22.67it/s, loss=1.0463e-01, learning_rate=0.01, gradient_norm=7.2169e-01, pixel_change_norm=1.5591e-01, current_scale=all, current_scale_loss=1.0463e-01]

Here we can see that the texture model fails to capture anything that makes this image look “portrait-like”: there is no recognizable face or clothes in the synthesized metamer. As a portrait is generally not considered a texture, this is not a model failure per se, but does demonstrate the limits of this model.

[45]:
po.imshow([metamer.image, metamer.metamer],
          title=['Target image', 'Synthesized Metamer'], vrange='auto1');
_images/tutorials_models_Metamer-Portilla-Simoncelli_73_0.png

In this example, we see that the model metamer fails to reproduce the randomly distributed oriented black lines on a white background: in particular, several lines are curved and several appear discontinuous. From the paper: “Although a texture of single-orientation bars is reproduced fairly well (see Fig. 12), the mixture of bar orientations in this example leads to the synthesis of curved line segments. In general, the model is unable to distinguish straight from curved contours, except when the contours are all of the same orientation.”

[46]:
img = po.tools.load_images(DATA_PATH / 'fig18a.png').to(DEVICE)

# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)
 46%|████▌     | 1366/3000 [00:59<01:10, 23.05it/s, loss=2.0882e-01, learning_rate=0.01, gradient_norm=2.2590e-01, pixel_change_norm=8.9952e-02, current_scale=all, current_scale_loss=2.0882e-01]
[47]:
po.imshow([metamer.image, metamer.metamer],
          title=['Target image', 'Synthesized Metamer'], vrange='auto1');
_images/tutorials_models_Metamer-Portilla-Simoncelli_76_0.png

7. Notable differences between Matlab and plenoptic implementations

  1. Optimization. The matlab implementation of texture synthesis is designed specifically for the texture model. Gradient descent is performed on subsets of the texture statistics in a particular sequence (coarse-to-fine, etc.). The plenoptic implementation relies on the auto-differentiation and optimization tools available in pytorch. We only define the forward model and then allow pytorch to handle the optimization.

    Why does this matter? We have qualitatively reproduced the results but cannot guarantee exact reproducibility. This is true in general for the plenoptic package: https://plenoptic.readthedocs.io/en/latest/reproducibility.html. This means that, in general, metamers synthesized by the two versions will differ.

  2. Lack of redundant statistics. As described in the next section, we output a different number of statistics than the Matlab implementation. The number of statistics returned in plenoptic matches the number of statistics reported in the paper, unlike the Matlab implementation. That is because the Matlab implementation included many redundant statistics, which were either exactly redundant (e.g., symmetric values in an auto-correlation matrix), placeholders (e.g., some 0s to make the shapes of the output work out), or not mentioned in the paper. The implementation included in plenoptic returns only the necessary statistics. See the next section for more details.

  3. True correlations. In the Matlab implementation of the Portilla-Simoncelli statistics, the auto-correlation, cross-scale, and cross-orientation statistics are based on co-variance matrices. When using torch to perform optimization, this makes convergence more difficult. We thus normalize each of these matrices, dividing the auto-correlation matrices by their center values (the variance) and the cross-correlation matrices by the square root of the product of the appropriate variances (so that we match numpy.corrcoef); see the sketch after the Note below. This means that the center of the auto-correlations and the diagonals of cross_orientation_correlation_magnitude are always 1 and are thus excluded from the representation, as discussed above. We have thus added two new statistics, std_reconstructed and magnitude_std (the standard deviation of the reconstructed lowpass images and the standard deviation of the magnitudes of each steerable pyramid band), to compensate (see Note at end of cell). Note that the cross-scale correlations have no redundancies and do not have 1 along the diagonal. For cross_orientation_correlation_magnitude, the value at \(A_{i,j}\) is the correlation between the magnitudes at orientation \(i\) and orientation \(j\) at the same scale, so that \(A_{i,i}\) is the correlation of a magnitude band with itself, i.e., \(1\). However, for cross_scale_correlation_magnitude, the value at \(A_{i,j}\) is the correlation between the magnitudes at orientation \(i\) and orientation \(j\) at two adjacent scales, and thus \(A_{i,i}\) is not the correlation of a band with itself; it is thus informative.

Note: We use standard deviations, instead of variances, because the values of the standard deviations lie within approximately the same range as the other values in the model’s representation, which makes optimization work better.
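To make this normalization concrete, here is a minimal sketch (not plenoptic’s internal code; the helper names and 2d-matrix assumption are ours) of how an auto-covariance matrix and a cross-covariance matrix can be converted to correlations:

import torch

def autocov_to_corr(autocov):
    """Normalize a 2D auto-covariance matrix (illustrative helper only).

    Divide by the central value (the variance) so the center becomes 1 and can
    be dropped from the representation; keep the standard deviation separately.
    """
    center = autocov.shape[-1] // 2
    var = autocov[center, center]
    return autocov / var, var.sqrt()

def crosscov_to_corr(crosscov, var_a, var_b):
    """Normalize a cross-covariance matrix, analogous to numpy.corrcoef.

    Illustrative helper: entry (i, j) is divided by sqrt(var_a[i] * var_b[j]).
    """
    return crosscov / torch.sqrt(var_a[:, None] * var_b[None, :])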

7.1 Redundant statistics

The original Portilla-Simoncelli paper presents formulas to obtain the number of statistics in each class from the model parameters n_scales, n_orientations and spatial_corr_width (labeled in the original paper \(N\), \(K\), and \(M\) respectively). The formulas indicate the following statistics for each class:

  • Marginal statistics: \(2(N+1)\) skewness and kurtosis of lowpass images, \(1\) high-pass variance, \(6\) pixel statistics.

  • Raw coefficient correlation: \((N+1)\frac{M^2+1}{2}\) statistics (\(\frac{M^2+1}{2}\) auto-correlations for each scale including lowpass)

  • Coefficient magnitude statistics: \(NK\frac{M^2+1}{2}\) autocorrelation statistics, \(N\frac{K(K-1)}{2}\) cross-orientation correlations at same scale, \(K^2(N-1)\) cross-scale correlations.

  • Cross-scale phase statistics: \(2K^2(N-1)\) statistics

In particular, the paper reads “For our texture examples, we have made choices of N = 4, K = 4 and M = 7, resulting in a total of 710 parameters”. However, the output of the Portilla-Simoncelli code in Matlab contains 1784 elements for these values of \(N\), \(K\) and \(M\). The discrepancy is because the Matlab output includes redundant statistics, placeholder values, and statistics not used during synthesis. The plenoptic output, on the other hand, returns only the essential statistics, and its output is in agreement with the paper’s formulas.
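As a quick sanity check on the formulas above, the following short snippet (illustrative only, not part of the library) computes the per-class counts and their total for arbitrary N, K, M; for the paper’s values it returns 710:

def ps_stat_counts(N=4, K=4, M=7):
    """Number of Portilla-Simoncelli statistics per class, per the paper's formulas."""
    marginal = 2 * (N + 1) + 1 + 6                 # skew/kurtosis, high-pass variance, pixel stats
    raw_corr = (N + 1) * (M**2 + 1) // 2           # auto-correlations of reconstructed lowpass images
    magnitude = N * K * (M**2 + 1) // 2 + N * K * (K - 1) // 2 + K**2 * (N - 1)
    phase = 2 * K**2 * (N - 1)
    return marginal, raw_corr, magnitude, phase, marginal + raw_corr + magnitude + phase

print(ps_stat_counts())  # (17, 125, 472, 96, 710)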

The redundant statistics that are removed by the plenoptic package but that are present in the Matlab code are as follows:

  1. Auto-correlation reconstructed: An auto-covariance matrix \(A\) encodes the covariance of the elements in a signal and their neighbors. Indexing the central auto-covariance element as \(A_{0,0}\), element \(A_{i,j}\) contains the covariance of the signal with its neighbor at a displacement \(i,j\). Because auto-correlation matrices are even functions, they have a symmetry where \(A_{i,j}=A_{-i,-j}\), which means that every element except the central one (\(A_{0,0}\), the variance) is duplicated (see Note at end of cell). Thus, in an autocorrelation matrix of size \(M \times M\), there are \(\frac{M^2+1}{2}\) non-redundant elements (this ratio appears in the auto-correlation statistics formulas above). The Matlab code returns the full auto-covariance matrices, that is, \(M^2\) instead of \(\frac{M^2+1}{2}\) elements for each covariance matrix.

  2. Auto-correlation magnitude: Same symmetry and redundancies as 1).

  3. Cross-orientation magnitude correlation: Covariance matrices \(C\) (size \(K \times K\)) have symmetry \(C_{i,j} = C_{j,i}\) (each off-diagonal element is duplicated, i.e., they’re symmetric). Thus, a \(K \times K\) covariance matrix has \(\frac{K(K+1)}{2}\) non-redundant elements. However, the diagonal elements of the cross-orientation correlations are variances, which are already contained in the central elements of the auto-correlation magnitude matrices. Thus, these covariances only hold \(\frac{K(K-1)}{2}\) non-redundant elements (see this term in the formulas above). The Matlab code returns the full covariances (with \(K^2\) elements) instead of the non-redundant ones. Also, the Matlab code returns an extra covariance matrix full of 0’s not mentioned in the paper (\((N+1)\) matrices instead of \((N)\)).

  4. Cross-scale real correlation (phase statistics): Phase statistics contain the correlations between the \(K\) real orientations at a scale with the \(2K\) real and imaginary phase-doubled orientations at the following scale, making a total of \(K \times 2K=2K^2\) statistics (see this term in the formulas above). However, the Matlab output has matrices of size \(2K \times 2K\), where half of the matrices are filled with 0’s. Also, the paper counts the \((N-1)\) pairs of adjacent scales, but the Matlab output includes \(N\) matrices. The plenoptic output removes the 0’s and the extra matrix.

  5. Statistics not in paper: The Matlab code outputs the mean magnitude of each band and cross-orientation real correlations, but these are not enumerated in the paper. These statistics are removed in plenoptic. See the next section for some more detail about the magnitude means.

Note: This can be understood by thinking of \(A_{i,0}\), the correlation of every pixel with the pixel \(i\) to its right. Computing this auto-covariance involves adding together all the products \(I_{x,y}*I_{x+i,y}\) for every x and y in the image. But this is equivalent to computing \(A_{-i,0}\), because every pair of neighbors \(i\) to the right, \(I_{x,y}*I_{x+i,y}\), is also a pair of neighbors \(i\) to the left, \(I_{x+i,y}*I_{(x+i)-i,y}=I_{x+i,y}*I_{x,y}\). So, any opposite displacements around the central element in the auto-covariance matrix will have the same value.
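A quick numerical check of this symmetry (a standalone sketch, unrelated to the model’s code), using circular shifts so that displacements of +i and -i produce exactly the same set of products:

import torch

torch.manual_seed(0)
img = torch.randn(64, 64)
img = img - img.mean()
i = 3
# correlation with the neighbor i pixels to the right vs. i pixels to the left
a_right = (img * torch.roll(img, shifts=-i, dims=1)).mean()
a_left = (img * torch.roll(img, shifts=i, dims=1)).mean()
print(torch.isclose(a_right, a_left))  # tensor(True): A_{i,0} equals A_{-i,0}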

As shown below, the output of plenoptic matches the number of statistics indicated in the paper:

[48]:
img = po.tools.load_images(DATA_PATH / 'fig4a.jpg')
image_shape = img.shape[2:4]

# Initialize the minimal model. Use same params as paper
model = po.simul.PortillaSimoncelli(image_shape, n_scales=4,
                                    n_orientations=4,
                                    spatial_corr_width=7)

stats = model(img)

print(f'Stats for N=4, K=4, M=7: {stats[0].shape[1]} statistics')
Stats for N=4, K=4, M=7: 710 statistics

plenoptic allows you to convert the tensor of statistics into a dictionary containing matrices, similar to the Matlab output. In this dictionary, the redundant statistics are indicated with NaNs. We print one of the auto-correlation matrices showing the redundant elements it contains:

[49]:
stats_dict = model.convert_to_dict(stats)
s = 1
o = 2
print(stats_dict['auto_correlation_magnitude'][0,0,:,:,s,o])
tensor([[0.1396,    nan,    nan,    nan,    nan,    nan,    nan],
        [0.2411, 0.3492,    nan,    nan,    nan,    nan,    nan],
        [0.3750, 0.5434, 0.7396,    nan,    nan,    nan,    nan],
        [0.4501, 0.6598, 0.8886,    nan,    nan,    nan,    nan],
        [0.3909, 0.5783, 0.7708, 0.8490,    nan,    nan,    nan],
        [0.2488, 0.3786, 0.5111, 0.5619, 0.4833,    nan,    nan],
        [0.1404, 0.2305, 0.3287, 0.3715, 0.3175, 0.2189,    nan]])

We see in the output above that both the upper triangular part of the matrix, and the diagonal elements from the center onwards are redundant, as indicated in the text above. Note that although the central element is not redundant in auto-covariance matrices, when the covariances are converted to correlations, the central element is 1, and so uninformative (see previous section for more information).

We can count how many statistics are in this particular class:

[50]:
acm_not_redundant = torch.sum(~torch.isnan(stats_dict['auto_correlation_magnitude']))
print(f'Non-redundant elements in acm: {acm_not_redundant}')
Non-redundant elements in acm: 384

The number of non redundant elements is 16 elements short of the \(NK\frac{M^2+1}{2} = 4\cdot 4 \cdot \frac{7^2+1}{2}=400\) statistics indicated by the formula. This is because plenoptic removes the central elements of these matrices and holds them in stats_dict['magnitude_std']:

[51]:
print(f"Number magnitude band variances: {stats_dict['magnitude_std'].numel()}")
Number magnitude band variances: 16

Next, let’s see whether the number of statistics in each class matches what is in the original paper:

  1. Marginal statistics: Total of 17 statistics

    • kurtosis + skewness: 2*(N+1) = 2*(4+1) = 10

    • variance of high pass band: 1

    • pixel statistics: 6

  2. Raw coefficient correlation: Total of 125 statistics

    • Central samples of auto-correlation reconstructed: (N+1)*(M^2+1)/2 = (4+1)*(7^2+1)/2 = 125

  3. Coefficient magnitude statistics: Total of 472 statistics

    • Central samples of auto-correlation of magnitude of each subband N*K*(M^2+1)/2 = 4*4*(7^2+1)/2 = 400

    • Cross-correlation of orientations in same scale: N*K*(K-1)/2 = 4*4*(4-1)/2 = 24

    • Cross-correlation of magnitudes across scale: K^2*(N-1) = 4^2*(4-1) = 48

  4. Cross-scale phase statistics: Total 96 statistics

    • Cross-correlation of real coeffs with both coeffs at broader scale: 2*K^2*(N-1) = 2*4^2*(4-1) = 96

[52]:
# Sum marginal statistics
marginal_stats_num = (torch.sum(~torch.isnan(stats_dict['kurtosis_reconstructed'])) +
                      torch.sum(~torch.isnan(stats_dict['skew_reconstructed'])) +
                      torch.sum(~torch.isnan(stats_dict['var_highpass_residual'])) +
                      torch.sum(~torch.isnan(stats_dict['pixel_statistics'])))
print(f'Marginal statistics: {marginal_stats_num} parameters, compared to 17 in paper')

# Sum raw coefficient correlations
real_coefficient_corr_num = torch.sum(~torch.isnan(stats_dict['auto_correlation_reconstructed']))
real_variances = torch.sum(~torch.isnan(stats_dict['std_reconstructed']))
print(f'Raw coefficient correlation: {real_coefficient_corr_num + real_variances} parameters, '
      'compared to 125 in paper')

# Sum coefficient magnitude statistics
coeff_magnitude_stats_num = (torch.sum(~torch.isnan(stats_dict['auto_correlation_magnitude'])) +
                             torch.sum(~torch.isnan(stats_dict['cross_scale_correlation_magnitude'])) +
                             torch.sum(~torch.isnan(stats_dict['cross_orientation_correlation_magnitude'])))
coeff_magnitude_variances = torch.sum(~torch.isnan(stats_dict['magnitude_std']))

print(f'Coefficient magnitude statistics: {coeff_magnitude_stats_num + coeff_magnitude_variances} '
      'parameters, compared to 472 in paper')

# Sum cross-scale phase statistics
phase_statistics_num = torch.sum(~torch.isnan(stats_dict['cross_scale_correlation_real']))
print(f'Phase statistics: {phase_statistics_num} parameters, compared to 96 in paper')
Marginal statistics: 17 parameters, compared to 17 in paper
Raw coefficient correlation: 125 parameters, compared to 125 in paper
Coefficient magnitude statistics: 472 parameters, compared to 472 in paper
Phase statistics: 96 parameters, compared to 96 in paper

7.2 Magnitude means

The means of the magnitude bands are slightly different from the redundant statistics discussed in the previous section. Each of those statistics is exactly redundant, e.g., the center value of an auto-correlation matrix will always be 1. They thus cannot include any additional information. However, the magnitude means are only approximately redundant and thus could improve the texture representation. The authors excluded these values because they did not seem to be necessary: the magnitude means are constrained by the other statistics (though not perfectly), and thus including them does not improve the visual quality of the synthesized textures.

To demonstrate this, we will create a modified version of the PortillaSimoncelli class that includes the magnitude means, and show:

  1. Even without explicitly including them in the texture representation, they are still approximately matched between the original and synthesized texture images.

  2. Including them in the representation does not significantly change the quality of the synthesized texture.

First, let’s create the modified model:

[53]:
from collections import OrderedDict

class PortillaSimoncelliMagMeans(po.simul.PortillaSimoncelli):
    r"""Include the magnitude means in the PS texture representation.

        Parameters
        ----------
        im_shape: int
            the size of the images being processed by the model

    """
    def __init__(
        self,
        im_shape,
    ):
        super().__init__(im_shape, n_scales=4, n_orientations=4, spatial_corr_width=7)


    def forward(self, image, scales=None):
        r"""Average Texture Statistics representations of two image

        Parameters
        ----------
        image : torch.Tensor
            A 4d tensor (batch, channel, height, width) containing the image(s) to
            analyze.
        scales : list, optional
            Which scales to include in the returned representation. If None
            (the default), we include all scales. Otherwise, can contain a
            subset of the values present in this model's ``scales`` attribute.

        Returns
        -------
        representation_tensor: torch.Tensor
            3d tensor of shape (batch, channel, stats) containing the measured
            texture statistics.

        """
        stats = super().forward(image, scales=scales)
        # this helper function returns a list of tensors containing the steerable
        # pyramid coefficients at each scale
        pyr_coeffs = self._compute_pyr_coeffs(image)[1]
        # only compute the magnitudes for the desired scales
        magnitude_pyr_coeffs = [coeff.abs() for i, coeff in enumerate(pyr_coeffs)
                                if scales is None or i in scales]
        magnitude_means = [mag.mean((-2, -1)) for mag in magnitude_pyr_coeffs]
        return einops.pack([stats, *magnitude_means], 'b c *')[0]

    # overwriting these following two methods allows us to use the plot_representation method
    # with the modified model, making examining it easier.
    def convert_to_dict(self, representation_tensor: torch.Tensor) -> OrderedDict:
        """Convert tensor of stats to dictionary."""
        n_mag_means = self.n_scales * self.n_orientations
        rep = super().convert_to_dict(representation_tensor[..., :-n_mag_means])
        mag_means = representation_tensor[..., -n_mag_means:]
        rep['magnitude_means'] = einops.rearrange(mag_means, 'b c (s o) -> b c s o', s=self.n_scales, o=self.n_orientations)
        return rep

    def _representation_for_plotting(self, rep: OrderedDict) -> OrderedDict:
        r"""Convert the data into a dictionary representation that is more convenient for plotting.

        Intended as a helper function for plot_representation.
        """
        mag_means = rep.pop('magnitude_means')
        data = super()._representation_for_plotting(rep)
        data['magnitude_means'] = mag_means.flatten()
        return data

Now, let’s initialize our models and images for synthesis:

[55]:
img = po.tools.load_images(DATA_PATH / 'fig4a.jpg').to(DEVICE)
model = po.simul.PortillaSimoncelli(img.shape[-2:], spatial_corr_width=7).to(DEVICE)
model_mag_means = PortillaSimoncelliMagMeans(img.shape[-2:]).to(DEVICE)
im_init = (torch.rand_like(img)-.5) * .1 + img.mean()

And run the synthesis with the regular model, which does not include the mean of the steerable pyramid magnitudes, and then the augmented model, which does.

[56]:
# Set the RNG seed to make the two synthesis procedures as similar as possible.
po.tools.set_seed(100)
met = po.synth.MetamerCTF(img, model, loss_function=po.tools.optim.l2_norm, initial_image=im_init)
met.synthesize(store_progress=10, max_iter=short_synth_max_iter, change_scale_criterion=None, ctf_iters_to_check=7)

po.tools.set_seed(100)
met_mag_means = po.synth.MetamerCTF(img, model_mag_means, loss_function=po.tools.optim.l2_norm, initial_image=im_init)
met_mag_means.synthesize(store_progress=10, max_iter=short_synth_max_iter, change_scale_criterion=None, ctf_iters_to_check=7)
 93%|█████████▎| 927/1000 [00:40<00:03, 22.91it/s, loss=7.3206e-02, learning_rate=0.01, gradient_norm=8.8649e-01, pixel_change_norm=1.4852e-01, current_scale=all, current_scale_loss=7.3206e-02]
 93%|█████████▎| 934/1000 [00:45<00:03, 20.61it/s, loss=7.5847e-02, learning_rate=0.01, gradient_norm=8.6494e-01, pixel_change_norm=1.5210e-01, current_scale=all, current_scale_loss=7.5847e-02]

Now let’s examine the outputs. In the following plot, we display the synthesized metamer and the representation error for the metamer synthesized with and without explicitly constraining the magnitude means.

  • The two synthesized metamers appear almost identical, so including the magnitude means does not substantially change the resulting metamer at all, let alone improve its visual quality.

  • The representation errors are (as we’d expect) also very similar. Let’s focus on the plot in the bottom right, labeled “magnitude_means”. Each stem shows the mean of one of the magnitude bands, with the scales increasing from left to right. Looking at the representation error for the first image, we can see that, even without explicitly including the means, the error in this statistic is of the same order of magnitude as the other statistics, showing that it is being implicitly constrained. By comparing to the error for the second image, we can see that the error in the magnitude means does decrease, most notably in the coarsest scales.

[57]:
fig, axes = plt.subplots(2, 2, figsize=(21, 11), gridspec_kw={'width_ratios': [1, 3.1]})
for ax, im, info in zip(axes[:, 0], [met.metamer, met_mag_means.metamer], ['without', 'with']):
    po.imshow(im, ax=ax, title=f"Metamer {info} magnitude means")
    ax.xaxis.set_visible(False)
    ax.yaxis.set_visible(False)
model_mag_means.plot_representation(model_mag_means(met.metamer)-model_mag_means(img), ylim=(-.06, .06), ax=axes[0,1]);
model_mag_means.plot_representation(model_mag_means(met_mag_means.metamer)-model_mag_means(img), ylim=(-.06, .06), ax=axes[1,1]);
_images/tutorials_models_Metamer-Portilla-Simoncelli_96_0.png

Thus, we can feel fairly confident in excluding these magnitude means from the model. Note this follows the same logic as earlier in the notebook, when we tried removing different statistics to see their effect; here, we tried adding a statistic to determine its effect. Feel free to try using other target images or adding other statistics!

Under development – this currently contains examples of the earlier MAD synthesis, but we have yet to reproduce it using plenoptic.

Reproducing Wang and Simoncelli, 2008 (MAD Competition)

The goal here is to reproduce the original MAD Competition results, as generated using the Matlab code originally provided by Zhou Wang and then modified by the authors. MAD Competition is a synthesis method for efficiently comparing two models, by generating sets of images that minimize/maximize one model’s loss while holding the other’s constant. For more details, see the 07_MAD_Competition and 08_Simple_MAD notebooks.
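plenoptic’s own implementation lives in po.synth.MADCompetition (covered in the notebooks mentioned above). As a rough orientation, the sketch below shows how one of the four syntheses (hold MSE fixed, maximize SSIM) might be set up; the argument names reflect our understanding of the current API rather than a verified recipe, so consult the MAD notebooks for the authoritative version.

import plenoptic as po

img = po.data.einstein()

# SSIM is a similarity (higher is better), so wrap it as a distance so that
# minimizing it corresponds to maximizing SSIM.
def ssim_distance(x, y):
    return 1 - po.metric.ssim(x, y)

# NOTE: the argument names (optimized_metric, reference_metric, minmax,
# initial_noise) are our best understanding of the current API and may need
# adjusting -- see the 07_MAD_Competition notebook for canonical usage.
mad = po.synth.MADCompetition(img, optimized_metric=ssim_distance,
                              reference_metric=po.metric.mse,
                              minmax='min', initial_noise=0.1)
mad.synthesize(max_iter=200)
# the synthesized image is stored as an attribute on the object (see the
# notebooks above), analogous to Metamer.metamer.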

[1]:
import imageio
import torch
import scipy.io as sio
import pyrtools as pt
from scipy.io import loadmat
import numpy as np
import matplotlib.pyplot as plt
import plenoptic as po
import os.path as op
%matplotlib inline

%load_ext autoreload
%autoreload 2

SSIM

Before we discuss MAD Competition, let’s look a little at SSIM, since that’s the metric used in the original paper and the one we’ll be using here. It’s important to remember that SSIM is a similarity metric, so higher is better: a value of 1 means the images are identical, and it’s bounded between 0 and 1.

We have tests showing that this matches the output of the MATLAB code, but we won’t show them here.

[2]:
img1 = po.data.einstein()
img2 = po.data.curie()
noisy = po.tools.add_noise(img1, [2,4,8])

We can see that increasing the noise level decreases the SSIM value, but not linearly

[3]:
po.metric.ssim(img1, noisy)
/home/billbrod/Documents/plenoptic/src/plenoptic/metric/perceptual_distance.py:42: UserWarning: Image range falls outside [0, 1]. img1: tensor([0.0039, 1.0000]), img2: tensor([-12.3002,  11.9818]). Continuing anyway...
  warnings.warn("Image range falls outside [0, 1]."
/home/billbrod/micromamba/envs/plenoptic/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[3]:
tensor([[0.0026],
        [0.0016],
        [0.0004]])

And that our noise level does match the MSE

[4]:
po.metric.mse(img1, noisy)
[4]:
tensor([[2.0000],
        [4.0000],
        [8.0000]])

MAD Competition

The following figure shows the results of MAD Competition synthesis using the original MATLAB code. It shows the original image in the top left. We then added some Gaussian noise (with a specified variance) to get the image right below it. The four images to the right of that are the MAD-synthesized images. The first two have the same mean-squared error (MSE) with respect to the original as the noisy image (and as each other), but the best and worst SSIM values (SSIM is a similarity metric, so higher is better), while the second two have the same SSIM as the noisy image, but the best and worst MSE. By comparing these images, we can get a sense for what MSE and SSIM consider important for image quality.

[5]:
# We need to download some additional data for this portion of the notebook. In order to do so,
# we use an optional dependency, pooch. If the following raises an ImportError or ModuleNotFoundError
# then install pooch in your plenoptic environment and restart your kernel.
fig, results = po.tools.external.plot_MAD_results('samp6', [128], vrange='row1', zoom=3)
_images/tutorials_applications_09_Original_MAD_9_0.png

There’s lots of information here on the outputs of the MATLAB synthesis. We will later add code to investigate these results using plenoptic.

[6]:
results
[6]:
{'L128': {'FIX_MSE': 127.99999999999999,
  'FIX_SSIM': 0.8183184633106257,
  'mse_fixmse_maxssim': array([128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128.]),
  'ssim_fixmse_maxssim': array([0.82669306, 0.83641599, 0.84768936, 0.86021352, 0.87332037,
         0.8861153 , 0.89794336, 0.90864194, 0.91828312, 0.9270046 ,
         0.93495769, 0.94226293, 0.94896492, 0.95506452, 0.9605342 ,
         0.96533558, 0.96945409, 0.97290935, 0.97575609, 0.97807185,
         0.97994388, 0.98145704, 0.98268627, 0.98369369, 0.98452855,
         0.98522884, 0.98582351, 0.98633451, 0.98677852, 0.98716831,
         0.98751374, 0.98782249, 0.98810062, 0.98835294, 0.98858334,
         0.98879493, 0.98899028, 0.98917148, 0.98934029, 0.98949816,
         0.9896463 , 0.98978576, 0.98991742, 0.99004203, 0.99016023,
         0.99027259, 0.99037961, 0.9904817 , 0.99057924, 0.99067257,
         0.99076198, 0.99084774, 0.99093007, 0.9910092 , 0.99108531,
         0.99115858, 0.99122917, 0.99129722, 0.99136287, 0.99142623,
         0.99148744, 0.99154658, 0.99160377, 0.99165909, 0.99171263,
         0.99176448, 0.9918147 , 0.99186338, 0.99191058, 0.99195637,
         0.99200081, 0.99204396, 0.99208587, 0.99212661, 0.99216622,
         0.99220476, 0.99224226, 0.99227877, 0.99231435, 0.99234902,
         0.99238284, 0.99241583, 0.99244804, 0.99247949, 0.99251023,
         0.99254029, 0.99256969, 0.99259847, 0.99262665, 0.99265426,
         0.99268134, 0.99270789, 0.99273395, 0.99275955, 0.99278469,
         0.99280941, 0.99283372, 0.99285765, 0.9928812 , 0.99290441]),
  'mse_fixmse_minssim': array([128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
         128.]),
  'ssim_fixmse_minssim': array([0.81069721, 0.80415136, 0.79827382, 0.7927989 , 0.78758476,
         0.7825861 , 0.77782948, 0.77338629, 0.76933555, 0.7657286 ,
         0.76258054, 0.75987991, 0.75759382, 0.75567185, 0.75405628,
         0.7526936 , 0.75154034, 0.75056265, 0.74973342, 0.74902986,
         0.74843227, 0.74792359, 0.7474892 , 0.74711674, 0.74679588,
         0.74651809, 0.74627635, 0.74606493, 0.74587912, 0.74571508,
         0.74556965, 0.74544023, 0.74532467, 0.74522116, 0.74512817,
         0.74504441, 0.74496878, 0.74490032, 0.74483822, 0.74478177,
         0.74473035, 0.74468342, 0.74464051, 0.74460123, 0.7445652 ,
         0.74453212, 0.74450172, 0.74447375, 0.74444799, 0.74442426,
         0.74440239, 0.74438221, 0.7443636 , 0.74434643, 0.74433057,
         0.74431594, 0.74430244, 0.74428997, 0.74427846, 0.74426784,
         0.74425803, 0.74424898, 0.74424062, 0.74423291, 0.74422579,
         0.74421923, 0.74421317, 0.74420758, 0.74420242, 0.74419767,
         0.74419328, 0.74418924, 0.74418551, 0.74418208, 0.74417891,
         0.74417599, 0.74417331, 0.74417083, 0.74416855, 0.74416646,
         0.74416453, 0.74416275, 0.74416112, 0.74415963, 0.74415825,
         0.74415699, 0.74415583, 0.74415477, 0.7441538 , 0.74415291,
         0.7441521 , 0.74415135, 0.74415068, 0.74415006, 0.7441495 ,
         0.74414899, 0.74414853, 0.74414811, 0.74414773, 0.74414739]),
  'maxssim': 0.99290440833461,
  'minssim': 0.7441473913547447,
  'mse_fixssim_minmse': array([127.62569966, 127.25907758, 126.89830629, 126.5432421 ,
         126.19375253, 125.84970824, 125.51098323, 125.17745477,
         124.84900331, 124.5255124 , 124.2068686 , 123.89296145,
         123.58368333, 123.27892946, 122.97859775, 122.68258882,
         122.39080584, 122.10315454, 121.81954311, 121.53988213,
         121.26408454, 120.99206555, 120.72374258, 120.45903523,
         120.19786522, 119.94015629, 119.68583422, 119.43482671,
         119.18706338, 118.94247568, 118.70099686, 118.46256195,
         118.22710767, 117.99457239, 117.76489611, 117.53802042,
         117.31388843, 117.09244474, 116.87363542, 116.65740794,
         116.44371116, 116.23249528, 116.02371179, 115.81731348,
         115.61325436, 115.41148964, 115.21197571, 115.01467011,
         114.81953145, 114.62651948, 114.43559495, 114.24671966,
         114.0598564 , 113.87496892, 113.69202191, 113.510981  ,
         113.33181268, 113.15448433, 112.97896417, 112.80522123,
         112.63322535, 112.46294714, 112.29435797, 112.12742994,
         111.96213588, 111.79844929, 111.63634436, 111.47579594,
         111.31677951, 111.15927119, 111.00324769, 110.8486863 ,
         110.6955649 , 110.54386192, 110.39355633, 110.24462762,
         110.09705579, 109.95082136, 109.8059053 , 109.66228908,
         109.5199546 , 109.37888422, 109.23906072, 109.10046731,
         108.96308761, 108.82690563, 108.69190575, 108.55807275,
         108.42539176, 108.29384826, 108.16342808, 108.03411739,
         107.90590266, 107.77877069, 107.6527086 , 107.52770378,
         107.40374392, 107.280817  , 107.15891125, 107.03801519]),
  'ssim_fixssim_minmse': array([0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
         0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846]),
  'mse_fixssim_maxmse': array([ 131.81989257,  136.34348117,  141.63201819,  147.93891278,
          155.38338115,  164.2072263 ,  174.58969259,  180.77038681,
          187.35687873,  194.48181227,  198.57370304,  202.66885863,
          206.95343129,  211.44231618,  216.13691245,  221.02634466,
          226.11581241,  231.45406809,  237.02262995,  242.83028251,
          248.81360619,  255.03345526,  261.46183743,  268.20589445,
          271.92855589,  275.50581231,  279.16591822,  282.90724991,
          286.73714745,  290.64830841,  294.64536986,  298.72977854,
          302.9070491 ,  307.17661949,  311.52367919,  315.9615864 ,
          320.49015071,  325.12141654,  329.86378322,  334.71438487,
          339.66399361,  344.73644098,  349.93210914,  355.25219363,
          360.69572122,  366.26837608,  371.96846258,  377.80329199,
          383.77938626,  389.88638347,  396.14353339,  402.54835547,
          409.12344679,  415.86473831,  422.7769448 ,  429.86538679,
          437.11955142,  444.55628708,  452.18579893,  460.02165701,
          468.0649593 ,  476.29650757,  484.7347087 ,  493.40456576,
          502.30552879,  511.44021594,  520.83114704,  530.48547572,
          540.40541267,  550.59925317,  561.07367344,  571.86215993,
          582.96345493,  594.39243813,  606.16232589,  618.28830266,
          630.7534013 ,  643.57218989,  656.77008653,  670.38608448,
          684.42229927,  698.90648368,  713.75502045,  729.09525608,
          744.89191175,  761.03686543,  777.65684378,  794.83179371,
          812.49479186,  830.66898008,  849.46972268,  868.84766347,
          888.75751215,  909.25712128,  930.36668338,  952.1656941 ,
          974.65873922,  997.87604088, 1021.77859313, 1046.42121422]),
  'ssim_fixssim_maxmse': array([0.81838315, 0.81837394, 0.81840433, 0.81841667, 0.8184811 ,
         0.81854265, 0.81865555, 0.8185667 , 0.81860168, 0.8186602 ,
         0.81849321, 0.81849264, 0.81849454, 0.81849667, 0.81850134,
         0.81850698, 0.81852163, 0.8185241 , 0.81853312, 0.81854646,
         0.81856169, 0.81858209, 0.81861676, 0.8186252 , 0.8184758 ,
         0.81848028, 0.81848157, 0.81848229, 0.8184804 , 0.81848067,
         0.81848019, 0.81847912, 0.81847739, 0.81847648, 0.81848131,
         0.81848611, 0.81848742, 0.81848633, 0.81848348, 0.81848151,
         0.81848324, 0.81848096, 0.81847737, 0.81847369, 0.81847083,
         0.81846829, 0.81846601, 0.81846364, 0.81846065, 0.81845897,
         0.81845727, 0.8184584 , 0.81845481, 0.81845096, 0.81844714,
         0.81844336, 0.81844286, 0.81844313, 0.8184418 , 0.81843787,
         0.81843385, 0.81843244, 0.81843006, 0.81842643, 0.81842301,
         0.81842012, 0.8184164 , 0.81841261, 0.81841113, 0.81841123,
         0.81841167, 0.81840845, 0.81840571, 0.81840289, 0.81840006,
         0.81839648, 0.81839794, 0.81840175, 0.81840656, 0.81840632,
         0.8184053 , 0.8184015 , 0.81841222, 0.8184118 , 0.81841594,
         0.81844041, 0.81845594, 0.81846008, 0.81847317, 0.81848535,
         0.81848382, 0.81848619, 0.81849768, 0.81851117, 0.81852665,
         0.81853106, 0.81853637, 0.81854015, 0.81854653, 0.81855083]),
  'minmse': 107.03801518643529,
  'maxmse': 1046.4212142210524,
  'noise_level': 128,
  'original_image': 'samp6'}}


Reproducing Berardino et al., 2017 (Eigendistortions)

Author: Lyndon Duong, Jan 2021

In this demo, we will be reproducing eigendistortions first presented in Berardino et al 2017. We’ll be using a Front End model of the human visual system (called “On-Off” in the paper), as well as an early layer of VGG16. The Front End model is a simple convolutional neural network with a normalization nonlinearity, loosely based on biological retinal/geniculate circuitry.

Front-end model

This signal-flow diagram shows an input being decomposed into two channels, each of which is luminance- and contrast-normalized and then passed through a ReLU.

What do eigendistortions tell us?

Our perception is influenced by our internal representation (neural responses) of the external world. Eigendistortions are distortion directions in image space, rank-ordered by how strongly they change a model’s responses. Plenoptic’s Eigendistortion object provides an easy way to synthesize eigendistortions for any PyTorch model.

[1]:
from plenoptic.synthesize import Eigendistortion
from plenoptic.simulate.models import OnOff
# this notebook uses torchvision, which is an optional dependency.
# if this fails, install torchvision in your plenoptic environment
# and restart the notebook kernel.
try:
    from torchvision.models import vgg16
except ModuleNotFoundError:
    raise ModuleNotFoundError("optional dependency torchvision not found!"
                              " please install it in your plenoptic environment "
                              "and restart the notebook kernel")
import torch
from torch import nn
import plenoptic as po

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: ", device)
device:  cuda
[2]:
max_iter_frontend = 2000
max_iter_vgg = 5000

Input preprocessing

Let’s load the parrot image used in the paper, display it, and cast it as a float32 tensor.

[3]:
image = po.data.parrot(as_gray=True)
zoom = 1

def crop(img):
    """Returns 2D numpy as image as 4D tensor Shape((b, c, h, w))"""
    img_tensor = img.clone()
    return img_tensor[...,:254,:254]  # crop to same size

image_tensor = crop(image).to(device)
print("Torch image shape:", image_tensor.shape)

# reduce size of image if we're on CPU, otherwise this will take too long
if device.type == 'cpu':
    image_tensor = image_tensor[...,100:164,100:164]
    # want to zoom so this is displayed at same size
    zoom = 256 / 64

po.imshow(image_tensor, zoom=zoom);

/mnt/home/wbroderick/plenoptic/plenoptic/tools/data.py:126: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525553989/work/torch/csrc/utils/tensor_new.cpp:230.)
  images = torch.tensor(images, dtype=torch.float32)
Torch image shape: torch.Size([1, 1, 254, 254])
_images/tutorials_applications_Demo_Eigendistortion_4_2.png

Since the Front-end OnOff model only has two channel outputs, we can easily visualize the feature maps. We’ll apply a circular mask to this model’s inputs to avoid edge artifacts in the synthesis method.

[4]:
mdl_f = OnOff(kernel_size=(31, 31), pretrained=True, apply_mask=True)
po.tools.remove_grad(mdl_f)
mdl_f = mdl_f.to(device)

response_f = mdl_f(image_tensor)
po.imshow(response_f, title=['on channel response', 'off channel response'], zoom=zoom);
/mnt/home/wbroderick/plenoptic/plenoptic/simulate/models/frontend.py:388: UserWarning: pretrained is True but cache_filt is False. Set cache_filt to True for efficiency unless you are fine-tuning.
  warn("pretrained is True but cache_filt is False. Set cache_filt to "
/mnt/home/wbroderick/miniconda3/envs/plenoptic/lib/python3.7/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525553989/work/aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
_images/tutorials_applications_Demo_Eigendistortion_6_1.png

Synthesizing eigendistortions

Front-end model: eigendistortion synthesis

Now that we have our Front End model set up, we can synthesize eigendistortions! This is done easily just by calling .synthesize() after instantiating the Eigendistortion object. We’ll synthesize the top and bottom k, representing the most- and least-noticeable eigendistortions for this model.

The paper synthesizes the top and bottom k=1 eigendistortions, but we’ll set k>1 so the algorithm converges/stabilizes faster. We highly recommend running the following block on GPU; otherwise, we suggest cropping the image to a smaller size.

[5]:
# synthesize the top and bottom k distortions
eigendist_f = Eigendistortion(image=image_tensor, model=mdl_f)
eigendist_f.synthesize(k=3, method='power', max_iter=max_iter_frontend)

Initializing Eigendistortion -- Input dim: 64516 | Output dim: 129032
/mnt/home/wbroderick/plenoptic/plenoptic/tools/validate.py:179: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  "model is in training mode, you probably want to call eval()"
Top k=3 eigendists computed | Tolerance 1.00E-07 reached.
Bottom k=3 eigendists computed | Tolerance 1.00E-07 reached.
Front-end model: eigendistortion display

Once synthesized, we can plot the distortion on the image using Eigendistortion’s built-in display method. Feel free to adjust the constants alpha_max and alpha_min that scale the amount of each distortion on the image.

[6]:
po.imshow(eigendist_f.eigendistortions[[0,-1]].mean(1, keepdim=True), vrange='auto1',
          title=["most-noticeable distortion", "least-noticeable"], zoom=zoom)

alpha_max, alpha_min = 3., 4.
f_max = po.synth.eigendistortion.display_eigendistortion(eigendist_f, eigenindex=0, alpha=alpha_max,
                                                         title=f'img + {alpha_max} * max_dist', zoom=zoom)
f_min = po.synth.eigendistortion.display_eigendistortion(eigendist_f, eigenindex=-1, alpha=alpha_min,
                                                         title=f'img + {alpha_min} * min_dist', zoom=zoom)
_images/tutorials_applications_Demo_Eigendistortion_10_0.png
_images/tutorials_applications_Demo_Eigendistortion_10_1.png
_images/tutorials_applications_Demo_Eigendistortion_10_2.png
VGG16: eigendistortion synthesis

Following the lead of Berardino et al. (2017), let’s compare the Front End model’s eigendistortions to those of an early layer of VGG16! VGG16 takes color images as input, so we’ll need to repeat the grayscale parrot along the RGB color dimension.

[7]:
# Create a class that takes the nth layer output of a given model
class NthLayerVGG16(nn.Module):
    """Wrapper to get the response of an intermediate layer of VGG16"""
    def __init__(self, layer: int = None, device=torch.device('cpu')):
        """
        Parameters
        ----------
        layer: int
            Which model response layer to output
        """
        super().__init__()
        model = vgg16(pretrained=True, progress=True).to(device)
        features = list(model.features)
        self.features = nn.ModuleList(features).eval()

        if layer is None:
            layer = len(self.features)
        self.layer = layer

    def forward(self, x):
        for ii, mdl in enumerate(self.features):
            x = mdl(x)
            if ii == self.layer:
                return x

VGG16 was trained on pre-processed ImageNet images with approximately zero mean and unit stdev, so we can preprocess our Parrot image the same way.

[8]:
# VGG16
def normalize(img_tensor):
    """standardize the image for vgg16"""
    return (img_tensor-img_tensor.mean())/ img_tensor.std()
image_tensor = normalize(crop(image)).to(device)

# reduce size of image if we're on CPU, otherwise this will take too long
if device.type == 'cpu':
    image_tensor = image_tensor[...,100:164,100:164]
    # want to zoom so this is displayed at same size
    zoom = 256 / 64

image_tensor3 = torch.cat([image_tensor]*3, dim=1).to(device)

# "layer 3" according to Berardino et al (2017)
mdl_v = NthLayerVGG16(layer=11, device=device)
po.tools.remove_grad(mdl_v)

eigendist_v = Eigendistortion(image=image_tensor3, model=mdl_v)
eigendist_v.synthesize(k=2, method='power', max_iter=max_iter_vgg)
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /mnt/home/wbroderick/.cache/torch/hub/checkpoints/vgg16-397923af.pth

Initializing Eigendistortion -- Input dim: 193548 | Output dim: 1016064
VGG16: eigendistortion display

We can now display the most- and least-noticeable eigendistortions as before, then compare their quality to those of the Front-end model.

Since the distortions here were synthesized using a pre-processed (normalized) image, we can easily pass a function to unprocess the image. Since the previous eigendistortions were grayscale, we’ll just take the mean across RGB channels for the VGG16-synthesized eigendistortions and display them as grayscale too.

[9]:
po.imshow(eigendist_v.eigendistortions[[0,-1]].mean(1, keepdim=True), vrange='auto1',
          title=["most-noticeable distortion", "least-noticeable"], zoom=zoom)

# create an image processing function to unnormalize the image and avg the channels to grayscale
unnormalize = lambda x: (x*image.std() + image.mean()).mean(1, keepdims=True)
alpha_max, alpha_min = 15., 100.

v_max = po.synth.eigendistortion.display_eigendistortion(eigendist_v, eigenindex=0, alpha=alpha_max,
                                                         process_image=unnormalize,
                                                         title=f'img + {alpha_max} * most_noticeable_dist',
                                                         zoom=zoom)
v_min = po.synth.eigendistortion.display_eigendistortion(eigendist_v, eigenindex=-1, alpha=alpha_min,
                                                         process_image=unnormalize,
                                                         title=f'img + {alpha_min} * least_noticeable_dist',
                                                         zoom=zoom)
_images/tutorials_applications_Demo_Eigendistortion_16_0.png
_images/tutorials_applications_Demo_Eigendistortion_16_1.png
_images/tutorials_applications_Demo_Eigendistortion_16_2.png

Final thoughts

To rigorously test which of these models’ representations are more human-like, we’ll have to conduct a perceptual experiment. For now, we’ll just leave it to you to eyeball and decide which distortions are more or less noticeable!

Synthesis object design

The following describes how synthesis objects are structured. This is probably most useful if you are creating a new synthesis method that you would like to include in or be compliant with plenoptic, rather than using existing ones.

The synthesis methods included in plenoptic generate one or more novel images based on the output of a model. These images can be used to better understand the model or as stimuli for an experiment comparing the model against another system. Beyond this rather vague description, however, there is a good deal of variability. We use inheritance in order to try and keep synthesis methods as similar as possible, to facilitate user interaction with them (and testing), but we want to avoid forcing too much similarity.

In the following description:

  • must connotes a requirement; any synthesis object not meeting this property will not be merged and is not considered “plenoptic-compliant”.

  • should connotes a suggestion; a compelling reason is required if the property is not met.

  • may connotes an option; these properties may make things easier (for developers or users), but are completely optional.

All synthesis methods

To that end, all synthesis methods must inherit the plenoptic.synthesize.synthesis.Synthesis class. This requires the synthesis method to have a synthesize() method, and provides helper functions for save(), load(), and to(), which must be used when implementing them. Furthermore:

  • the initialization method (__init__()) must accept any images as its first input(s). If only a single image is accepted, it must be named image. If more than one, their names must be of the form image_X, replacing X with a more descriptive string. These must all have type torch.Tensor and they must be validated with plenoptic.tools.validate.validate_input(). This should be stored in an attribute with the same name as the argument.

  • the initialization method’s next argument(s) must be any models or metrics that the synthesis will be based on. Similarly, if a single model / metric is accepted, they must be named model / metric. If more than one, their names should be of the form X_model / X_metric, replacing X with a more descriptive string. These must be validated with plenoptic.tools.validate.validate_model() / plenoptic.tools.validate.validate_metric(). This should be stored in an attribute with the same name as the argument.

  • any other arguments to the initialization method may follow.

  • the object must be able to work on GPU and CPU. Users must be able to use the GPU either by initializing the synthesis object with tensors or models already on the GPU or by calling .to(). The easiest way to do this is to use torch.rand_like() and analogous methods, and explicitly calling .to() on any other newly-created tensors.

  • ideally, the same holds for different float and complex data types (e.g., support both torch.float32 and torch.float64), though this is not a strict requirement if there’s a good reason.

  • if synthesize() operates in an iterative fashion, it must accept a max_iter: int argument to specify how long to run synthesis for and a stop_criterion: float argument to allow for early termination if some convergence is reached. What exactly is being checked for convergence (e.g., change in loss, change in pixel values) may vary, but it must be clarified in the docstring. A stop_iters_to_check: int argument may also be included, which specifies how many iterations ago to check. If it is not included, the number of iterations must be clarified in the docstring.

  • additionally, if synthesis is iterative, tqdm.auto.tqdm must be used as a progress bar, initialized with pbar = tqdm(range(max_iter)), which should present information using pbar.set_postfix() (such as the loss or whatever else is checked for convergence, as discussed above).

  • synthesize() must not return anything. The outputs of synthesis must be stored as attributes of the object. The number of large attributes should be minimized in order to reduce overall size in memory.

  • the synthesis output must be stored as an attribute with the same name as the class (e.g., Metamer.metamer).

  • any attribute or method that the user does not need should be hidden (i.e., start with _).

  • consider using the @property decorator to make important attributes read-only or to differentiate between the public and private views. For example, the optimized attribute of the plenoptic.synthesize.geodesic.Geodesic class is named _geodesic, but the geodesic attribute returns this tensor concatenated with two (unchanging) endpoints, as this is what the user will most often want to interact with.

The above are the only requirements that all synthesis methods must meet.
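To make these requirements concrete, here is a skeleton of what a minimal plenoptic-compliant synthesis class might look like. This is an illustrative sketch, not a class in the library: the class and attribute names are made up, only the module paths for Synthesis and the validation helpers come from the text above, and a complete class would also implement save(), load(), and to() using the base-class helpers.

import torch
from plenoptic.synthesize.synthesis import Synthesis
from plenoptic.tools.validate import validate_input, validate_model

class MySynthesis(Synthesis):
    """Sketch of a plenoptic-compliant synthesis method (illustrative only)."""

    def __init__(self, image: torch.Tensor, model: torch.nn.Module):
        # images come first, are validated, and are stored under the same name...
        validate_input(image)
        self.image = image
        # ...then the model, likewise validated and stored (see the validators'
        # docstrings for their optional arguments).
        validate_model(model)
        self.model = model
        # the synthesis output lives in an attribute named after the class
        self.my_synthesis = None

    def synthesize(self):
        # synthesize() must not return anything: store the result as an attribute.
        # torch.rand_like keeps the new tensor on self.image's device and dtype.
        self.my_synthesis = torch.rand_like(self.image)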

Helper / display functions

It may also be useful to include some functions for investigating the status or output(s) of synthesis. As a general rule, if a function will be called during synthesis (e.g., to compute a loss value), it should be a method of the object. If it is only called afterwards (e.g., to display the synthesis outputs in a useful way), it should be included as a function in the same file (see plenoptic.synthesize.metamer.display_metamer() for an example).

Functions that show images or videos should be called display_X, whereas those that show numbers as a scatter plot, line plot, etc. should be called plot_X. These must be axes-level matplotlib functions: they must accept an axis as an optional argument named ax, which will contain the plot. If no ax is supplied, matplotlib.pyplot.gca() must be used to create / grab the axis. If a multi-axis figure is called for (e.g., to display the synthesis output and plot the loss), a function named plot_synthesis_status() should be created. This must have an optional fig argument, creating a figure if none is supplied. See plenoptic.synthesize.metamer.plot_synthesis_status() for an example. If possible, this plot should be able to be animated to show progress over time; see plenoptic.synthesize.metamer.animate() for an example.
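For instance, an axes-level plotting helper following this convention might look like the sketch below; the function name and the losses attribute are hypothetical, and only the ax / gca() convention comes from the text above.

import matplotlib.pyplot as plt

def plot_loss(synth, ax=None, **kwargs):
    """Axes-level plot of the loss over iterations (illustrative sketch)."""
    if ax is None:
        # grab the current axis if the caller didn't supply one
        ax = plt.gca()
    ax.semilogy(synth.losses, **kwargs)
    ax.set(xlabel='Synthesis iteration', ylabel='Loss')
    return ax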

See our Display and animate functions notebook for description and examples of the included plotting and display code.

Optimized synthesis

Many synthesis methods will use an optimizer to generate their outputs. If the method makes use of a torch.optim.Optimizer object, it must inherit the plenoptic.synthesize.synthesis.OptimizedSynthesis class (this is a subclass of plenoptic.synthesize.synthesis.Synthesis, so all of the above still applies).

Currently, the following are required (if not all of these are applicable to new methods, we may modify OptimizedSynthesis); a skeleton sketch follows this list:

  • the points about iterative synthesis described above all hold: synthesize() must accept max_iter and stop_criterion, may accept stop_iters_to_check, and must use tqdm.auto.tqdm.

  • the object must have an objective_function() method, which returns a measure of “how bad” the current synthesis output is. Optimization is minimizing this value.

  • the object must have a _check_convergence() method, which is used (along with stop_criterion and, optionally, stop_iters_to_check) to determine if synthesis has converged.

  • the object must have an _initialize() method, which initializes the synthesis output (e.g., with an appropriately-shaped sample of noise) and is called during the object’s initialization.

  • the initialization method may accept some argument to affect this initialization, which should be named initial_X (replacing X as appropriate). For example, this could be another image to use for initialization (initial_image) or some property of noise used to generate an initial image (initial_noise).

  • the initialization method must accept range_penalty_lambda: float and allowed_range: Tuple[float, float] arguments, which should be used with plenoptic.tools.optim.penalize_range() to constrain the range of synthesis output.

  • the synthesize() method must accept an optional optimizer: torch.optim.Optimizer argument, which defaults to None. OptimizedSynthesis._initialize_optimizer() is a helper function that should be called to set this up: it creates a default optimizer if the user does not specify one and double-checks that the optimizer parameter is the correct object if the user did.

  • during synthesis, the object should update the _losses, _gradient_norm, and _pixel_change_norm attributes on each iteration.

  • the object may have a _closure() method, which performs the gradient calculation. This (when passed to optimizer.step() during the synthesis loop in synthesize()) enables optimization algorithms that perform several evaluations of the gradient before taking a step (e.g., second-order methods). See OptimizedSynthesis._closure() for the simplest version of this.

  • the synthesize() method should accept a store_progress argument, which optionally stores additional information over iterations, such as the synthesis output-in-progress. OptimizedSynthesis has a setter method for this attribute, which will ensure everything is handled correctly. This argument can be an integer (in which case, the attributes are updated every store_progress iterations), True (same behavior as 1), or False (no updating of attributes). This should probably be done in a method named _store().

  • the synthesize() method should be callable multiple times with the same object, in which case progress is resumed. On all subsequent calls, optimizer must be None (this is checked by OptimizedSynthesis._initialize_optimizer()) and store_progress, stop_criterion, and stop_iters_to_check must have the same values.
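To make these requirements concrete, here is a hypothetical skeleton of an OptimizedSynthesis subclass. This is a sketch only: the exact signatures of the base class’s __init__(), _initialize_optimizer(), _optimizer_step(), and _check_convergence(), and of plenoptic.tools.optim.penalize_range(), are assumptions here, and the skeleton omits save(), load(), to(), _closure(), _store(), and the tracking of _losses and related attributes.

import torch
from tqdm.auto import tqdm
from plenoptic.synthesize.synthesis import OptimizedSynthesis
from plenoptic.tools.optim import penalize_range

class MySynthesis(OptimizedSynthesis):
    """Hypothetical skeleton illustrating the required pieces (not a working method)."""

    def __init__(self, image, model, initial_image=None,
                 range_penalty_lambda=0.1, allowed_range=(0, 1)):
        # assumption: the base class stores the range penalty arguments
        super().__init__(range_penalty_lambda, allowed_range)
        self.image = image
        self.model = model
        # _initialize() is called during the object's initialization
        self._initialize(initial_image)

    def _initialize(self, initial_image):
        # initialize the synthesis output, e.g., with noise or a user-supplied image
        init = torch.rand_like(self.image) if initial_image is None else initial_image
        self._output = init.clone().requires_grad_()

    def objective_function(self):
        # "how bad" the current output is; the optimizer minimizes this value
        loss = torch.linalg.vector_norm(self.model(self._output) - self.model(self.image))
        # assumption: penalize_range(tensor, allowed_range) constrains the output range
        return loss + self.range_penalty_lambda * penalize_range(self._output, self.allowed_range)

    def synthesize(self, max_iter=100, stop_criterion=1e-4, stop_iters_to_check=50,
                   optimizer=None, store_progress=False):
        # assumption: helper signatures match their descriptions in the list above
        self._initialize_optimizer(optimizer, self._output)
        self.store_progress = store_progress
        for _ in tqdm(range(max_iter)):
            self._optimizer_step()
            if self._check_convergence(stop_criterion, stop_iters_to_check):
                break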

How to order methods

Python doesn’t care how you order any of the methods or properties of a class, but doing so in a consistent manner will make reading the code easier, so try to follow these guidelines:

  • The caller should (almost always) be above the callee and related concepts should be close together.

  • __init__() should be first, followed by any methods called within it. This will probably include _initialize(), for those classes that have it.

  • After all those initialization-related methods, synthesize() should come next. Again, this should be followed by most of the methods called within it, ordered roughly by importance. Thus, the first methods should probably be objective_function() and _optimizer_step(), followed by _check_convergence(). What shouldn’t be included in this section are helper methods that aren’t scientifically interesting (e.g., _initialize_optimizer(), _store()).

  • Next, any other content-related methods, such as helper methods that perform useful computations that are not called by __init__() or synthesize() (e.g., plenoptic.synthesize.geodesic.Geodesic.calculate_jerkiness).

  • Next, the helper methods we skipped earlier, such as _initialize_optimizer() and _store().

  • Next, save(), load(), to().

  • Finally, all the properties.

Tips and Tricks

Why does synthesis take so long?

Synthesis can take a while to run, especially if you are trying to synthesize a large image or using a complicated model. The following might help:

  • Reducing the amount of time your model’s forward pass takes is the easiest way to reduce the overall duration of synthesis, as the forward pass is called many, many times over synthesis. Try using python’s built-in profiling tools to check which part of your model’s forward pass is taking the longest, and try to make those parts more efficient. Jupyter also has nice profiling tools. For example, if you have for loops in your code, try to replace them with matrix operations and einsum (see the sketch after this list).

  • If you have access to a GPU, use it! If your inputs are on the GPU before initializing the synthesis methods, the synthesis methods will also make use of the GPU. You can also move plenoptic’s synthesis methods and models over to the GPU after initialization using the .to() method.
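As a toy illustration of the einsum point (unrelated to any particular plenoptic model), and of moving tensors to the GPU:

import torch

# inner product of each of N filters with each of B (flattened) images:
# first with python loops, then as a single vectorized einsum call
B, N, D = 8, 16, 1024
images = torch.randn(B, D)
filters = torch.randn(N, D)

out_loop = torch.empty(B, N)
for b in range(B):              # slow: python-level loops
    for n in range(N):
        out_loop[b, n] = (images[b] * filters[n]).sum()

out_einsum = torch.einsum('bd,nd->bn', images, filters)   # fast: one vectorized call
assert torch.allclose(out_loop, out_einsum, rtol=1e-4, atol=1e-4)

# if a GPU is available, move inputs (and models) over before initializing synthesis
if torch.cuda.is_available():
    images = images.to('cuda')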

Optimization is hard

You should double-check whether synthesis has successfully completed before interpreting the outputs or using them in any experiments. This is not necessary for eigendistortions (see its notebook for more details on why), but is necessary for all the iterative optimization methods.

  • For metamers, this means double-checking that the difference between the model representation of the metamer and the target image is small enough. If your model’s representation is multi-scale, trying coarse-to-fine optimization may help (see notebook for details).

  • For MAD competition, this means double-checking that the reference metric is constant and that the optimized metric has converged at a lower or higher value (depending on the value of synthesis_target); use plenoptic.synthesize.mad_competition.plot_synthesis_status() to visualize these values. You will likely need to spend time trying out different values for the metric_tradeoff_lambda argument set during initialization to achieve this.

  • For geodesics, check that your geodesic’s path energy is small enough and that the deviation from a straight line in representational space is minimal (use plenoptic.synthesize.geodesic.plot_deviation_from_line()).

For all of the above, if synthesis has not found a good solution, you may need to run synthesis longer, use a learning-rate scheduler, change the learning rate, or try different optimizers. Each method’s objective_function method captures the value that we are trying to minimize, but it may also include other terms, such as the penalty on values outside the allowed range.

Additionally, it may be helpful to visualize the progression of synthesis, using each synthesis method’s animate or plot_synthesis_status helper functions (e.g., plenoptic.synthesize.metamer.plot_synthesis_status()).

Tweaking the model

You can also improve your chances of finding a good synthesis by tweaking the model. For example, the loss function used for metamer synthesis by default is mean-squared error. This implicitly weights all aspects of the model’s representation equally. Thus, if there are portions of the representation whose magnitudes are significantly smaller than the others, they might not be matched at the same rate as the others. You can address this using coarse-to-fine synthesis or picking a more suitable loss function, but it’s generally a good idea for all of a model’s representation to have roughly the same magnitude. You can do this in a principled or empirical manner:

  • Principled: compose your representation of statistics that you know lie within the same range. For example, use correlations instead of covariances (see the Portilla-Simoncelli model, and in particular how plenoptic’s implementation differs from matlab for an example of this).

  • Empirical: measure your model’s representation on a dataset of relevant natural images and then use this output to z-score your model’s representation on each pass (see [Ziemba2021] for an example; this is what the Van Hateren database is used for).

  • In the middle: normalize statistics based on their value in the original image (note: not the image the model is taking as input! this will likely make optimization very difficult).

If you are computing a multi-channel representation, you may have a similar problem where one channel is larger or smaller than the others. Here, tweaking the loss function might be more useful. Using something like logsumexp (the log of the sum of exponentials, a smooth approximation of the maximum function) to combine across channels after using something like L2-norm to compute the loss within each channel might help.
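As a rough sketch of that idea (this is not a built-in plenoptic loss; it assumes the model representation is a 4d tensor of shape (batch, channel, height, width), and the function name and tau parameter are made up for illustration):

import torch

def channelwise_logsumexp_loss(rep_x, rep_y, tau=1.0):
    # L2 error within each channel -> shape (batch, channel)
    per_channel = torch.linalg.vector_norm(rep_x - rep_y, ord=2, dim=(-2, -1))
    # logsumexp is a smooth approximation of the maximum over channels;
    # smaller tau approximates the hard maximum more closely
    return (tau * torch.logsumexp(per_channel / tau, dim=-1)).mean()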

None of the existing synthesis methods meet my needs

plenoptic provides four synthesis methods, but you may find you wish to do something slightly outside the capabilities of the existing methods. There are generally two ways to do this: by tweaking your model or by extending one of the methods.

If you extend a method successfully or would like help making it work, please let us know by posting a discussion!

Reproducibility

plenoptic includes several results reproduced from the literature and aims to facilitate reproducible research. However, we are limited by our dependencies and PyTorch, in particular, comes with the caveat that “Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds” (quote from the v1.12 documentation).

This means that you should note the plenoptic version and the pytorch version used for your synthesis in order to guarantee reproducibility (some versions of pytorch will give consistent results with each other, but this is not guaranteed and is hard to predict). We do not believe reproducibility depends on the python version or any other packages. In general, the CPU and GPU will always give different results.

We reproduce several results from the literature and validate these as part of our tests. We are therefore aware of the following changes that broke reproducibility:

  • PyTorch 1.8 and 1.9 give the same results, but 1.10 changed the results, probably due to a difference in how the sub-gradients for torch.min and torch.max are computed (see this PR).

  • PyTorch 1.12 breaks reproducibility with 1.10 and 1.11, unclear why (see this issue).

plenoptic

plenoptic package

Subpackages
plenoptic.data package
Submodules
plenoptic.data.data_utils module
plenoptic.data.data_utils.get(*item_names, as_gray=None)[source]

Load an image based on the item name from the package’s data resources.

Parameters:
  • item_names (str) – The names of the items to load, without specifying the file extension.

  • as_gray (Optional[bool]) – Whether to load in the image(s) as grayscale or not. If None, will make best guess based on file extension.

Return type:

The loaded image object. The exact return type depends on the load_images function implementation.

Notes

This function first retrieves the full filename using get_filename and then loads the image using load_images from the tools.data module. It supports loading images as grayscale if they have a .pgm extension.

plenoptic.data.data_utils.get_path(item_name)[source]

Retrieve the filename that matches the given item name with any extension.

Parameters:

item_name (str) – The name of the item to find the file for, without specifying the file extension.

Return type:

Traversable

Returns:

The filename matching the item_name with its extension.

Raises:

AssertionError – If no files or more than one file match the item_name.

Notes

This function uses glob to search for files in the current directory matching the item_name. It is assumed that there is only one file matching the name regardless of its extension.

plenoptic.data.fetch module

Fetch data using pooch.

This is inspired by scipy’s datasets module.

plenoptic.data.fetch.fetch_data(dataset_name)[source]

Download data, using pooch. These are largely used for testing.

To view list of downloadable files, look at DOWNLOADABLE_FILES.

This checks whether the data already exists and is unchanged and downloads again, if necessary. If dataset_name ends in .tar.gz, this also decompresses and extracts the archive, returning the Path to the resulting directory. Else, it just returns the Path to the downloaded file.

Return type:

Path

plenoptic.data.fetch.find_shared_directory(paths)[source]

Find directory shared by all paths.

Return type:

Path

Module contents
plenoptic.data.color_wheel(as_gray=False)[source]
Return type:

Tensor

plenoptic.data.curie()[source]
Return type:

Tensor

plenoptic.data.einstein()[source]
Return type:

Tensor

plenoptic.data.fetch_data(dataset_name)[source]

Download data, using pooch. These are largely used for testing.

To view list of downloadable files, look at DOWNLOADABLE_FILES.

This checks whether the data already exists and is unchanged and downloads again, if necessary. If dataset_name ends in .tar.gz, this also decompresses and extracts the archive, returning the Path to the resulting directory. Else, it just returns the Path to the downloaded file.

Return type:

Path

plenoptic.data.parrot(as_gray=False)[source]
Return type:

Tensor

plenoptic.data.reptile_skin()[source]
Return type:

Tensor
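A minimal usage sketch of the image-loading helpers above (assuming plenoptic is importable as po and that these functions are exposed at po.data, as documented):

import plenoptic as po

img = po.data.einstein()          # grayscale test image, returned as a torch.Tensor
wheel = po.data.color_wheel()     # color image; pass as_gray=True for a grayscale version
print(img.shape, wheel.shape)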

plenoptic.metric package
Submodules
plenoptic.metric.classes module
class plenoptic.metric.classes.NLP[source]

Bases: Module

simple class for implementing normalized laplacian pyramid

This class just calls plenoptic.metric.normalized_laplacian_pyramid on the image and returns a 3d tensor with the flattened activations.

NOTE: synthesis using this class will not be the exact same as synthesis using the plenoptic.metric.nlpd function (by default), because the synthesis methods use torch.norm(x - y, p=2) as the distance metric between representations, whereas nlpd uses the root-mean-square of the distance (i.e., torch.sqrt(torch.mean((x - y) ** 2)))

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(image)

returns flattened NLP activations

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

forward(image)[source]

returns flattened NLP activations

WARNING: For now this only supports images with batch and channel size 1

Parameters:

image (torch.Tensor) – image to pass to normalized_laplacian_pyramid

Returns:

representation – 3d tensor with flattened NLP activations

Return type:

torch.Tensor

plenoptic.metric.model_metric module
plenoptic.metric.model_metric.model_metric(x, y, model)[source]

Calculate the distance between x and y in model space, as the root mean squared error.

Parameters:
  • x, y (torch.Tensor) – images to compare, of shape (B x C x H x W)

  • model (torch class) – torch model with defined forward and backward operations

plenoptic.metric.naive module
plenoptic.metric.naive.mse(img1, img2)[source]

return the MSE between img1 and img2

Our baseline metric to compare two images is often mean-squared error, MSE. This is not a good approximation of the human visual system, but is handy to compare against.

For two images, \(x\) and \(y\), with \(n\) pixels each:

\[MSE = \frac{1}{n}\sum_{i=1}^n (x_i - y_i)^2\]

The two images must have a float dtype

Parameters:
  • img1 (torch.Tensor) – The first image to compare

  • img2 (torch.Tensor) – The second image to compare, must be same size as img1

Returns:

mse – the mean-squared error between img1 and img2

Return type:

torch.float

plenoptic.metric.perceptual_distance module
plenoptic.metric.perceptual_distance.ms_ssim(img1, img2, power_factors=None)[source]

Multiscale structural similarity index (MS-SSIM)

As described in [1], multiscale structural similarity index (MS-SSIM) is an improvement upon structural similarity index (SSIM) that takes into account the perceptual distance between two images on different scales.

SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images, producing three maps instead of scalars. The SSIM map is the elementwise product of these three maps. See metric.ssim and metric.ssim_map for a full description of SSIM.

To get images of different scales, average pooling operations with kernel size 2 are performed recursively on the input images. The product of contrast map and structure map (the “contrast-structure map”) is computed for all but the coarsest scales, and the overall SSIM map is only computed for the coarsest scale. Their mean values are raised to exponents and multiplied to produce MS-SSIM:

\[MSSSIM = {SSIM}_M^{a_M} \prod_{i=1}^{M-1} ({CS}_i)^{a_i}\]

Here \(M\) is the number of scales, \({CS}_i\) is the mean value of the contrast-structure map for the i’th finest scale, and \({SSIM}_M\) is the mean value of the SSIM map for the coarsest scale. If at least one of these terms is negative, the value of MS-SSIM is zero. The values of \(a_i, i=1,\dots,M\) are taken from the argument power_factors.

Parameters:
  • img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.

  • img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).

  • power_factors (1D array, optional.) – power exponents for the mean values of maps, for different scales (from fine to coarse). The length of this array determines the number of scales. By default, this is set to [0.0448, 0.2856, 0.3001, 0.2363, 0.1333], which is what psychophysical experiments in [1] found.

Returns:

msssim – 2d tensor of shape (batch, channel) containing the MS-SSIM for each image

Return type:

torch.Tensor

References

[1] (1,2)

Wang, Zhou, Eero P. Simoncelli, and Alan C. Bovik. “Multiscale structural similarity for image quality assessment.” The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003. Vol. 2. IEEE, 2003.

plenoptic.metric.perceptual_distance.nlpd(img1, img2)[source]

Normalized Laplacian Pyramid Distance

As described in [1], this is an image quality metric based on the transformations associated with the early visual system: local luminance subtraction and local contrast gain control.

A Laplacian pyramid subtracts a local estimate of the mean luminance at six scales. Then a local gain control divides these centered coefficients by a weighted sum of absolute values in a spatial neighborhood.

These weight parameters were optimized for redundancy reduction over a training database of (undistorted) natural images.

Note that we compute root mean squared error for each scale, and then average over these, effectively giving larger weight to the lower frequency coefficients (which are fewer in number, due to subsampling).

Parameters:
  • img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.

  • img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).

Returns:

distance – The normalized Laplacian Pyramid distance.

Return type:

torch.Tensor of shape (batch, channel)

References

[1]

Laparra, V., Ballé, J., Berardino, A. and Simoncelli, E.P., 2016. Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16), pp.1-6.
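A minimal usage sketch (the images here are random placeholders; in practice you would compare a reference image against a distorted version):

import torch
from plenoptic.metric.perceptual_distance import nlpd

img1 = torch.rand(1, 1, 256, 256)                          # values in [0, 1]
img2 = (img1 + 0.05 * torch.randn_like(img1)).clamp(0, 1)  # a lightly distorted copy
dist = nlpd(img1, img2)                                     # shape (batch, channel)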

plenoptic.metric.perceptual_distance.normalized_laplacian_pyramid(img)[source]

Compute the normalized Laplacian Pyramid using pre-optimized parameters

Parameters:

img (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. This representation is designed for grayscale images and will be computed separately for each channel (so channels are treated in the same way as batches).

Returns:

normalized_laplacian_activations – The normalized Laplacian Pyramid with six scales

Return type:

list of torch.Tensor

plenoptic.metric.perceptual_distance.ssim(img1, img2, weighted=False, pad=False)[source]

Structural similarity index

As described in [1], the structural similarity index (SSIM) is a perceptual distance metric, giving the distance between two images. SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images. See the references for more information.

This implementation follows the original implementation, as found at [2], as well as providing the option to use the weighted version used in [4] (which was shown to consistently improve the image quality prediction on the LIVE database).

Note that this is a similarity metric (not a distance), and so 1 means the two images are identical and 0 means they’re very different. When the two images are negatively correlated, SSIM can be negative. SSIM is bounded between -1 and 1.

This function returns the mean SSIM, a scalar-valued metric giving the average over the whole image. For the SSIM map (showing the computed value across the image), call ssim_map.

Parameters:
  • img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.

  • img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).

  • weighted (bool, optional) – whether to use the original, unweighted SSIM version (False) as used in [1] or the weighted version (True) as used in [4]. See Notes section for the weight

  • pad ({False, 'constant', 'reflect', 'replicate', 'circular'}, optional) – If not False, how to pad the image for the convolutions computing the local average of each image. See torch.nn.functional.pad for how these work.

Returns:

mssim – 2d tensor of shape (batch, channel) containing the mean SSIM for each image, averaged over the whole image

Return type:

torch.Tensor

Notes

The weight used when weighted=True is:

\[\log((1+\frac{\sigma_1^2}{C_2})(1+\frac{\sigma_2^2}{C_2}))\]

where \(\sigma_1^2\) and \(\sigma_2^2\) are the variances of img1 and img2, respectively, and \(C_2\) is a constant. See [4] for more details.

References

[1] (1,2)

Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error measurement to structural similarity” IEEE Transactions on Image Processing, vol. 13, no. 1, Jan. 2004.

[4] (1,2,3)

Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. http://dx.doi.org/10.1167/8.12.8
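A minimal usage sketch (again with random placeholder images):

import torch
from plenoptic.metric.perceptual_distance import ssim, ssim_map

img1 = torch.rand(1, 1, 256, 256)                          # values in [0, 1], as recommended
img2 = (img1 + 0.05 * torch.randn_like(img1)).clamp(0, 1)

mean_ssim = ssim(img1, img2)                    # shape (batch, channel): mean SSIM per image
weighted_ssim = ssim(img1, img2, weighted=True) # weighted variant from [4]
full_map = ssim_map(img1, img2)                 # 4d map of local SSIM values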

plenoptic.metric.perceptual_distance.ssim_map(img1, img2)[source]

Structural similarity index map

As described in [1], the structural similarity index (SSIM) is a perceptual distance metric, giving the distance between two images. SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images. See the references for more information.

This implementation follows the original implementation, as found at [2], as well as providing the option to use the weighted version used in [4] (which was shown to consistently improve the image quality prediction on the LIVE database).

Note that this is a similarity metric (not a distance), and so 1 means the two images are identical and 0 means they’re very different. When the two images are negatively correlated, SSIM can be negative. SSIM is bounded between -1 and 1.

This function returns the SSIM map, showing the SSIM values across the image. For the mean SSIM (a single value metric), call ssim.

Parameters:
  • img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.

  • img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).

  • weighted (bool, optional) – whether to use the original, unweighted SSIM version (False) as used in [1] or the weighted version (True) as used in [4]. See Notes section for the weight

Returns:

ssim_map – 4d tensor containing the map of SSIM values.

Return type:

torch.Tensor

References

[1] (1,2)

Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error measurement to structural similarity” IEEE Transactions on Image Processing, vol. 13, no. 1, Jan. 2004.

[4] (1,2)

Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. http://dx.doi.org/10.1167/8.12.8

Module contents
plenoptic.simulate package
Subpackages
plenoptic.simulate.canonical_computations package
Submodules
plenoptic.simulate.canonical_computations.filters module
plenoptic.simulate.canonical_computations.filters.circular_gaussian2d(kernel_size, std, out_channels=1)[source]

Creates a normalized, centered, circular 2D Gaussian tensor with which to convolve.

Parameters:
  • kernel_size (Union[int, Tuple[int, int]]) – Filter kernel size. Recommended to be odd so that kernel is properly centered.

  • std (Union[float, Tensor]) – Standard deviation of 2D circular Gaussian.

  • out_channels (int) – Number of channels with same kernel repeated along channel dim.

Returns:

Circular gaussian kernel, normalized by total pixel-sum (_not_ by 2pi*std). filt has Size([out_channels=n_channels, in_channels=1, height, width]).

Return type:

filt

plenoptic.simulate.canonical_computations.filters.gaussian1d(kernel_size=11, std=1.5)[source]

Normalized 1D Gaussian.

1d Gaussian of size kernel_size, centered half-way, with the specified standard deviation, and summing to 1.

With default values, this is the 1d Gaussian used to generate the windows for SSIM

Parameters:
  • kernel_size (int) – Size of Gaussian. Recommended to be odd so that kernel is properly centered.

  • std (Union[float, Tensor]) – Standard deviation of Gaussian.

Returns:

1d Gaussian with Size([kernel_size]).

Return type:

filt
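A brief usage sketch of circular_gaussian2d as a blurring filter (the padding choice here is just one reasonable option):

import torch
import torch.nn.functional as F
from plenoptic.simulate.canonical_computations.filters import circular_gaussian2d

img = torch.rand(1, 1, 64, 64)
filt = circular_gaussian2d(kernel_size=11, std=2.0)   # shape (1, 1, 11, 11)
blurred = F.conv2d(img, filt, padding=5)              # simple Gaussian blur, same output size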

plenoptic.simulate.canonical_computations.laplacian_pyramid module
class plenoptic.simulate.canonical_computations.laplacian_pyramid.LaplacianPyramid(n_scales=5, scale_filter=False)[source]

Bases: Module

Laplacian Pyramid in Torch.

The Laplacian pyramid [1] is a multiscale image representation. It decomposes the image by computing the local mean using Gaussian blurring filters and subtracting it from the image, then repeating this operation on the local mean itself after downsampling. This representation is overcomplete and invertible.

Parameters:
  • n_scales (int) – number of scales to compute

  • scale_filter (bool, optional) – If True, the norm of the downsampling/upsampling filter is 1. If False (default), it is 2. If the norm is 1, the image is multiplied by 4 during the upsampling operation; the net effect is that the nth scale of the pyramid is divided by 2^n.

References

[1]

Burt, P. and Adelson, E., 1983. The Laplacian pyramid as a compact image code. IEEE Transactions on communications, 31(4), pp.532-540.

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(x)

Build the Laplacian pyramid of an image.

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

recon_pyr(y)

Reconstruct the image from its Laplacian pyramid.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

forward(x)[source]

Build the Laplacian pyramid of an image.

Parameters:

x (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. If there are multiple channels, the Laplacian is computed separately for each of them

Returns:

y – Laplacian pyramid representation, each element of the list corresponds to a scale, from fine to coarse

Return type:

list of torch.Tensor

recon_pyr(y)[source]

Reconstruct the image from its Laplacian pyramid.

Parameters:

y (list of torch.Tensor) – Laplacian pyramid representation, each element of the list corresponds to a scale, from fine to coarse

Returns:

x – Image, or batch of images

Return type:

torch.Tensor of shape (batch, channel, height, width)
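A quick round-trip sketch (random placeholder image):

import torch
from plenoptic.simulate.canonical_computations.laplacian_pyramid import LaplacianPyramid

img = torch.rand(1, 1, 256, 256)
lpyr = LaplacianPyramid(n_scales=4)
coeffs = lpyr(img)                 # list of tensors, fine to coarse
recon = lpyr.recon_pyr(coeffs)     # the representation is overcomplete and invertible
print(torch.allclose(recon, img, atol=1e-5))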

plenoptic.simulate.canonical_computations.non_linearities module
plenoptic.simulate.canonical_computations.non_linearities.local_gain_control(x, epsilon=1e-08)[source]

Spatially local gain control.

Parameters:
  • x (torch.Tensor) – Tensor of shape (batch, channel, height, width)

  • epsilon (float, optional) – Small constant to avoid division by zero.

Returns:

  • norm (torch.Tensor) – The local energy of x. Note that it is downsampled by a factor of 2 (unlike rect2pol).

  • direction (torch.Tensor) – The local phase of x (aka. local unit vector, or local state)

Notes

This function is an analogue to rectangular_to_polar for real valued signals.

Norm and direction (analogous to complex modulus and phase) are defined using a blurring operator and division. Blurring the responses removes the high frequencies introduced by the squaring operation. In the complex case, adding the quadrature pair response has the same effect (this is most clearly seen in the frequency domain). Here, computing the direction (phase) reduces to dividing out the norm (modulus), since the signal only has one real component. This is a normalization operation (local unit vector), hence the connection to local gain control.

plenoptic.simulate.canonical_computations.non_linearities.local_gain_control_dict(coeff_dict, residuals=True)[source]

Spatially local gain control, for each element in a dictionary.

Parameters:
  • coeff_dict (dict) – A dictionary containing tensors of shape (batch, channel, height, width)

  • residuals (bool, optional) – An option to carry around residuals in the energy dict. Note that the transformation is not applied to the residuals, that is dictionary elements with a key starting in “residual”.

Returns:

  • energy (dict) – The dictionary of torch.Tensors containing the local energy of x.

  • state (dict) – The dictionary of torch.Tensors containing the local phase of x.

Notes

Note that energy and state are not computed on the residuals.

The inverse operation is achieved by local_gain_release_dict. This function is an analogue to rectangular_to_polar_dict for real valued signals. For more details, see local_gain_control()

plenoptic.simulate.canonical_computations.non_linearities.local_gain_release(norm, direction, epsilon=1e-08)[source]

Spatially local gain release.

Parameters:
  • norm (torch.Tensor) – The local energy of x. Note that it is downsampled by a factor of 2 (unlike rect2pol).

  • direction (torch.Tensor) – The local phase of x (aka. local unit vector, or local state)

  • epsilon (float, optional) – Small constant to avoid division by zero.

Returns:

x – Tensor of shape (batch, channel, height, width)

Return type:

torch.Tensor

Notes

This function is an analogue to polar_to_rectangular for real valued signals.

Norm and direction (analogous to complex modulus and phase) are defined using a blurring operator and division. Blurring the responses removes the high frequencies introduced by the squaring operation. In the complex case, adding the quadrature pair response has the same effect (this is most clearly seen in the frequency domain). Here, computing the direction (phase) reduces to dividing out the norm (modulus), since the signal only has one real component. This is a normalization operation (local unit vector), hence the connection to local gain control.
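A quick round-trip sketch of the two functions above (random placeholder input):

import torch
from plenoptic.simulate.canonical_computations.non_linearities import (
    local_gain_control,
    local_gain_release,
)

x = torch.rand(1, 1, 64, 64)
norm, direction = local_gain_control(x)      # local energy and local unit vector
x_hat = local_gain_release(norm, direction)  # inverse operation, per the docs above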

plenoptic.simulate.canonical_computations.non_linearities.local_gain_release_dict(energy, state, residuals=True)[source]

Spatially local gain release, for each element in a dictionary.

Parameters:
  • energy (dict) – The dictionary of torch.Tensors containing the local energy of x.

  • state (dict) – The dictionary of torch.Tensors containing the local phase of x.

  • residuals (bool, optional) – An option to carry around residuals in the energy dict. Note that the transformation is not applied to the residuals, that is dictionary elements with a key starting in “residual”.

Returns:

coeff_dict – A dictionary containing tensors of shape (batch, channel, height, width)

Return type:

dict

Notes

The inverse operation to local_gain_control_dict. This function is an analogue to polar_to_rectangular_dict for real valued signals. For more details, see local_gain_release()

plenoptic.simulate.canonical_computations.non_linearities.polar_to_rectangular_dict(energy, state, residuals=True)[source]

Return the real and imaginary parts of tensor in a dictionary.

Parameters:
  • energy (dict) – The dictionary of torch.Tensors containing the local complex modulus.

  • state (dict) – The dictionary of torch.Tensors containing the local phase.

  • dim (int, optional) – The dimension that contains the real and imaginary components.

  • residuals (bool, optional) – An option to carry around residuals in the energy branch.

Returns:

coeff_dict – A dictionary containing complex tensors of coefficients.

Return type:

dict

plenoptic.simulate.canonical_computations.non_linearities.rectangular_to_polar_dict(coeff_dict, residuals=False)[source]

Return the complex modulus and the phase of each complex tensor in a dictionary.

Parameters:
  • coeff_dict (dict) – A dictionary containing complex tensors.

  • dim (int, optional) – The dimension that contains the real and imaginary components.

  • residuals (bool, optional) – An option to carry around residuals in the energy branch.

Returns:

  • energy (dict) – The dictionary of torch.Tensors containing the local complex modulus of coeff_dict.

  • state (dict) – The dictionary of torch.Tensors containing the local phase of coeff_dict.

plenoptic.simulate.canonical_computations.steerable_pyramid_freq module

Steerable frequency pyramid

Construct a steerable pyramid on two-dimensional signals (matrices), in the Fourier domain.

class plenoptic.simulate.canonical_computations.steerable_pyramid_freq.SteerablePyramidFreq(image_shape, height='auto', order=3, twidth=1, is_complex=False, downsample=True, tight_frame=False)[source]

Bases: Module

Steerable frequency pyramid in Torch

Construct a steerable pyramid on two-dimensional signals (matrices), in the Fourier domain. Boundary-handling is circular. Reconstruction is exact (within floating point errors). However, if the image has an odd shape, the reconstruction will not be exact, due to boundary-handling issues that have not been resolved.

The squared radial functions tile the Fourier plane with a raised-cosine falloff. Angular functions are cos(theta - k*pi/(order+1))^order.

Notes

Transform described in [1], filter kernel design described in [2]. For further information see the project webpage: https://www.cns.nyu.edu/~eero/steerpyr/

Parameters:
  • image_shape (list or tuple) – shape of input image

  • height (‘auto’ or int) – The height of the pyramid. If ‘auto’, will automatically determine based on the size of the image.

  • order (int) – The Gaussian derivative order used for the steerable filters, in [1, 15]. Note that to achieve steerability the minimum number of orientations is order + 1, and that is what is used here. To get more orientations at the same order, use the method steer_coeffs.

  • twidth (int) – The width of the transition region of the radial lowpass function, in octaves

  • is_complex (bool) – Whether the pyramid coefficients should be complex or not. If True, the real and imaginary parts correspond to a pair of even and odd symmetric filters. If False, the coefficients only include the real part / even

  • downsample (bool) – Whether to downsample each scale in the pyramid or keep the output pyramid coefficients in fixed bands of size imshape x imshape. When downsample is False, the forward method returns a tensor.

  • tight_frame (bool, default: False) – Whether the pyramid obeys the generalized Parseval theorem (i.e., is a tight frame). If True, the energy of the pyr_coeffs equals the energy of the image; otherwise it does not. In order to match the matlabPyrTools or pyrtools pyramids, this must be set to False.

image_shape

shape of input image

Type:

list or tuple

pyr_size

Dictionary containing the sizes of the pyramid coefficients. Keys are (level, band) tuples and values are tuples.

Type:

dict

fft_norm

The way the ffts are normalized, see pytorch documentation for more details.

Type:

str

is_complex

Whether the coefficients are complex- or real-valued.

Type:

bool

References

[1]

E P Simoncelli and W T Freeman, “The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation,” Second Int’l Conf on Image Processing, Washington, DC, Oct 1995.

[2]

A Karasaridis and E P Simoncelli, “A Filter Design Technique for Steerable Pyramid Image Transforms”, ICASSP, Atlanta, GA, May 1996.

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

convert_pyr_to_tensor(pyr_coeffs[, ...])

Convert coefficient dictionary to a tensor.

convert_tensor_to_pyr(pyr_tensor, ...)

Convert pyramid coefficient tensor to dictionary format.

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(x[, scales])

Generate the steerable pyramid coefficients for an image

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

recon_pyr(pyr_coeffs[, levels, bands])

Reconstruct the image or batch of images, optionally using subset of pyramid coefficients.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

steer_coeffs(pyr_coeffs, angles[, even_phase])

Steer pyramid coefficients to the specified angles

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

static convert_pyr_to_tensor(pyr_coeffs, split_complex=False)[source]

Convert coefficient dictionary to a tensor.

The output tensor has shape (batch, channel, height, width) and is intended to be used in a torch.nn.Module downstream. In the multichannel case, all bands for each channel will be stacked together (i.e. if there are 2 channels and 18 bands per channel, pyr_tensor[:,0:18,…] will contain the pyr responses for channel 1 and pyr_tensor[:, 18:36, …] will contain the responses for channel 2). In the case of a complex, multichannel pyramid with split_complex=True, the real/imaginary bands will be interleaved so that they appear as pairs with neighboring indices in the channel dimension of the tensor (Note: the residual bands are always real so they will only ever have a single band even when split_complex=True.)

This only works if pyr_coeffs was created with a pyramid with downsample=False

Parameters:
  • pyr_coeffs (OrderedDict) – the pyramid coefficients

  • split_complex (bool) – indicates whether the output should split complex bands into real/imag channels or keep them as a single channel. This should be True if you intend to use a convolutional layer on top of the output.

Return type:

Tuple[Tensor, Tuple[int, bool, List[Union[Tuple[int, int], Literal['residual_lowpass', 'residual_highpass']]]]]

Returns:

  • pyr_tensor – shape (batch, channel, height, width). pyramid coefficients reshaped into tensor. The first channel will be the residual highpass and the last will be the residual lowpass. Each band is then a separate channel.

  • pyr_info – Information required to recreate the dictionary, containing the number of channels, if split_complex was used in this function call, and the list of pyramid keys for the dictionary

See also

convert_tensor_to_pyr

Convert tensor representation to pyramid dictionary.

static convert_tensor_to_pyr(pyr_tensor, num_channels, split_complex, pyr_keys)[source]

Convert pyramid coefficient tensor to dictionary format.

num_channels, split_complex, and pyr_keys are elements of the pyr_info tuple returned by convert_pyr_to_tensor. You should always unpack the arguments for this function from that pyr_info tuple. Example Usage:

pyr_tensor, pyr_info = convert_pyr_to_tensor(pyr_coeffs, split_complex=True)
pyr_dict = convert_tensor_to_pyr(pyr_tensor, *pyr_info)
Parameters:
  • pyr_tensor (Tensor) – Shape (batch, channel, height, width). The pyramid coefficients

  • num_channels (int) – number of channels in the original input tensor the pyramid was created for (i.e. if the input was an RGB image, this would be 3)

  • split_complex (bool) – true or false, specifying whether the pyr_tensor was created with complex channels split or not (if the pyramid was a complex pyramid).

  • pyr_keys (List[Union[Tuple[int, int], Literal['residual_lowpass', 'residual_highpass']]]) – tuple containing the list of keys for the original pyramid dictionary

Returns:

pyramid coefficients in dictionary format

Return type:

pyr_coeffs

See also

convert_pyr_to_tensor

Convert pyramid dictionary representation to tensor.

forward(x, scales=None)[source]

Generate the steerable pyramid coefficients for an image

Parameters:
  • x (Tensor) – A tensor containing the image to analyze. We want to operate on this in the pytorch-y way, so we want it to be 4d (batch, channel, height, width).

  • scales (Optional[List[Union[int, Literal['residual_lowpass', 'residual_highpass']]]]) – Which scales to include in the returned representation. If None, we include all scales. Otherwise, can contain a subset of values present in this model’s scales attribute (ints from 0 up to self.num_scales-1 and the strs ‘residual_highpass’ and ‘residual_lowpass’). Can contain a single value or multiple values. If it’s an int, we include all orientations from that scale. Order within the list does not matter.

Returns:

Pyramid coefficients

Return type:

representation

recon_pyr(pyr_coeffs, levels='all', bands='all')[source]

Reconstruct the image or batch of images, optionally using subset of pyramid coefficients.

NOTE: in order to call this function, you need to have previously called self.forward(x), where x is the tensor you wish to reconstruct. This will fail if you called forward() with a subset of scales.

Parameters:
  • pyr_coeffs (OrderedDict) – pyramid coefficients to reconstruct from

  • levels (Union[Literal['all'], List[Union[int, Literal['residual_lowpass', 'residual_highpass']]]]) – If list should contain some subset of integers from 0 to self.num_scales-1 (inclusive), ‘residual_lowpass’, and ‘residual_highpass’. If ‘all’, returned value will contain all valid levels. Otherwise, must be one of the valid levels.

  • bands (Union[Literal['all'], List[int]]) – If list, should contain some subset of integers from 0 to self.num_orientations-1. If ‘all’, returned value will contain all valid orientations. Otherwise, must be one of the valid orientations.

Returns:

The reconstructed image, of shape (batch, channel, height, width)

Return type:

recon
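A minimal usage sketch of the pyramid (random placeholder image; with the default downsample=True the coefficients are returned as a dictionary, which recon_pyr accepts):

import torch
from plenoptic.simulate.canonical_computations.steerable_pyramid_freq import SteerablePyramidFreq

img = torch.rand(1, 1, 256, 256)
pyr = SteerablePyramidFreq(img.shape[-2:], height='auto', order=3)
coeffs = pyr(img)                 # (scale, orientation) keys, plus the two residual bands
recon = pyr.recon_pyr(coeffs)     # reconstruct the image from all coefficients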

steer_coeffs(pyr_coeffs, angles, even_phase=True)[source]

Steer pyramid coefficients to the specified angles

This allows you to have filters that have the Gaussian derivative order specified in construction, but arbitrary angles or number of orientations.

Parameters:
  • pyr_coeffs (OrderedDict) – the pyramid coefficients to steer

  • angles (List[float]) – list of angles (in radians) to steer the pyramid coefficients to

  • even_phase (bool) – specifies whether the harmonics are cosine or sine phase aligned about those positions.

Return type:

Tuple[dict, dict]

Returns:

  • resteered_coeffs – dictionary of re-steered pyramid coefficients. will have the same number of scales as the original pyramid (though it will not contain the residual highpass or lowpass). like pyr_coeffs, keys are 2-tuples of ints indexing the scale and orientation, but now we’re indexing angles instead of self.num_orientations.

  • resteering_weights – dictionary of weights used to re-steer the pyramid coefficients. will have the same keys as resteered_coeffs.

Module contents
plenoptic.simulate.models package
Submodules
plenoptic.simulate.models.frontend module

Model architectures in this file are found in [1], [2]. frontend.OnOff() has optional pretrained filters that were reverse-engineered from a previously-trained model and should be used at your own discretion.

References

[1]

A Berardino, J Ballé, V Laparra, EP Simoncelli, Eigen-distortions of hierarchical representations, NeurIPS 2017; https://arxiv.org/abs/1710.02266

class plenoptic.simulate.models.frontend.LinearNonlinear(kernel_size, on_center=True, width_ratio_limit=4.0, amplitude_ratio=1.25, pad_mode='reflect', activation=<built-in function softplus>)[source]

Bases: Module

Linear-Nonlinear model, applies a difference of Gaussians filter followed by an activation function. Model is described in [1] and [2].

Parameters:
  • kernel_size (Union[int, Tuple[int, int]]) – Shape of convolutional kernel.

  • on_center (bool) – Dictates whether center is on or off; surround will be the opposite of center (i.e. on-off or off-on).

  • width_ratio_limit (float) – Sets a lower bound on the ratio of surround_std over center_std. The surround Gaussian must be wider than the center Gaussian in order to be a proper Difference of Gaussians; surround_std will be clamped to width_ratio_limit times center_std.

  • amplitude_ratio (float) – Ratio of center/surround amplitude. Applied before filter normalization.

  • pad_mode (str) – Padding for convolution, defaults to “reflect”.

  • activation (Callable[[Tensor], Tensor]) – Activation function following linear convolution.

center_surround

CenterSurround difference of Gaussians filter.

Type:

nn.Module

References

[1]

A Berardino, J Ballé, V Laparra, EP Simoncelli, Eigen-distortions of hierarchical representations, NeurIPS 2017; https://arxiv.org/abs/1710.02266

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

display_filters([zoom])

Displays convolutional filters of model

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(x)

Define the computation performed at every call.

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

display_filters(zoom=5.0, **kwargs)[source]

Displays convolutional filters of model

Parameters:
  • zoom (float) – Magnification factor for po.imshow()

  • **kwargs – Keyword args for po.imshow

Returns:

fig

Return type:

PyrFigure

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Return type:

Tensor

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class plenoptic.simulate.models.frontend.LuminanceContrastGainControl(kernel_size, on_center=True, width_ratio_limit=4.0, amplitude_ratio=1.25, pad_mode='reflect', activation=<built-in function softplus>)[source]

Bases: Module

Linear center-surround followed by luminance and contrast gain control, and activation function. Model is described in [1] and [2].

Parameters:
  • kernel_size (Union[int, Tuple[int, int]]) – Shape of convolutional kernel.

  • on_center (bool) – Dictates whether center is on or off; surround will be the opposite of center (i.e. on-off or off-on).

  • width_ratio_limit (float) – Sets a lower bound on the ratio of surround_std over center_std. The surround Gaussian must be wider than the center Gaussian in order to be a proper Difference of Gaussians. surround_std will be clamped to be at least width_ratio_limit times center_std.

  • amplitude_ratio (float) – Ratio of center/surround amplitude. Applied before filter normalization.

  • pad_mode (str) – Padding for convolution, defaults to “reflect”.

  • activation (Callable[[Tensor], Tensor]) – Activation function following linear convolution.

center_surround

Difference of Gaussians linear filter.

Type:

nn.Module

luminance

Gaussian convolutional kernel used to normalize signal by local luminance.

Type:

nn.Module

contrast

Gaussian convolutional kernel used to normalize signal by local contrast.

Type:

nn.Module

luminance_scalar

Scale factor for luminance normalization.

Type:

nn.Parameter

contrast_scalar

Scale factor for contrast normalization.

Type:

nn.Parameter

References

[1]

A Berardino, J Ballé, V Laparra, EP Simoncelli, Eigen-distortions of hierarchical representations, NeurIPS 2017; https://arxiv.org/abs/1710.02266

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

display_filters([zoom])

Displays convolutional filters of model

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(x)

Define the computation performed at every call.

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

display_filters(zoom=5.0, **kwargs)[source]

Displays convolutional filters of model

Parameters:
  • zoom (float) – Magnification factor for po.imshow()

  • **kwargs – Keyword args for po.imshow

Returns:

fig

Return type:

PyrFigure

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Return type:

Tensor

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class plenoptic.simulate.models.frontend.LuminanceGainControl(kernel_size, on_center=True, width_ratio_limit=4.0, amplitude_ratio=1.25, pad_mode='reflect', activation=<built-in function softplus>)[source]

Bases: Module

Linear center-surround followed by luminance gain control and activation. Model is described in [1] and [2].

Parameters:
  • kernel_size (Union[int, Tuple[int, int]]) – Shape of convolutional kernel.

  • on_center (bool) – Dictates whether center is on or off; surround will be the opposite of center (i.e. on-off or off-on).

  • width_ratio_limit (float) – Sets a lower bound on the ratio of surround_std over center_std. The surround Gaussian must be wider than the center Gaussian in order to be a proper Difference of Gaussians. surround_std will be clamped to be at least width_ratio_limit times center_std.

  • amplitude_ratio (float) – Ratio of center/surround amplitude. Applied before filter normalization.

  • pad_mode (str) – Padding for convolution, defaults to “reflect”.

  • activation (Callable[[Tensor], Tensor]) – Activation function following linear convolution.

center_surround

Difference of Gaussians linear filter.

Type:

nn.Module

luminance

Gaussian convolutional kernel used to normalize signal by local luminance.

Type:

nn.Module

luminance_scalar

Scale factor for luminance normalization.

Type:

nn.Parameter

References

[1]

A Berardino, J Ballé, V Laparra, EP Simoncelli, Eigen-distortions of hierarchical representations, NeurIPS 2017; https://arxiv.org/abs/1710.02266

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

display_filters([zoom])

Displays convolutional filters of model

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(x)

Define the computation performed at every call.

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

display_filters(zoom=5.0, **kwargs)[source]

Displays convolutional filters of model

Parameters:
  • zoom (float) – Magnification factor for po.imshow()

  • **kwargs – Keyword args for po.imshow

Returns:

fig

Return type:

PyrFigure

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Return type:

Tensor

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class plenoptic.simulate.models.frontend.OnOff(kernel_size, width_ratio_limit=4.0, amplitude_ratio=1.25, pad_mode='reflect', pretrained=False, activation=<built-in function softplus>, apply_mask=False, cache_filt=False)[source]

Bases: Module

Two-channel on-off and off-on center-surround model with local contrast and luminance gain control.

This model is called OnOff in Berardino et al 2017.

Parameters:
  • kernel_size (Union[int, Tuple[int, int]]) – Shape of convolutional kernel.

  • width_ratio_limit (float) – Sets a lower bound on the ratio of surround_std over center_std. The surround Gaussian must be wider than the center Gaussian in order to be a proper Difference of Gaussians. surround_std will be clamped to be at least width_ratio_limit times center_std.

  • amplitude_ratio (float) – Ratio of center/surround amplitude. Applied before filter normalization.

  • pad_mode (str) – Padding for convolution, defaults to “reflect”.

  • pretrained (bool) – Whether or not to load model parameters estimated from [1]. See Notes for details.

  • activation (Callable[[Tensor], Tensor]) – Activation function following linear and gain control operations.

  • apply_mask (bool) – Whether or not to apply circular disk mask centered on the input image. This is useful for synthesis methods like Eigendistortions to ensure that the synthesized distortion will not appear in the periphery. See plenoptic.tools.signal.make_disk() for details on how mask is created.

  • cache_filt (bool) – Whether or not to cache the filter. Avoids regenerating filt with each forward pass. Cached to self._filt.

Notes

These 12 parameters (standard deviations and scalar constants) were reverse-engineered from the model in [1], [2]. Please use these pretrained weights at your own discretion.

References

[1] (1,2)

A Berardino, J Ballé, V Laparra, EP Simoncelli, Eigen-distortions of hierarchical representations, NeurIPS 2017; https://arxiv.org/abs/1710.02266
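A minimal usage sketch with the pretrained parameters. The 31x31 kernel size is an assumption about what the pretrained filters expect, and the random input is purely illustrative.

import torch
from plenoptic.simulate.models.frontend import OnOff

# kernel_size=(31, 31) is assumed here to match the pretrained filters
model = OnOff(kernel_size=(31, 31), pretrained=True, apply_mask=True, cache_filt=True)
img = torch.rand(1, 1, 256, 256)
response = model(img)  # two channels: the on-off and off-on pathways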

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

display_filters([zoom])

Displays convolutional filters of model

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(x)

Define the computation performed at every call.

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

display_filters(zoom=5.0, **kwargs)[source]

Displays convolutional filters of model

Parameters:
  • zoom (float) – Magnification factor for po.imshow()

  • **kwargs – Keyword args for po.imshow

Returns:

fig

Return type:

PyrFigure

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Return type:

Tensor

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

plenoptic.simulate.models.naive module
class plenoptic.simulate.models.naive.CenterSurround(kernel_size, on_center=True, width_ratio_limit=2.0, amplitude_ratio=1.25, center_std=1.0, surround_std=4.0, out_channels=1, pad_mode='reflect', cache_filt=False)[source]

Bases: Module

Center-Surround, Difference of Gaussians (DoG) filter model. Can be either on-center/off-surround, or vice versa.

The filter is constructed as \(f = \text{amplitude\_ratio} \cdot \text{center} - \text{surround}\), then normalized so that \(f \leftarrow f / \sum f\).

The signs of the center and surround Gaussians are determined by the on_center argument. The standard deviation of the surround Gaussian is constrained to be at least width_ratio_limit times that of the center Gaussian.

Parameters:
  • kernel_size (Union[int, Tuple[int, int]]) – Shape of convolutional kernel.

  • on_center (Union[bool, List[bool]]) – Dictates whether center is on or off; surround will be the opposite of center (i.e. on-off or off-on). If a List of bools, its length must equal out_channels; if a single bool, all out_channels are assumed to share that polarity.

  • width_ratio_limit (float) – Sets a lower bound on the ratio of surround_std over center_std. The surround Gaussian must be wider than the center Gaussian in order to be a proper Difference of Gaussians. surround_std will be clamped to be at least width_ratio_limit times center_std.

  • amplitude_ratio (float) – Ratio of center/surround amplitude. Applied before filter normalization.

  • center_std (Union[float, Tensor]) – Standard deviation of circular Gaussian for center.

  • surround_std (Union[float, Tensor]) – Standard deviation of circular Gaussian for surround. Must be at least width_ratio_limit times center_std.

  • out_channels (int) – Number of filters.

  • pad_mode (str) – Padding for convolution, defaults to “reflect”.

  • cache_filt (bool) – Whether or not to cache the filter. Avoids regenerating filt with each forward pass. Cached to self._filt

Attributes:
filt

Creates an on center/off surround, or off center/on surround conv filter
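A minimal sketch constructing a single on-center difference-of-Gaussians filter and inspecting it via the filt property; the kernel size, standard deviations, and random input are illustrative values only.

import torch
from plenoptic.simulate.models.naive import CenterSurround

dog = CenterSurround(kernel_size=(15, 15), on_center=True, center_std=1.0, surround_std=4.0)
img = torch.rand(1, 1, 128, 128)
out = dog(img)      # filtered image
kernel = dog.filt   # the normalized difference-of-Gaussians kernel described above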

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(x)

Define the computation performed at every call.

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

property filt: Tensor

Creates an on center/off surround, or off center/on surround conv filter

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Return type:

Tensor

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class plenoptic.simulate.models.naive.Gaussian(kernel_size, std=3.0, pad_mode='reflect', out_channels=1, cache_filt=False)[source]

Bases: Module

Isotropic Gaussian convolutional filter. Kernel elements are normalized and sum to one.

Parameters:
  • kernel_size (Union[int, Tuple[int, int]]) – Size of convolutional kernel.

  • std (Union[float, Tensor]) – Standard deviation of circularly symmetric Gaussian kernel.

  • pad_mode (str) – Padding mode argument to pass to torch.nn.functional.pad.

  • out_channels (int) – Number of filters with which to convolve.

  • cache_filt (bool) – Whether or not to cache the filter. Avoids regenerating filt with each forward pass. Cached to self._filt.

Attributes:
filt

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(x, **conv2d_kwargs)

Define the computation performed at every call.

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

property filt
forward(x, **conv2d_kwargs)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Return type:

Tensor

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class plenoptic.simulate.models.naive.Identity(name=None)[source]

Bases: Module

Simple class that returns a copy of the input image.

We use this as a “dummy model” for metrics whose representation we do not have: we use it as the model and simply change the objective function.

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(img)

Return a copy of the image

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

forward(img)[source]

Return a copy of the image

Parameters:

img (torch.Tensor) – The image to return

Returns:

img – a clone of the input image

Return type:

torch.Tensor

class plenoptic.simulate.models.naive.Linear(kernel_size=(3, 3), pad_mode='circular', default_filters=True)[source]

Bases: Module

Simple linear convolutional model that splits the input greyscale image into low and high spatial frequencies.

Parameters:
  • kernel_size (Union[int, Tuple[int, int]]) – Convolutional kernel size.

  • pad_mode (str) – Mode with which to pad image using nn.functional.pad().

  • default_filters (bool) – Initialize the filters to a low-pass and a band-pass.
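A minimal sketch; that the output has two channels (one per default filter, low-pass and band-pass) is an assumption based on the description above rather than a quoted specification.

import torch
from plenoptic.simulate.models.naive import Linear

model = Linear(kernel_size=(3, 3), default_filters=True)
img = torch.rand(1, 1, 64, 64)  # greyscale input
out = model(img)                # expected: one low-pass and one band-pass channel
print(out.shape)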

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(x)

Define the computation performed at every call.

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Return type:

Tensor

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

plenoptic.simulate.models.portilla_simoncelli module

Portilla-Simoncelli texture statistics.

The Portilla-Simoncelli (PS) texture statistics are a set of image statistics, first described in [1], that are proposed as a sufficient set of measurements for describing visual textures. That is, if two texture images have the same values for all PS texture stats, humans should consider them as members of the same family of textures.

class plenoptic.simulate.models.portilla_simoncelli.PortillaSimoncelli(image_shape, n_scales=4, n_orientations=4, spatial_corr_width=9)[source]

Bases: Module

Portilla-Simoncelli texture statistics.

The Portilla-Simoncelli (PS) texture statistics are a set of image statistics, first described in [1], that are proposed as a sufficient set of measurements for describing visual textures. That is, if two texture images have the same values for all PS texture stats, humans should consider them as members of the same family of textures.

The PS stats are computed based on the steerable pyramid [2]. They consist of the local auto-correlations, cross-scale (within-orientation) correlations, and cross-orientation (within-scale) correlations of both the pyramid coefficients and the local energy (as computed by those coefficients). Additionally, they include the first four global moments (mean, variance, skew, and kurtosis) of the image and down-sampled versions of that image. See the paper and notebook for more description.

Parameters:
  • image_shape (Tuple[int, int]) – Shape of input image.

  • n_scales (int) – The number of pyramid scales used to measure the statistics (default=4)

  • n_orientations (int) – The number of orientations used to measure the statistics (default=4)

  • spatial_corr_width (int) – The width of the spatial cross- and auto-correlation statistics

scales

The names of the unique scales of coefficients in the pyramid, used for coarse-to-fine metamer synthesis.

Type:

list

References

[1]

J Portilla and E P Simoncelli. A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients. Int’l Journal of Computer Vision. 40(1):49-71, October, 2000. http://www.cns.nyu.edu/~eero/ABSTRACTS/portilla99-abstract.html http://www.cns.nyu.edu/~lcv/texture/

[2]

E P Simoncelli and W T Freeman, “The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation,” Second Int’l Conf on Image Processing, Washington, DC, Oct 1995.
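A minimal usage sketch; the random input is purely illustrative, and the divisibility requirement in the comment is an assumption about typical multi-scale constraints rather than a quoted specification.

import torch
from plenoptic.simulate.models.portilla_simoncelli import PortillaSimoncelli

# height and width should generally be divisible by 2**n_scales
img = torch.rand(1, 1, 256, 256)
ps = PortillaSimoncelli(image_shape=img.shape[-2:], n_scales=4, n_orientations=4, spatial_corr_width=9)
stats = ps(img)                         # 3d tensor of shape (batch, channel, stats)
stats_dict = ps.convert_to_dict(stats)  # easier to inspect by statistic name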

Methods

add_module(name, module)

Add a child module to the current module.

apply(fn)

Apply fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Return an iterator over module buffers.

children()

Return an iterator over immediate children modules.

compile(*args, **kwargs)

Compile this Module's forward using torch.compile().

convert_to_dict(representation_tensor)

Convert tensor of statistics to a dictionary.

convert_to_tensor(representation_dict)

Convert dictionary of statistics to a tensor.

cpu()

Move all model parameters and buffers to the CPU.

cuda([device])

Move all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Set the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(image[, scales])

Generate Texture Statistics representation of an image.

get_buffer(target)

Return the buffer given by target if it exists, otherwise throw an error.

get_extra_state()

Return any extra state to include in the module's state_dict.

get_parameter(target)

Return the parameter given by target if it exists, otherwise throw an error.

get_submodule(target)

Return the submodule given by target if it exists, otherwise throw an error.

half()

Casts all floating point parameters and buffers to half datatype.

ipu([device])

Move all model parameters and buffers to the IPU.

load_state_dict(state_dict[, strict, assign])

Copy parameters and buffers from state_dict into this module and its descendants.

modules()

Return an iterator over all modules in the network.

named_buffers([prefix, recurse, ...])

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse, ...])

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Return an iterator over module parameters.

plot_representation(data[, ax, figsize, ...])

Plot the representation in a human viewable format -- stem plots with data separated out by statistic type.

register_backward_hook(hook)

Register a backward hook on the module.

register_buffer(name, tensor[, persistent])

Add a buffer to the module.

register_forward_hook(hook, *[, prepend, ...])

Register a forward hook on the module.

register_forward_pre_hook(hook, *[, ...])

Register a forward pre-hook on the module.

register_full_backward_hook(hook[, prepend])

Register a backward hook on the module.

register_full_backward_pre_hook(hook[, prepend])

Register a backward pre-hook on the module.

register_load_state_dict_post_hook(hook)

Register a post hook to be run after module's load_state_dict is called.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Add a parameter to the module.

register_state_dict_pre_hook(hook)

Register a pre-hook for the load_state_dict() method.

remove_scales(representation_tensor, ...)

Remove statistics not associated with scales.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

Set extra state contained in the loaded state_dict.

share_memory()

See torch.Tensor.share_memory_().

state_dict(*args[, destination, prefix, ...])

Return a dictionary containing references to the whole state of the module.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

to_empty(*, device[, recurse])

Move the parameters and buffers to the specified device without copying storage.

train([mode])

Set the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

update_plot(axes, data[, batch_idx])

Update the information in our representation plot.

xpu([device])

Move all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Reset gradients of all model parameters.

__call__

convert_to_dict(representation_tensor)[source]

Convert tensor of statistics to a dictionary.

While the tensor representation is required by plenoptic’s synthesis objects, the dictionary representation is easier to manually inspect.

This dictionary will contain NaNs in its values: these are placeholders for the redundant statistics.

Parameters:

representation_tensor (Tensor) – 3d tensor of statistics.

Returns:

Dictionary of representation, with informative keys.

Return type:

rep

See also

convert_to_tensor

Convert dictionary representation to tensor.

convert_to_tensor(representation_dict)[source]

Convert dictionary of statistics to a tensor.

Parameters:

representation_dict (OrderedDict) – Dictionary of representation.

Return type:

3d tensor of statistics.

See also

convert_to_dict

Convert tensor representation to dictionary.

forward(image, scales=None)[source]

Generate Texture Statistics representation of an image.

Note that separate batches and channels are analyzed in parallel.

Parameters:
  • image (Tensor) – A 4d tensor (batch, channel, height, width) containing the image(s) to analyze.

  • scales (Optional[List[Union[Literal['pixel_statistics'], int, Literal['residual_lowpass', 'residual_highpass']]]]) – Which scales to include in the returned representation. If None, we include all scales. Otherwise, can contain subset of values present in this model’s scales attribute, and the returned tensor will then contain the subset corresponding to those scales.

Returns:

3d tensor of shape (batch, channel, stats) containing the measured texture statistics.

Return type:

representation_tensor

Raises:

ValueError – If image is not 4d or has a dtype other than float or complex.

plot_representation(data, ax=None, figsize=(15, 15), ylim=None, batch_idx=0, title=None)[source]

Plot the representation in a human viewable format – stem plots with data separated out by statistic type.

This plots the representation of a single batch and averages over all channels in the representation.

We create the following axes:

  • pixels+var_highpass: marginal pixel statistics (first four moments, min, max) and variance of the residual highpass.

  • std+skew+kurtosis recon: the standard deviation, skew, and kurtosis of the reconstructed lowpass image at each scale

  • magnitude_std: the standard deviation of the steerable pyramid coefficient magnitudes at each orientation and scale.

  • auto_correlation_reconstructed: the auto-correlation of the reconstructed lowpass image at each scale (summarized using Euclidean norm).

  • auto_correlation_magnitude: the auto-correlation of the pyramid coefficient magnitudes at each scale and orientation (summarized using Euclidean norm).

  • cross_orientation_correlation_magnitude: the cross-correlations between each orientation at each scale (summarized using Euclidean norm)

If self.n_scales > 1, we also have:

  • cross_scale_correlation_magnitude: the cross-correlations between the pyramid coefficient magnitude at one scale and the same orientation at the next-coarsest scale (summarized using Euclidean norm).

  • cross_scale_correlation_real: the cross-correlations between the real component of the pyramid coefficients and the real and imaginary components (at the same orientation) at the next-coarsest scale (summarized using Euclidean norm).

Parameters:
  • data (Tensor) – The data to show on the plot. It should look like the output of self.forward(img), with the exact same structure (e.g., as returned by metamer.representation_error() or another instance of this class).

  • ax (Optional[Axes]) – Axes where we will plot the data. If a plt.Axes instance, will subdivide into 6 or 8 new axes (depending on self.n_scales). If None, we create a new figure.

  • figsize (Tuple[float, float]) – The size of the figure. Ignored if ax is not None.

  • ylim (Union[Tuple[float, float], Literal[False], None]) – If not None, the y-limits to use for this plot. If None, we use the default, slightly adjusted so that the minimum is 0. If False, do not change y-limits.

  • batch_idx (int) – Which index to take from the batch dimension (the first one)

  • title (string) – Title for the plot

Return type:

Tuple[Figure, List[Axes]]

Returns:

  • fig – Figure containing the plot

  • axes – List of 6 or 8 axes containing the plot (depending on self.n_scales)

remove_scales(representation_tensor, scales_to_keep)[source]

Remove statistics not associated with scales.

For a given representation_tensor and a list of scales_to_keep, this method removes all statistics not associated with those scales.

Note that calling this method will always remove statistics.

Parameters:
  • representation_tensor (Tensor) – 3d tensor containing the measured representation statistics.

  • scales_to_keep (List[Union[Literal['pixel_statistics'], int, Literal['residual_lowpass', 'residual_highpass']]]) – Which scales to include in the returned representation. Can contain subset of values present in this model’s scales attribute, and the returned tensor will then contain the subset of the full representation corresponding to those scales.

Returns:

Representation tensor with some statistics removed.

Return type:

limited_representation_tensor

update_plot(axes, data, batch_idx=0)[source]

Update the information in our representation plot.

This is used for creating an animation of the representation over time. In order to create the animation, we need to know how to update the matplotlib Artists, and this provides a simple way of doing that. It relies on the fact that we’ve used plot_representation to create the plots we want to update and so know that they’re stem plots.

We take the axes containing the representation information (note that this is probably a subset of the total number of axes in the figure, if we’re showing other information, as done by Metamer.animate), grab the plotted representation and, since both are lists, iterate through them, updating the plots to the values in data as we go.

In order for this to be used by FuncAnimation, we need to return Artists, so we return a list of the relevant artists, the markerline and stemlines from the StemContainer.

Currently, this averages over all channels in the representation.

Parameters:
  • axes (List[Axes]) – A list of axes to update. We assume that these are the axes created by plot_representation and so contain stem plots in the correct order.

  • batch_idx (int) – Which index to take from the batch dimension (the first one)

  • data (Tensor) – The data to show on the plot. This should look like the output of self.forward(img), with the exact same structure (e.g., as returned by metamer.representation_error() or another instance of this class).

Returns:

A list of the artists used to update the information on the stem plots

Return type:

stem_artists

Module contents
plenoptic.synthesize package
Submodules
plenoptic.synthesize.autodiff module
plenoptic.synthesize.autodiff.jacobian(y, x)[source]

Explicitly compute the full Jacobian matrix. N.B. This is only recommended for small input sizes (e.g. <100x100 image)

Parameters:
  • y (Tensor) – Model output with gradient attached

  • x (Tensor) – Model input with gradient attached

Returns:

Jacobian matrix with torch.Size([len(y), len(x)])

Return type:

J

plenoptic.synthesize.autodiff.jacobian_vector_product(y, x, V, dummy_vec=None)[source]

Compute Jacobian Vector Product: \(\text{jvp} = (\partial y/\partial x) v\)

Forward Mode Auto-Differentiation (Rop in Theano). PyTorch does not natively support this operation; this function essentially calls backward mode autodiff twice, as described in [1].

See the vector_jacobian_product() docstring for why we pass arguments for retain_graph and create_graph.

Parameters:
  • y (Tensor) – Model output with gradient attached, shape is torch.Size([m, 1])

  • x (Tensor) – Model input with gradient attached, shape is torch.Size([n, 1]), i.e. same dim as input tensor

  • V (Tensor) – Directions in which to compute product, shape is torch.Size([n, k]) where k is number of vectors to compute

  • dummy_vec (Tensor) – Vector with which to do the jvp trick [1]. If provided, we use this pre-allocated, cached vector; otherwise, we create a new one and move it to the appropriate device within this method.

Returns:

Jacobian-vector product, torch.Size([m, k])

Return type:

Jv

Notes

[1] https://j-towns.github.io/2017/06/12/A-new-trick.html
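To make the double-backward trick concrete, here is a minimal, self-contained sketch (an illustration, not this module’s implementation) using only torch.autograd.grad; y and x are model output/input with the graph attached, and v has the same shape as x:

import torch

def jvp_via_double_vjp(y, x, v):
    # v must have the same shape as x; the result has the shape of y
    u = torch.zeros_like(y, requires_grad=True)           # dummy vector in output space
    # first reverse-mode pass: g = J^T u, kept differentiable with respect to u
    g = torch.autograd.grad(y, x, grad_outputs=u, create_graph=True)[0]
    # second reverse-mode pass: differentiate g w.r.t. u along v, yielding J v
    return torch.autograd.grad(g, u, grad_outputs=v)[0]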

plenoptic.synthesize.autodiff.vector_jacobian_product(y, x, U, retain_graph=True, create_graph=True, detach=False)[source]

Compute vector Jacobian product: \(\text{vjp} = u^T(\partial y/\partial x)\)

Backward Mode Auto-Differentiation (Lop in Theano)

Note on efficiency: When this function is used in the context of power iteration for computing eigenvectors, the vector output will be repeatedly fed back into vector_jacobian_product() and jacobian_vector_product(). To prevent the accumulation of gradient history in this vector (especially on GPU), we need to ensure the computation graph is not kept in memory after each iteration. We can do this by detaching the output, as well as carefully specifying where/when to retain the created graph.

Parameters:
  • y (Tensor) – Output with gradient attached, torch.Size([m, 1]).

  • x (Tensor) – Input with gradient attached, torch.Size([n, 1]).

  • U (Tensor) – Direction, shape is torch.Size([m, k]), i.e. same dim as output tensor.

  • retain_graph (bool) – Whether or not to keep the graph after doing one vector_jacobian_product(). Must be set to True if k>1.

  • create_graph (bool) – Whether or not to create the computational graph. Usually should be set to True unless you’re reusing the graph, as in the second step of jacobian_vector_product().

  • detach (bool) – As with create_graph, this only needs to be True when reusing the output, as we do in the second step of jacobian_vector_product().

Returns:

vector-Jacobian product, torch.Size([n, k]).

Return type:

vJ

plenoptic.synthesize.eigendistortion module
class plenoptic.synthesize.eigendistortion.Eigendistortion(image, model)[source]

Bases: Synthesis

Synthesis object to compute eigendistortions induced by a model on a given input image.

Parameters:
  • image (Tensor) – Image, torch.Size(batch=1, channel, height, width). We currently do not support batches of images, as each image requires its own optimization.

  • model (Module) – Torch model with defined forward and backward operations.

batch_size
Type:

int

n_channels
Type:

int

im_height
Type:

int

im_width
Type:

int

jacobian

Is only set when synthesize() is run with method='exact'. Defaults to None.

Type:

Tensor

eigendistortions

Tensor of eigendistortions (eigenvectors of Fisher matrix), ordered by eigenvalue, with Size((n_distortions, n_channels, im_height, im_width)).

Type:

Tensor

eigenvalues

Tensor of eigenvalues corresponding to each eigendistortion, listed in decreasing order.

Type:

Tensor

eigenindex

Index of each eigenvector/eigenvalue.

Type:

listlike

Notes

This is a method for comparing image representations in terms of their ability to explain perceptual sensitivity in humans. It estimates eigenvectors of the FIM. A model, \(y = f(x)\), is a deterministic (and differentiable) mapping from the input pixels \(x \in \mathbb{R}^n\) to a mean output response vector \(y\in \mathbb{ R}^m\), where we assume additive white Gaussian noise in the response space. The Jacobian matrix at x is:

\(J(x) = J = \partial y / \partial x\), \(J\in\mathbb{R}^{m \times n}\) (i.e., output_dim × input_dim),

the matrix of all first-order partial derivatives of the vector-valued function f. The Fisher Information Matrix (FIM) at x, under white Gaussian noise in the response space, is:

\(F = J^T J\)

It is a quadratic approximation of the discriminability of distortions relative to \(x\).
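To make this concrete, here is a minimal, self-contained sketch (an illustration, not this class’s implementation) that forms F explicitly and eigendecomposes it; this is only feasible for small inputs (cf. method='exact' below), and model and image are placeholders for a differentiable torch Module and its 4d input:

import torch

def fisher_eigendecomposition(model, image):
    # flatten input and output so the Jacobian is a plain 2d matrix (small inputs only)
    def f(x_flat):
        return model(x_flat.reshape(image.shape)).flatten()
    J = torch.autograd.functional.jacobian(f, image.flatten())   # shape (m, n)
    F = J.T @ J                                                   # FIM under white Gaussian noise
    eigenvalues, eigenvectors = torch.linalg.eigh(F)              # ascending order
    # return largest-first; columns of eigenvectors are the (flattened) eigendistortions
    return eigenvalues.flip(0), eigenvectors.flip(1)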

References

[1]

Berardino, A., Laparra, V., Ballé, J. and Simoncelli, E., 2017. Eigen-distortions of hierarchical representations. In Advances in neural information processing systems (pp. 3530-3539). http://www.cns.nyu.edu/pub/lcv/berardino17c-final.pdf http://www.cns.nyu.edu/~lcv/eigendistortions/

Attributes:
eigendistortions

Tensor of eigendistortions (eigenvectors of Fisher matrix), ordered by eigenvalue.

eigenindex

Index of each eigenvector/eigenvalue.

eigenvalues

Tensor of eigenvalues corresponding to each eigendistortion, listed in decreasing order.

image
jacobian

Is only set when synthesize() is run with method='exact'.

model

Methods

compute_jacobian()

Calls autodiff.jacobian and returns jacobian.

load(file_path[, map_location])

Load all relevant stuff from a .pt file.

save(file_path)

Save all relevant variables in .pt file.

synthesize([method, k, max_iter, p, q, ...])

Compute eigendistortions of Fisher Information Matrix with given input image.

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

compute_jacobian()[source]

Calls autodiff.jacobian and returns jacobian. Will throw error if input too big.

Returns:

Jacobian of representation wrt input.

Return type:

J

property eigendistortions

Tensor of eigendistortions (eigenvectors of Fisher matrix), ordered by eigenvalue.

property eigenindex

Index of each eigenvector/eigenvalue.

property eigenvalues

Tensor of eigenvalues corresponding to each eigendistortion, listed in decreasing order.

property image
property jacobian

Is only set when synthesize() is run with method='exact'. Defaults to None.

load(file_path, map_location=None, **pickle_load_args)[source]

Load all relevant stuff from a .pt file.

This should be called by an initialized Eigendistortion object – we will ensure that image and model are identical.

Note this operates in place and so doesn’t return anything.

Parameters:
  • file_path (str) – The path to load the synthesis object from

  • map_location (str, optional) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you’ll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device

  • pickle_load_args – any additional kwargs will be added to pickle_module.load via torch.load, see that function’s docstring for details.

Examples

>>> eig = po.synth.Eigendistortion(img, model)
>>> eig.synthesize(max_iter=10)
>>> eig.save('eig.pt')
>>> eig_copy = po.synth.Eigendistortion(img, model)
>>> eig_copy.load('eig.pt')

Note that you must create a new instance of the Synthesis object and then load.

property model
save(file_path)[source]

Save all relevant variables in .pt file.

See load docstring for an example of use.

Parameters:

file_path (str) – The path to save the Eigendistortion object to

synthesize(method='power', k=1, max_iter=1000, p=5, q=2, stop_criterion=1e-07)[source]

Compute eigendistortions of Fisher Information Matrix with given input image.

Parameters:
  • method (Literal['exact', 'power', 'randomized_svd']) – Eigensolver method. 'exact' tries to do the eigendecomposition directly (not recommended for very large inputs). 'power' (default) uses the power method to compute the first and last eigendistortions, with the maximum number of iterations dictated by max_iter. 'randomized_svd' uses randomized SVD to approximate the top k eigendistortions and their corresponding eigenvalues.

  • k (int) – How many vectors to return using block power method or svd.

  • max_iter (int) – Maximum number of steps to run for method='power' in eigenvalue computation. Ignored for other methods.

  • p (int) – Oversampling parameter for randomized SVD. k+p vectors will be sampled, and k will be returned. See docstring of _synthesize_randomized_svd for more details including algorithm reference.

  • q (int) – Matrix power parameter for randomized SVD. This is an effective trick for the algorithm to converge to the correct eigenvectors when the eigenspectrum does not decay quickly. See _synthesize_randomized_svd for more details including algorithm reference.

  • stop_criterion (float) – Used if method='power' to check for convergence. If the L2-norm of the eigenvalues has changed by less than this value from one iteration to the next, we terminate synthesis.
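A hedged usage sketch (img and model are placeholders for a 4d image tensor and a differentiable torch Module; the display helper is the module-level function documented below):

>>> eig = po.synth.Eigendistortion(img, model)
>>> eig.synthesize(method='power', max_iter=500)
>>> eig.eigenvalues                                   # listed in decreasing order
>>> from plenoptic.synthesize.eigendistortion import display_eigendistortion
>>> display_eigendistortion(eig, eigenindex=0)        # most-sensitive direction
>>> display_eigendistortion(eig, eigenindex=-1)       # least-sensitive direction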

to(*args, **kwargs)[source]

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)[source]
to(dtype, non_blocking=False)[source]
to(tensor, non_blocking=False)[source]

Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Args:
  device (torch.device): the desired device of the parameters and buffers in this module

  dtype (torch.dtype): the desired floating point type of the floating point parameters and buffers in this module

  tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

plenoptic.synthesize.eigendistortion.display_eigendistortion(eigendistortion, eigenindex=0, alpha=5.0, process_image=<function <lambda>>, ax=None, plot_complex='rectangular', **kwargs)[source]

Displays specified eigendistortion added to the image.

If image or eigendistortions have 3 channels, then it is assumed to be a color image and it is converted to grayscale. This is merely for display convenience and may change in the future.

Parameters:
  • eigendistortion (Eigendistortion) – Eigendistortion object whose synthesized eigendistortion we want to display

  • eigenindex (int) – Index of eigendistortion to plot. E.g. If there are 10 eigenvectors, 0 will index the first one, and -1 or 9 will index the last one.

  • alpha (float) – Amount by which to scale eigendistortion for image + (alpha * eigendistortion) for display.

  • process_image (Callable[[Tensor], Tensor]) – A function to process image + alpha*distortion before clamping between 0 and 1. E.g., multiplying by the ImageNet standard deviation and then adding the ImageNet mean to undo image preprocessing.

  • ax (Optional[axis]) – Axis handle on which to plot.

  • plot_complex (str) – Parameter for plenoptic.imshow() determining how to handle complex values. Defaults to ‘rectangular’, which plots real and complex components as separate images. Can also be ‘polar’ or ‘logpolar’; see that method’s docstring for details.

  • kwargs – Additional arguments for po.imshow().

Returns:

matplotlib Figure handle returned by plenoptic.imshow()

Return type:

fig

plenoptic.synthesize.eigendistortion.fisher_info_matrix_eigenvalue(y, x, v, dummy_vec=None)[source]

Compute the eigenvalues of the Fisher Information Matrix corresponding to the eigenvectors in v: \(\lambda = v^T F v\).

Return type:

Tensor

plenoptic.synthesize.eigendistortion.fisher_info_matrix_vector_product(y, x, v, dummy_vec)[source]

Compute Fisher Information Matrix Vector Product: \(Fv\)

Parameters:
  • y (Tensor) – Output tensor with gradient attached

  • x (Tensor) – Input tensor with gradient attached

  • v (Tensor) – The vectors with which to compute Fisher vector products

  • dummy_vec (Tensor) – Dummy vector for Jacobian vector product trick

Returns:

Vector, Fisher vector product

Return type:

Fv

Notes

Under white Gaussian noise assumption, \(F\) is matrix multiplication of Jacobian transpose and Jacobian: \(F = J^T J\). Hence: \(Fv = J^T (Jv)\)

plenoptic.synthesize.geodesic module
class plenoptic.synthesize.geodesic.Geodesic(image_a, image_b, model, n_steps=10, initial_sequence='straight', range_penalty_lambda=0.1, allowed_range=(0, 1))[source]

Bases: OptimizedSynthesis

Synthesize an approximate geodesic between two images according to a model.

This method can be used to visualize and refine the invariances of a model’s representation as described in [1].

NOTE: This synthesis method is still under construction. It will run, but it might not find the most informative geodesic.

Parameters:
  • image_a (Tensor) – Start anchor point of the geodesic, of shape (1, channel, height, width).

  • image_b (Tensor) – Stop anchor point of the geodesic, of shape (1, channel, height, width).

  • model (Module) – an analysis model that computes representations on signals like image_a.

  • n_steps (int) – the number of steps (i.e., transitions) in the trajectory between the two anchor points.

  • initial_sequence (Literal['straight', 'bridge']) – initialize the geodesic with pixel linear interpolation ('straight'), or with a brownian bridge between the two anchors ('bridge').

  • range_penalty_lambda (float) – strength of the regularizer that enforces the allowed_range. Must be non-negative.

  • allowed_range (Tuple[float, float]) – Range (inclusive) of allowed pixel values. Any values outside this range will be penalized.

geodesic

the synthesized sequence of images between the two anchor points that minimizes representation path energy, of shape (n_steps+1, channel, height, width). It starts with image_a and ends with image_b.

Type:

Tensor

pixelfade

the straight interpolation between the two anchor points, used as reference

Type:

Tensor

losses

A list of our loss over iterations.

Type:

Tensor

gradient_norm

A list of the gradient’s L2 norm over iterations.

Type:

list

pixel_change_norm

A list containing the L2 norm of the pixel change over iterations (pixel_change_norm[i] is the pixel change norm in geodesic between iterations i and i-1).

Type:

list

step_energy

step lengths in representation space, stored along the optimization process.

Type:

Tensor

dev_from_line

deviation of the representation from the straight-line interpolation; measures both the distance from the straight line and the distance along it, stored over the course of optimization

Type:

Tensor

Notes

Manifold prior hypothesis: natural images form a manifold 𝑀ˣ embedded in signal space (ℝⁿ), a model warps this manifold to another manifold 𝑀ʸ embedded in representation space (ℝᵐ), and thereby induces a different local metric.

This method computes an approximate geodesic by solving an optimization problem: it minimizes the path energy (a.k.a. the action functional), which, by Cauchy-Schwarz, has the same minimum as the path length and attains it with a constant-speed minimizing geodesic.

Caveat: depending on the geometry of the manifold, geodesics between two anchor points may not be unique and may depend on the initialization.
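As a rough sketch of the path energy being minimized (ignoring the range penalty; this assumes the model processes the frame dimension as a batch, and geodesic is an (n_steps+1, channel, height, width) tensor):

def path_energy(model, geodesic):
    rep = model(geodesic)               # representation of every frame along the path
    steps = rep[1:] - rep[:-1]          # successive steps in representation space
    return steps.pow(2).sum()           # sum of squared step lengths (the action functional)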

References

[1]

Geodesics of learned representations O J Hénaff and E P Simoncelli Published in Int’l Conf on Learning Representations (ICLR), May 2016. http://www.cns.nyu.edu/~lcv/pubs/makeAbs.php?loc=Henaff16b

Attributes:
allowed_range
dev_from_line

Deviation of the representation of each frame of self.geodesic from a straight line.

geodesic
gradient_norm

Synthesis gradient’s L2 norm over iterations.

image_a
image_b
losses

Synthesis loss over iterations.

model
optimizer
pixel_change_norm

L2 norm change in pixel values over iterations.

range_penalty_lambda
step_energy

Squared L2 norm of transition between geodesic frames in representation space.

store_progress

Methods

calculate_jerkiness([geodesic])

Compute the alignment of the representation's acceleration with the model's local curvature.

load(file_path[, map_location])

Load all relevant stuff from a .pt file.

objective_function([geodesic])

Compute geodesic synthesis loss.

save(file_path)

Save all relevant variables in .pt file.

synthesize([max_iter, optimizer, ...])

Synthesize a geodesic via optimization.

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

calculate_jerkiness(geodesic=None)[source]

Compute the alignment of the representation’s acceleration with the model’s local curvature.

This is the first order optimality condition for a geodesic, and can be used to assess the validity of the solution obtained by optimization.

Parameters:

geodesic (Optional[Tensor]) – Geodesic to check. If None, we use self.geodesic. Must have a gradient attached.

Return type:

jerkiness

property dev_from_line

Deviation of the representation of each frame of self.geodesic from a straight line.

Has shape (np.ceil(synth_iter/store_progress), n_steps+1, 2), where synth_iter is the number of iterations of synthesis that have happened. For final dimension, the first element is the Euclidean distance along the straight line and the second is the Euclidean distance to the line.

property geodesic
property image_a
property image_b
load(file_path, map_location=None, **pickle_load_args)[source]

Load all relevant stuff from a .pt file.

This should be called by an initialized Geodesic object – we will ensure that image_a, image_b, model, n_steps, initial_sequence, range_penalty_lambda, allowed_range, and pixelfade are all identical.

Note this operates in place and so doesn’t return anything.

Parameters:
  • file_path (str) – The path to load the synthesis object from

  • map_location (str, optional) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you’ll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device

  • pickle_load_args – any additional kwargs will be added to pickle_module.load via torch.load, see that function’s docstring for details.

Examples

>>> geo = po.synth.Geodesic(img_a, img_b, model)
>>> geo.synthesize(max_iter=10, store_progress=True)
>>> geo.save('geo.pt')
>>> geo_copy = po.synth.Geodesic(img_a, img_b, model)
>>> geo_copy.load('geo.pt')

Note that you must create a new instance of the Synthesis object and then load.

property model
objective_function(geodesic=None)[source]

Compute geodesic synthesis loss.

This is the path energy (i.e., squared L2 norm of each step) of the geodesic’s model representation, with the weighted range penalty.

Additionally, caches:

  • self._geodesic_representation = self.model(geodesic)

  • self._most_recent_step_energy = self._calculate_step_energy(self._geodesic_representation)

These are cached because we might store them (if self.store_progress is True) and don’t want to recalculate them.

Parameters:

geodesic (Optional[Tensor]) – Geodesic to check. If None, we use self.geodesic.

Return type:

loss

save(file_path)[source]

Save all relevant variables in .pt file.

See load docstring for an example of use.

Parameters:

file_path (str) – The path to save the Geodesic object to

property step_energy

Squared L2 norm of transition between geodesic frames in representation space.

Has shape (np.ceil(synth_iter/store_progress), n_steps), where synth_iter is the number of iterations of synthesis that have happened.

synthesize(max_iter=1000, optimizer=None, store_progress=False, stop_criterion=None, stop_iters_to_check=50)[source]

Synthesize a geodesic via optimization.

Parameters:
  • max_iter (int) – The maximum number of iterations to run before we end synthesis (unless we hit the stop criterion).

  • optimizer (Optional[Optimizer]) – The optimizer to use. If None and this is the first time calling synthesize, we use Adam(lr=.001, amsgrad=True); if synthesize has been called before, this must be None and we reuse the previous optimizer.

  • store_progress (Union[bool, int]) – Whether we should store the step energy and deviation of the representation from a straight line. If False, we don’t save anything. If True, we save every iteration. If an int, we save every store_progress iterations (note then that 0 is the same as False and 1 the same as True).

  • stop_criterion (Optional[float]) – If pixel_change_norm (i.e., the norm of the difference in self.geodesic from one iteration to the next) over the past stop_iters_to_check has been less than stop_criterion, we terminate synthesis. If None, we pick a default value based on the norm of self.pixelfade.

  • stop_iters_to_check (int) – How many iterations back to check in order to see if pixel_change_norm has stopped decreasing (for stop_criterion).

to(*args, **kwargs)[source]

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)[source]
to(dtype, non_blocking=False)[source]
to(tensor, non_blocking=False)[source]

Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Args:
  device (torch.device): the desired device of the parameters and buffers in this module

  dtype (torch.dtype): the desired floating point type of the floating point parameters and buffers in this module

  tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

plenoptic.synthesize.geodesic.plot_deviation_from_line(geodesic, natural_video=None, ax=None)[source]

Visual diagnostic of geodesic linearity in representation space.

This plot illustrates the deviation from the straight line connecting the representations of a pair of images, for different paths in representation space.

Parameters:
  • geodesic (Geodesic) – Geodesic object to visualize.

  • natural_video (Optional[Tensor]) – Natural video that bridges the anchor points, for comparison.

  • ax (Optional[Axes]) – If not None, the axis to plot this representation on. If None, we call plt.gca()

Returns:

Axes containing the plot

Return type:

ax

Notes

Axes are in the same units, normalized by the distance separating the end point representations.

Knots along each curve indicate samples used to compute the path.

When the representation is non-linear, it may not be feasible for the geodesic to be straight (for example, if the representation is normalized, all paths are constrained to live on a hypersphere). Nevertheless, if the representation is able to linearize the transformation between the anchor images, then we expect both the ground-truth natural video sequence and the geodesic to deviate from the straight line similarly. By contrast, the pixel-based interpolation will deviate significantly more from a straight line.

plenoptic.synthesize.geodesic.plot_loss(geodesic, ax=None, **kwargs)[source]

Plot synthesis loss.

Parameters:
  • geodesic (Geodesic) – Geodesic object whose synthesis loss we want to plot.

  • ax (Optional[Axes]) – If not None, the axis to plot this representation on. If None, we call plt.gca()

  • kwargs – passed to plt.semilogy

Returns:

Axes containing the plot.

Return type:

ax

plenoptic.synthesize.mad_competition module

Run MAD Competition.

class plenoptic.synthesize.mad_competition.MADCompetition(image, optimized_metric, reference_metric, minmax, initial_noise=0.1, metric_tradeoff_lambda=None, range_penalty_lambda=0.1, allowed_range=(0, 1))[source]

Bases: OptimizedSynthesis

Synthesize a single maximally-differentiating image for two metrics.

Following the basic idea in [1], this class synthesizes a maximally-differentiating image for two given metrics, based on a given image. We start by adding noise to this image and then iteratively adjusting its pixels so as to either minimize or maximize optimized_metric while holding the value of reference_metric constant.

MADCompetition accepts two metrics as its input. These should be callables that take two images and return a single number, and that number should be 0 if and only if the two images are identical (thus, the larger the number, the more different the two images).

Note that a full set of MAD Competition images consists of two pairs: a maximal and a minimal image for each metric. A single instantiation of MADCompetition will generate one of these four images.

Parameters:
  • image (Tensor) – A 4d tensor, this is the image whose representation we wish to match. If this is not a tensor, we try to cast it as one.

  • optimized_metric (Union[Module, Callable[[Tensor, Tensor], Tensor]]) – The metric whose value you wish to minimize or maximize, which takes two tensors and returns a scalar. Because of the limitations of pickle, you cannot use a lambda function for this if you wish to save the MADCompetition object (i.e., it must be one of our built-in functions or defined using a def statement)

  • reference_metric (Union[Module, Callable[[Tensor, Tensor], Tensor]]) – The metric whose value you wish to keep fixed, which takes two tensors and returns a scalar. Because of the limitations of pickle, you cannot use a lambda function for this if you wish to save the MADCompetition object (i.e., it must be one of our built-in functions or defined using a def statement)

  • minmax (Literal['min', 'max']) – Whether you wish to minimize or maximize optimized_metric.

  • initial_noise (float) – Standard deviation of the Gaussian noise used to initialize mad_image from image.

  • metric_tradeoff_lambda (Optional[float]) – Lambda to multiply by reference_metric loss and add to optimized_metric loss. If None, we pick a value so the two initial losses are approximately equal in magnitude.

  • range_penalty_lambda (float) – Lambda to multiply by range penalty and add to loss.

  • allowed_range (Tuple[float, float]) – Range (inclusive) of allowed pixel values. Any values outside this range will be penalized.

mad_image

The Maximally-Differentiating Image. This may be unfinished depending on how many iterations we’ve run for.

Type:

torch.Tensor

initial_image

The initial mad_image, which we obtain by adding Gaussian noise to image.

Type:

torch.Tensor

losses

A list of the objective function’s loss over iterations.

Type:

list

gradient_norm

A list of the gradient’s L2 norm over iterations.

Type:

list

pixel_change_norm

A list containing the L2 norm of the pixel change over iterations (pixel_change_norm[i] is the pixel change norm in mad_image between iterations i and i-1).

Type:

list

optimized_metric_loss

A list of the optimized_metric loss over iterations.

Type:

list

reference_metric_loss

A list of the reference_metric loss over iterations.

Type:

list

saved_mad_image

Saved self.mad_image for later examination.

Type:

torch.Tensor

References

[1]

Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. http://dx.doi.org/10.1167/8.12.8

Attributes:
allowed_range
gradient_norm

Synthesis gradient’s L2 norm over iterations.

image
initial_image
losses

Synthesis loss over iterations.

mad_image
metric_tradeoff_lambda
minmax
optimized_metric
optimized_metric_loss
optimizer
pixel_change_norm

L2 norm change in pixel values over iterations.

range_penalty_lambda
reference_metric
reference_metric_loss
saved_mad_image
store_progress

Methods

load(file_path[, map_location])

Load all relevant stuff from a .pt file.

objective_function([mad_image, image])

Compute the MADCompetition synthesis loss.

save(file_path)

Save all relevant variables in .pt file.

synthesize([max_iter, optimizer, scheduler, ...])

Synthesize a MAD image.

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

property image
property initial_image
load(file_path, map_location=None, **pickle_load_args)[source]

Load all relevant stuff from a .pt file.

This should be called by an initialized MADCompetition object – we will ensure that image, metric_tradeoff_lambda, range_penalty_lambda, allowed_range, and minmax are all identical, and that reference_metric and optimized_metric return identical values.

Note this operates in place and so doesn’t return anything.

Parameters:
  • file_path (str) – The path to load the synthesis object from

  • map_location (str, optional) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you’ll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device

  • pickle_load_args – any additional kwargs will be added to pickle_module.load via torch.load, see that function’s docstring for details.

Examples

>>> mad = po.synth.MADCompetition(img, metric1, metric2, minmax='min')
>>> mad.synthesize(max_iter=10, store_progress=True)
>>> mad.save('mad.pt')
>>> mad_copy = po.synth.MADCompetition(img, metric1, metric2, minmax='min')
>>> mad_copy.load('mad.pt')

Note that you must create a new instance of the Synthesis object and then load.

property mad_image
property metric_tradeoff_lambda
property minmax
objective_function(mad_image=None, image=None)[source]

Compute the MADCompetition synthesis loss.

This computes:

\[\begin{split}t L_1(x, \hat{x}) &+ \lambda_1 [L_2(x, x+\epsilon) - L_2(x, \hat{x})]^2 \\ &+ \lambda_2 \mathcal{B}(\hat{x})\end{split}\]

where \(t\) is 1 if self.minmax is 'min' and -1 if it’s 'max', \(L_1\) is self.optimized_metric, \(L_2\) is self.reference_metric, \(x\) is self.image, \(\hat{x}\) is self.mad_image, \(\epsilon\) is the initial noise, \(\mathcal{B}\) is the quadratic bound penalty, \(\lambda_1\) is self.metric_tradeoff_lambda and \(\lambda_2\) is self.range_penalty_lambda.

Parameters:
  • mad_image (Optional[Tensor]) – Proposed mad_image, \(\hat{x}\) in the above equation. If None, use self.mad_image.

  • image (Optional[Tensor]) – Proposed image, \(x\) in the above equation. If None, use self.image.

Return type:

loss
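A minimal sketch of this loss (the quadratic bound penalty \(\mathcal{B}\) is written here as a simple out-of-range penalty; its exact form in the library may differ):

def mad_loss(optimized_metric, reference_metric, image, initial_image, mad_image,
             minmax, metric_tradeoff_lambda, range_penalty_lambda, allowed_range=(0, 1)):
    t = 1 if minmax == 'min' else -1
    reference_target = reference_metric(image, initial_image)      # L_2(x, x + eps), held fixed
    loss = t * optimized_metric(image, mad_image)
    loss = loss + metric_tradeoff_lambda * (reference_target - reference_metric(image, mad_image)) ** 2
    # quadratic penalty on pixels outside the allowed range (assumed form)
    below = (allowed_range[0] - mad_image).clamp(min=0)
    above = (mad_image - allowed_range[1]).clamp(min=0)
    return loss + range_penalty_lambda * (below.pow(2) + above.pow(2)).mean()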

property optimized_metric
property optimized_metric_loss
property reference_metric
property reference_metric_loss
save(file_path)[source]

Save all relevant variables in .pt file.

Note that if store_progress is True, this will probably be very large.

See load docstring for an example of use.

Parameters:

file_path (str) – The path to save the MADCompetition object to

property saved_mad_image
synthesize(max_iter=100, optimizer=None, scheduler=None, store_progress=False, stop_criterion=0.0001, stop_iters_to_check=50)[source]

Synthesize a MAD image.

Update the pixels of initial_image to maximize or minimize (depending on the value of minmax) the value of optimized_metric(image, mad_image) while keeping the value of reference_metric(image, mad_image) constant.

We run this until either we reach max_iter or the change over the past stop_iters_to_check iterations is less than stop_criterion, whichever comes first

Parameters:
  • max_iter (int) – The maximum number of iterations to run before we end synthesis (unless we hit the stop criterion).

  • optimizer (Optional[Optimizer]) – The optimizer to use. If None and this is the first time calling synthesize, we use Adam(lr=.01, amsgrad=True); if synthesize has been called before, this must be None and we reuse the previous optimizer.

  • scheduler (Optional[_LRScheduler]) – The learning rate scheduler to use. If None, we don’t use one.

  • store_progress (Union[bool, int]) – Whether we should store the representation of the MAD image in progress on every iteration. If False, we don’t save anything. If True, we save every iteration. If an int, we save every store_progress iterations (note then that 0 is the same as False and 1 the same as True).

  • stop_criterion (float) – If the loss over the past stop_iters_to_check has changed less than stop_criterion, we terminate synthesis.

  • stop_iters_to_check (int) – How many iterations back to check in order to see if the loss has stopped decreasing (for stop_criterion).

to(*args, **kwargs)[source]

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)[source]
to(dtype, non_blocking=False)[source]
to(tensor, non_blocking=False)[source]

Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Args:
  device (torch.device): the desired device of the parameters and buffers in this module

  dtype (torch.dtype): the desired floating point type of the floating point parameters and buffers in this module

  tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

plenoptic.synthesize.mad_competition.animate(mad, framerate=10, batch_idx=0, channel_idx=None, zoom=None, fig=None, axes_idx={}, figsize=None, included_plots=['display_mad_image', 'plot_loss', 'plot_pixel_values'], width_ratios={})[source]

Animate synthesis progress.

This is essentially the figure produced by mad.plot_synthesis_status animated over time, for each stored iteration.

We return the matplotlib FuncAnimation object. In order to view it in a Jupyter notebook, use the plenoptic.tools.display.convert_anim_to_html(anim) function. In order to save, use anim.save(filename) (note for this that you’ll need the appropriate writer installed and on your path, e.g., ffmpeg, imagemagick, etc). Either of these will probably take a reasonably long amount of time.

Parameters:
  • mad (MADCompetition) – MADCompetition object whose synthesis we want to animate.

  • framerate (int) – How many frames a second to display.

  • batch_idx (int) – Which index to take from the batch dimension

  • channel_idx (Optional[int]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) image).

  • zoom (Optional[float]) – How much to zoom in / enlarge the synthesized image, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.

  • fig (Optional[Figure]) – If None, create the figure from scratch. Else, should be an empty figure with enough axes (the expected use here is have same-size movies with different plots).

  • axes_idx (Dict[str, int]) – Dictionary specifying which axes contains which type of plot, allows for more fine-grained control of the resulting figure. Probably only helpful if fig is also defined. Possible keys: 'mad_image', 'loss', 'pixel_values', 'misc'. Values should all be ints. If you tell this function to create a plot that doesn’t have a corresponding key, we find the lowest int that is not already in the dict, so if you have axes that you want unchanged, place their idx in 'misc'.

  • figsize (Optional[Tuple[float]]) – The size of the figure to create. It may take a little bit of playing around to find a reasonable value. If None, we attempt to make our best guess, aiming to have each axis be of size (5, 5)

  • width_ratios (Dict[str, float]) – By default, all plots axes will have the same width. To change that, specify their relative widths using the keys: [‘display_mad_image’, ‘plot_loss’, ‘plot_pixel_values’] and floats specifying their relative width. Any not included will be assumed to be 1.

Returns:

The animation object. In order to view, must convert to HTML or save.

Return type:

anim

Notes

By default, we use the ffmpeg backend, which requires that you have ffmpeg installed and on your path (https://ffmpeg.org/download.html). To use a different writer, set the matplotlib rcParam: matplotlib.rcParams['animation.writer'] = writer; see https://matplotlib.org/stable/api/animation_api.html#writer-classes for more details.

For displaying in a jupyter notebook, ffmpeg appears to be required.
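For example (a hedged sketch; mad is assumed to be a MADCompetition instance synthesized with store_progress, and ffmpeg is assumed to be on your path):

>>> from plenoptic.synthesize.mad_competition import animate
>>> anim = animate(mad, framerate=10)
>>> anim.save('mad_synthesis.mp4')                 # write to file via ffmpeg
>>> po.tools.display.convert_anim_to_html(anim)    # or view inline in a Jupyter notebook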

plenoptic.synthesize.mad_competition.display_mad_image(mad, batch_idx=0, channel_idx=None, zoom=None, iteration=None, ax=None, title='MADCompetition', **kwargs)[source]

Display MAD image.

You can specify what iteration to view by using the iteration arg. The default, None, shows the final one.

We use plenoptic.imshow to display the synthesized image and attempt to automatically find the most reasonable zoom value. You can override this value using the zoom arg, but remember that plenoptic.imshow is opinionated about the size of the resulting image and will throw an Exception if the axis created is not big enough for the selected zoom.

Parameters:
  • mad (MADCompetition) – MADCompetition object whose MAD image we want to display.

  • batch_idx (int) – Which index to take from the batch dimension

  • channel_idx (Optional[int]) – Which index to take from the channel dimension. If None, we assume image is RGB(A) and show all channels.

  • zoom (Optional[float]) – How much to zoom in / enlarge the synthesized image, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.

  • iteration (Optional[int]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.

  • ax (Optional[Axes]) – Pre-existing axes for plot. If None, we call plt.gca().

  • title (str) – Title of the axis.

  • kwargs – Passed to plenoptic.imshow

Returns:

The matplotlib axes containing the plot.

Return type:

ax

plenoptic.synthesize.mad_competition.display_mad_image_all(mad_metric1_min, mad_metric2_min, mad_metric1_max, mad_metric2_max, metric1_name=None, metric2_name=None, zoom=1, **kwargs)[source]

Display all MAD Competition images.

To generate a full set of MAD Competition images, you need four instances: one for minimizing and maximizing each metric. This helper function creates a figure to display the full set of images.

In addition to the four MAD Competition images, this also plots the initial image from mad_metric1_min, for comparison.

Note that all four MADCompetition instances must have the same image.

Parameters:
  • mad_metric1_min (MADCompetition) – MADCompetition object that minimized the first metric.

  • mad_metric2_min (MADCompetition) – MADCompetition object that minimized the second metric.

  • mad_metric1_max (MADCompetition) – MADCompetition object that maximized the first metric.

  • mad_metric2_max (MADCompetition) – MADCompetition object that maximized the second metric.

  • metric1_name (Optional[str]) – Name of the first metric. If None, we use the name of the optimized_metric function from mad_metric1_min.

  • metric2_name (Optional[str]) – Name of the second metric. If None, we use the name of the optimized_metric function from mad_metric2_min.

  • zoom (Union[int, float]) – Ratio of display pixels to image pixels. See plenoptic.imshow for details.

  • kwargs – Passed to plenoptic.imshow.

Returns:

Figure containing the images.

Return type:

fig

plenoptic.synthesize.mad_competition.plot_loss(mad, iteration=None, axes=None, **kwargs)[source]

Plot metric losses.

Plots mad.optimized_metric_loss and mad.reference_metric_loss on two separate axes, over all iterations. Also plots a red dot at iteration, to highlight the loss there. If iteration=None, then the dot will be at the final iteration.

Parameters:
  • mad (MADCompetition) – MADCompetition object whose loss we want to plot.

  • iteration (Optional[int]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.

  • axes (Union[List[Axes], Axes, None]) – Pre-existing axes for plot. If a list of axes, must be the two axes to use for this plot. If a single axis, we’ll split it in half horizontally. If None, we call plt.gca().

  • kwargs – passed to plt.plot

Returns:

The matplotlib axes containing the plot.

Return type:

axes

Notes

We plot abs(mad.losses) because if we’re maximizing the synthesis metric, we actually minimized its negative during synthesis. By plotting the absolute value, we get them all on the same scale.

plenoptic.synthesize.mad_competition.plot_loss_all(mad_metric1_min, mad_metric2_min, mad_metric1_max, mad_metric2_max, metric1_name=None, metric2_name=None, metric1_kwargs={'c': 'C0'}, metric2_kwargs={'c': 'C1'}, min_kwargs={'linestyle': '--'}, max_kwargs={'linestyle': '-'}, figsize=(10, 5))[source]

Plot loss for the full set of MAD Competition instances.

To generate a full set of MAD Competition images, you need four instances: one for minimizing and maximizing each metric. This helper function creates a two-axis figure to display the loss for this full set.

Note that all four MADCompetition instances must have the same image.

Parameters:
  • mad_metric1_min (MADCompetition) – MADCompetition object that minimized the first metric.

  • mad_metric2_min (MADCompetition) – MADCompetition object that minimized the second metric.

  • mad_metric1_max (MADCompetition) – MADCompetition object that maximized the first metric.

  • mad_metric2_max (MADCompetition) – MADCompetition object that maximized the second metric.

  • metric1_name (Optional[str]) – Name of the first metric. If None, we use the name of the optimized_metric function from mad_metric1_min.

  • metric2_name (Optional[str]) – Name of the second metric. If None, we use the name of the optimized_metric function from mad_metric2_min.

  • metric1_kwargs (Dict) – Dictionary of arguments to pass to matplotlib.pyplot.plot to identify synthesis instance where the first metric was being optimized.

  • metric2_kwargs (Dict) – Dictionary of arguments to pass to matplotlib.pyplot.plot to identify synthesis instance where the second metric was being optimized.

  • min_kwargs (Dict) – Dictionary of arguments to pass to matplotlib.pyplot.plot to identify synthesis instance where optimized_metric was being minimized.

  • max_kwargs (Dict) – Dictionary of arguments to pass to matplotlib.pyplot.plot to identify synthesis instance where optimized_metric was being maximized.

  • figsize – Size of the figure we create.

Returns:

Figure containing the plot.

Return type:

fig

plenoptic.synthesize.mad_competition.plot_pixel_values(mad, batch_idx=0, channel_idx=None, iteration=None, ylim=False, ax=None, **kwargs)[source]

Plot histogram of pixel values of reference and MAD images.

This is a way to check the distribution of pixel intensities and see whether there are any values outside the allowed range.

Parameters:
  • mad (MADCompetition) – MADCompetition object with the images whose pixel values we want to compare.

  • batch_idx (int) – Which index to take from the batch dimension

  • channel_idx (Optional[int]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) images).

  • iteration (Optional[int]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.

  • ylim (Union[Tuple[float], Literal[False]]) – if tuple, the ylimit to set for this axis. If False, we leave it untouched

  • ax (Optional[Axes]) – Pre-existing axes for plot. If None, we call plt.gca().

  • kwargs – passed to plt.hist

Returns:

The matplotlib axes containing the plot.

Return type:

ax

plenoptic.synthesize.mad_competition.plot_synthesis_status(mad, batch_idx=0, channel_idx=None, iteration=None, vrange='indep1', zoom=None, fig=None, axes_idx={}, figsize=None, included_plots=['display_mad_image', 'plot_loss', 'plot_pixel_values'], width_ratios={})[source]

Make a plot showing synthesis status.

We create several subplots to analyze this. By default, we create two subplots on a new figure: the first one contains the MAD image and the second contains the loss.

There is an optional additional plot: pixel_values, a histogram of pixel values of the synthesized and target images.

All of these (including the default plots) can be toggled via the included_plots argument, and each can be created separately using the function whose name matches the corresponding entry (e.g., plot_pixel_values).

Parameters:
  • mad (MADCompetition) – MADCompetition object whose status we want to plot.

  • batch_idx (int) – Which index to take from the batch dimension

  • channel_idx (Optional[int]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) image).

  • iteration (Optional[int]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.

  • vrange (Union[Tuple[float], str]) – The vrange option to pass to display_mad_image(). See docstring of imshow for possible values.

  • zoom (Optional[float]) – How much to zoom in / enlarge the synthesized image, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.

  • fig (Optional[Figure]) – if None, we create a new figure. otherwise we assume this is an empty figure that has the appropriate size and number of subplots

  • axes_idx (Dict[str, int]) – Dictionary specifying which axes contains which type of plot, allows for more fine-grained control of the resulting figure. Probably only helpful if fig is also defined. Possible keys: 'mad_image', 'loss', 'pixel_values', 'misc'. Values should all be ints. If you tell this function to create a plot that doesn’t have a corresponding key, we find the lowest int that is not already in the dict, so if you have axes that you want unchanged, place their idx in 'misc'.

  • figsize (Optional[Tuple[float]]) – The size of the figure to create. It may take a little bit of playing around to find a reasonable value. If None, we attempt to make our best guess, aiming to have each axis be of size (5, 5)

  • included_plots (List[str]) – Which plots to include. Must be some subset of 'display_mad_image', 'plot_loss', 'plot_pixel_values'.

  • width_ratios (Dict[str, float]) – By default, all plots axes will have the same width. To change that, specify their relative widths using the keys: [‘display_mad_image’, ‘plot_loss’, ‘plot_pixel_values’] and floats specifying their relative width. Any not included will be assumed to be 1.

Return type:

Tuple[Figure, Dict[str, int]]

Returns:

  • fig – The figure containing this plot

  • axes_idx – Dictionary giving index of each plot.

plenoptic.synthesize.metamer module

Synthesize model metamers.

class plenoptic.synthesize.metamer.Metamer(image, model, loss_function=<function mse>, range_penalty_lambda=0.1, allowed_range=(0, 1), initial_image=None)[source]

Bases: OptimizedSynthesis

Synthesize metamers for image-computable differentiable models.

Following the basic idea in [1], this class creates a metamer for a given model on a given image. We start with initial_image and iteratively adjust the pixel values so as to match the representation of the metamer and image.

All saved_ attributes are initialized as empty lists and will be non-empty if the store_progress arg to synthesize() is not False. They will be appended to on every iteration if store_progress=True or every store_progress iterations if it’s an int.

Parameters:
  • image (Tensor) – A 4d tensor, this is the image whose representation we wish to match. If this is not a tensor, we try to cast it as one.

  • model (Module) – A visual model, see Metamer notebook for more details

  • loss_function (Callable[[Tensor, Tensor], Tensor]) – the loss function to use to compare the representations of the models in order to determine their loss. Because of the limitations of pickle, you cannot use a lambda function for this if you wish to save the Metamer object (i.e., it must be one of our built-in functions or defined using a def statement)

  • range_penalty_lambda (float) – strength of the regularizer that enforces the allowed_range. Must be non-negative.

  • allowed_range (Tuple[float, float]) – Range (inclusive) of allowed pixel values. Any values outside this range will be penalized.

  • initial_image (Optional[Tensor]) – 4d Tensor to initialize our metamer with. If None, will draw a sample of uniform noise within allowed_range.

target_representation

Whatever is returned by model(image), this is what we match in order to create a metamer

Type:

torch.Tensor

metamer

The metamer. This may be unfinished depending on how many iterations we’ve run for.

Type:

torch.Tensor

losses

A list of our loss over iterations.

Type:

list

gradient_norm

A list of the gradient’s L2 norm over iterations.

Type:

list

pixel_change_norm

A list containing the L2 norm of the pixel change over iterations (pixel_change_norm[i] is the pixel change norm in metamer between iterations i and i-1).

Type:

list

saved_metamer

Saved self.metamer for later examination.

Type:

torch.Tensor

References

[1]

J Portilla and E P Simoncelli. A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients. Int’l Journal of Computer Vision. 40(1):49-71, October, 2000. http://www.cns.nyu.edu/~eero/ABSTRACTS/portilla99-abstract.html http://www.cns.nyu.edu/~lcv/texture/

Attributes:
allowed_range
gradient_norm

Synthesis gradient’s L2 norm over iterations.

image
losses

Synthesis loss over iterations.

metamer
model
optimizer
pixel_change_norm

L2 norm change in pixel values over iterations.

range_penalty_lambda
saved_metamer
store_progress
target_representation

Model representation of image; the goal of synthesis is for model(metamer) to match this value.

Methods

load(file_path[, map_location])

Load all relevant stuff from a .pt file.

objective_function([metamer_representation, ...])

Compute the metamer synthesis loss.

save(file_path)

Save all relevant variables in .pt file.

synthesize([max_iter, optimizer, scheduler, ...])

Synthesize a metamer.

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

property image
load(file_path, map_location=None, **pickle_load_args)[source]

Load all relevant stuff from a .pt file.

This should be called by an initialized Metamer object – we will ensure that image, target_representation (and thus model), and loss_function are all identical.

Note this operates in place and so doesn’t return anything.

Parameters:
  • file_path (str) – The path to load the synthesis object from

  • map_location (str, optional) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you’ll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device

  • pickle_load_args – any additional kwargs will be added to pickle_module.load via torch.load, see that function’s docstring for details.

Examples

>>> metamer = po.synth.Metamer(img, model)
>>> metamer.synthesize(max_iter=10, store_progress=True)
>>> metamer.save('metamers.pt')
>>> metamer_copy = po.synth.Metamer(img, model)
>>> metamer_copy.load('metamers.pt')

Note that you must create a new instance of the Synthesis object and then load.

property metamer
property model
objective_function(metamer_representation=None, target_representation=None)[source]

Compute the metamer synthesis loss.

This calls self.loss_function on metamer_representation and target_representation and then adds the weighted range penalty.

Parameters:
  • metamer_representation (Optional[Tensor]) – Model response to metamer. If None, we use self.model(self.metamer)

  • target_representation (Optional[Tensor]) – Model response to image. If None, we use self.target_representation.

Return type:

loss

save(file_path)[source]

Save all relevant variables in .pt file.

Note that if store_progress is True, this will probably be very large.

See load docstring for an example of use.

Parameters:

file_path (str) – The path to save the metamer object to

property saved_metamer
synthesize(max_iter=100, optimizer=None, scheduler=None, store_progress=False, stop_criterion=0.0001, stop_iters_to_check=50)[source]

Synthesize a metamer.

Update the pixels of initial_image until its representation matches that of image.

We run this until either we reach max_iter or the change over the past stop_iters_to_check iterations is less than stop_criterion, whichever comes first

Parameters:
  • max_iter (int) – The maximum number of iterations to run before we end synthesis (unless we hit the stop criterion).

  • optimizer (Optional[Optimizer]) – The optimizer to use. If None and this is the first time calling synthesize, we use Adam(lr=.01, amsgrad=True); if synthesize has been called before, this must be None and we reuse the previous optimizer.

  • scheduler (Optional[_LRScheduler]) – The learning rate scheduler to use. If None, we don’t use one.

  • store_progress (Union[bool, int]) – Whether we should store the metamer image in progress on every iteration. If False, we don’t save anything. If True, we save every iteration. If an int, we save every store_progress iterations (note then that 0 is the same as False and 1 the same as True).

  • stop_criterion (float) – If the loss over the past stop_iters_to_check has changed less than stop_criterion, we terminate synthesis.

  • stop_iters_to_check (int) – How many iterations back to check in order to see if the loss has stopped decreasing (for stop_criterion).

property target_representation

Model representation of image; the goal of synthesis is for model(metamer) to match this value.

to(*args, **kwargs)[source]

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)[source]
to(dtype, non_blocking=False)[source]
to(tensor, non_blocking=False)[source]

Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Args:
  device (torch.device): the desired device of the parameters and buffers in this module

  dtype (torch.dtype): the desired floating point type of the floating point parameters and buffers in this module

  tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

class plenoptic.synthesize.metamer.MetamerCTF(image, model, loss_function=<function mse>, range_penalty_lambda=0.1, allowed_range=(0, 1), initial_image=None, coarse_to_fine='together')[source]

Bases: Metamer

Synthesize model metamers with coarse-to-fine synthesis.

This is a special case of Metamer, which uses the coarse-to-fine synthesis procedure described in [1]_: we start by updating metamer with respect to only a subset of the model’s representation (generally, that which corresponds to the lowest spatial frequencies), and changing which subset we consider over the course of synthesis. This is similar to optimizing with a blurred version of the objective function and gradually adding in finer details. It improves synthesis performance for some models.

Parameters:
  • image (Tensor) – A 4d tensor, this is the image whose representation we wish to match. If this is not a tensor, we try to cast it as one.

  • model (Module) – A visual model, see Metamer notebook for more details

  • loss_function (Callable[[Tensor, Tensor], Tensor]) – the loss function to use to compare the representations of the models in order to determine their loss. Because of the limitations of pickle, you cannot use a lambda function for this if you wish to save the Metamer object (i.e., it must be one of our built-in functions or defined using a def statement)

  • range_penalty_lambda (float) – strength of the regularizer that enforces the allowed_range. Must be non-negative.

  • allowed_range (Tuple[float, float]) – Range (inclusive) of allowed pixel values. Any values outside this range will be penalized.

  • initial_image (Optional[Tensor]) – 4d Tensor to initialize our metamer with. If None, will draw a sample of uniform noise within allowed_range.

  • coarse_to_fine (Literal['together', 'separate']) –

    • ‘together’: start with the coarsest scale, then gradually add each finer scale.

    • ’separate’: compute the gradient with respect to each scale separately (ignoring the others), then with respect to all of them at the end.

    (see Metamer tutorial for more details).

target_representation

Whatever is returned by model(image), this is what we match in order to create a metamer

Type:

torch.Tensor

metamer

The metamer. This may be unfinished depending on how many iterations we’ve run for.

Type:

torch.Tensor

losses

A list of our loss over iterations.

Type:

list

gradient_norm

A list of the gradient’s L2 norm over iterations.

Type:

list

pixel_change_norm

A list containing the L2 norm of the pixel change over iterations (pixel_change_norm[i] is the pixel change norm in metamer between iterations i and i-1).

Type:

list

saved_metamer

Saved self.metamer for later examination.

Type:

torch.Tensor

scales

The list of scales in optimization order (i.e., from coarse to fine). Will be modified during the course of optimization.

Type:

list or None

scales_loss

The scale-specific loss at each iteration

Type:

list or None

scales_timing

Keys are the values found in scales, values are lists, specifying the iteration where we started and stopped optimizing this scale.

Type:

dict or None

scales_finished

List of scales that we’ve finished optimizing.

Type:

list or None

Attributes:
allowed_range
coarse_to_fine
gradient_norm

Synthesis gradient’s L2 norm over iterations.

image
losses

Synthesis loss over iterations.

metamer
model
optimizer
pixel_change_norm

L2 norm change in pixel values over iterations.

range_penalty_lambda
saved_metamer
scales
scales_finished
scales_loss
scales_timing
store_progress
target_representation

Model representation of image; the goal of synthesis is for model(metamer) to match this value.

Methods

load(file_path[, map_location])

Load all relevant stuff from a .pt file.

objective_function([metamer_representation, ...])

Compute the metamer synthesis loss.

save(file_path)

Save all relevant variables in .pt file.

synthesize([max_iter, optimizer, scheduler, ...])

Synthesize a metamer.

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

property coarse_to_fine
load(file_path, map_location=None, **pickle_load_args)[source]

Load all relevant stuff from a .pt file.

This should be called by an initialized Metamer object – we will ensure that image, target_representation (and thus model), and loss_function are all identical.

Note this operates in place and so doesn’t return anything.

Parameters:
  • file_path (str) – The path to load the synthesis object from

  • map_location (str, optional) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you’ll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device

  • pickle_load_args – any additional kwargs will be added to pickle_module.load via torch.load, see that function’s docstring for details.

Examples

>>> metamer = po.synth.Metamer(img, model)
>>> metamer.synthesize(max_iter=10, store_progress=True)
>>> metamer.save('metamers.pt')
>>> metamer_copy = po.synth.Metamer(img, model)
>>> metamer_copy.load('metamers.pt')

Note that you must create a new instance of the Synthesis object and then load.

property scales
property scales_finished
property scales_loss
property scales_timing
synthesize(max_iter=100, optimizer=None, scheduler=None, store_progress=False, stop_criterion=0.0001, stop_iters_to_check=50, change_scale_criterion=0.01, ctf_iters_to_check=50)[source]

Synthesize a metamer.

Update the pixels of initial_image until its representation matches that of image.

We run this until either we reach max_iter or the change over the past stop_iters_to_check iterations is less than stop_criterion, whichever comes first

Parameters:
  • max_iter (int) – The maximum number of iterations to run before we end synthesis (unless we hit the stop criterion).

  • optimizer (Optional[Optimizer]) – The optimizer to use. If None and this is the first time calling synthesize, we use Adam(lr=.01, amsgrad=True); if synthesize has been called before, this must be None and we reuse the previous optimizer.

  • scheduler (Optional[_LRScheduler]) – The learning rate scheduler to use. If None, we don’t use one.

  • store_progress (Union[bool, int]) – Whether we should store the metamer image in progress on every iteration. If False, we don’t save anything. If True, we save every iteration. If an int, we save every store_progress iterations (note then that 0 is the same as False and 1 the same as True).

  • stop_criterion (float) – If the loss over the past stop_iters_to_check has changed less than stop_criterion, we terminate synthesis.

  • stop_iters_to_check (int) – How many iterations back to check in order to see if the loss has stopped decreasing (for stop_criterion).

  • change_scale_criterion (Optional[float]) – Scale-specific analogue of stop_criterion: we consider a given scale finished (and move onto the next) if the loss has changed less than this in the past ctf_iters_to_check iterations. If None, we’ll change scales as soon as we’ve spent ctf_iters_to_check iterations on a given scale.

  • ctf_iters_to_check (int) – Scale-specific analogue of stop_iters_to_check: how many iterations back to check in order to see if we should switch scales.
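
A sketch of a coarse-to-fine run. The PortillaSimoncelli texture model is a natural choice because its representation is organized by scale, but the po.simul alias, the constructor call, and the image path shown are assumptions that may differ across versions:

>>> import plenoptic as po
>>> img = po.tools.data.load_images('path/to/texture.png')  # hypothetical path
>>> model = po.simul.PortillaSimoncelli(img.shape[-2:])
>>> met = po.synth.MetamerCTF(img, model, coarse_to_fine='together')
>>> met.synthesize(max_iter=1000, change_scale_criterion=None, ctf_iters_to_check=7)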

plenoptic.synthesize.metamer.animate(metamer, framerate=10, batch_idx=0, channel_idx=None, ylim=None, vrange=(0, 1), zoom=None, plot_representation_error_as_rgb=False, fig=None, axes_idx={}, figsize=None, included_plots=['display_metamer', 'plot_loss', 'plot_representation_error'], width_ratios={})[source]

Animate synthesis progress.

This is essentially the figure produced by metamer.plot_synthesis_status animated over time, for each stored iteration.

We return the matplotlib FuncAnimation object. In order to view it in a Jupyter notebook, use the plenoptic.tools.display.convert_anim_to_html(anim) function. In order to save, use anim.save(filename) (note for this that you’ll need the appropriate writer installed and on your path, e.g., ffmpeg, imagemagick, etc). Either of these will probably take a reasonably long amount of time.

Parameters:
  • metamer (Metamer) – Metamer object whose synthesis we want to animate.

  • framerate (int) – How many frames a second to display.

  • batch_idx (int) – Which index to take from the batch dimension

  • channel_idx (Optional[int]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) image).

  • ylim (Union[str, None, Tuple[float, float], Literal[False]]) –

    The y-limits of the representation_error plot:

    • If a tuple, then this is the ylim of all plots

    • If None, then all plots have the same limits, all symmetric about 0 with a limit of np.abs(representation_error).max() (for the initial representation_error)

    • If False, don’t modify limits.

    • If a string, must be ‘rescale’ or of the form ‘rescaleN’, where N can be any integer. If ‘rescaleN’, we rescale the limits every N frames (we rescale as if ylim = None). If ‘rescale’, then we do this 10 times over the course of the animation

  • vrange (Union[Tuple[float, float], str]) – The vrange option to pass to display_metamer(). See docstring of imshow for possible values.

  • zoom (Optional[float]) – How much to zoom in / enlarge the metamer, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.

  • plot_representation_error_as_rgb (bool) – The representation can be image-like with multiple channels, and we have no way to determine whether it should be represented as an RGB image or not, so the user must set this flag to tell us. It will be ignored if the representation doesn’t look image-like or if the model has its own plot_representation_error() method. Else, it will be passed to po.imshow(); see that method’s docstring for details.

  • fig (Optional[Figure]) – If None, create the figure from scratch. Else, should be an empty figure with enough axes (the expected use here is have same-size movies with different plots).

  • axes_idx (Dict[str, int]) – Dictionary specifying which axes contains which type of plot, allows for more fine-grained control of the resulting figure. Probably only helpful if fig is also defined. Possible keys: 'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values', 'misc'. Values should all be ints. If you tell this function to create a plot that doesn’t have a corresponding key, we find the lowest int that is not already in the dict, so if you have axes that you want unchanged, place their idx in 'misc'.

  • figsize (Optional[Tuple[float, float]]) – The size of the figure to create. It may take a little bit of playing around to find a reasonable value. If None, we attempt to make our best guess, aiming to have each axis be of size (5, 5)

  • included_plots (List[str]) – Which plots to include. Must be some subset of 'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values'.

  • width_ratios (Dict[str, float]) – By default, all plot axes will have the same width. To change that, specify their relative widths using the keys: 'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values' and floats specifying their relative width. Any not included will be assumed to be 1.

Returns:

The animation object. In order to view, must convert to HTML or save.

Return type:

anim

Notes

By default, we use the ffmpeg backend, which requires that you have ffmpeg installed and on your path (https://ffmpeg.org/download.html). To use a different writer, set the matplotlib rcParam: matplotlib.rcParams['animation.writer'] = writer; see https://matplotlib.org/stable/api/animation_api.html#writer-classes for more details.

For displaying in a jupyter notebook, ffmpeg appears to be required.
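
A usage sketch, assuming met is a Metamer whose synthesize was called with store_progress set (the output filename is a placeholder):

>>> anim = po.synth.metamer.animate(met, framerate=10)
>>> po.tools.display.convert_anim_to_html(anim)   # to view in a Jupyter notebook
>>> anim.save('metamer_synthesis.mp4')            # or save to disk (requires ffmpeg)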

plenoptic.synthesize.metamer.display_metamer(metamer, batch_idx=0, channel_idx=None, zoom=None, iteration=None, ax=None, **kwargs)[source]

Display metamer.

You can specify what iteration to view by using the iteration arg. The default, None, shows the final one.

We use plenoptic.imshow to display the metamer and attempt to automatically find the most reasonable zoom value. You can override this value using the zoom arg, but remember that plenoptic.imshow is opinionated about the size of the resulting image and will throw an Exception if the axis created is not big enough for the selected zoom.

Parameters:
  • metamer (Metamer) – Metamer object whose synthesized metamer we want to display.

  • batch_idx (int) – Which index to take from the batch dimension

  • channel_idx (Optional[int]) – Which index to take from the channel dimension. If None, we assume image is RGB(A) and show all channels.

  • iteration (Optional[int]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.

  • ax (Optional[Axes]) – Pre-existing axes for plot. If None, we call plt.gca().

  • zoom (Optional[float]) – How much to zoom in / enlarge the metamer, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.

  • kwargs – Passed to plenoptic.imshow

Returns:

The matplotlib axes containing the plot.

Return type:

ax

plenoptic.synthesize.metamer.plot_loss(metamer, iteration=None, ax=None, **kwargs)[source]

Plot synthesis loss with log-scaled y axis.

Plots metamer.losses over all iterations. Also plots a red dot at iteration, to highlight the loss there. If iteration=None, then the dot will be at the final iteration.

Parameters:
  • metamer (Metamer) – Metamer object whose loss we want to plot.

  • iteration (Optional[int]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.

  • ax (Optional[Axes]) – Pre-existing axes for plot. If None, we call plt.gca().

  • kwargs – passed to plt.semilogy

Returns:

The matplotlib axes containing the plot.

Return type:

ax

plenoptic.synthesize.metamer.plot_pixel_values(metamer, batch_idx=0, channel_idx=None, iteration=None, ylim=False, ax=None, **kwargs)[source]

Plot histogram of pixel values of target image and its metamer.

This is a way to check the distributions of pixel intensities and see whether there are any values outside the allowed range.

Parameters:
  • metamer (Metamer) – Metamer object with the images whose pixel values we want to compare.

  • batch_idx (int) – Which index to take from the batch dimension

  • channel_idx (Optional[int]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) images).

  • iteration (Optional[int]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.

  • ylim (Union[Tuple[float, float], Literal[False]]) – if tuple, the ylimit to set for this axis. If False, we leave it untouched

  • ax (Optional[Axes]) – Pre-existing axes for plot. If None, we call plt.gca().

  • kwargs – passed to plt.hist

Returns:

Created axes.

Return type:

ax

plenoptic.synthesize.metamer.plot_representation_error(metamer, batch_idx=0, iteration=None, ylim=None, ax=None, as_rgb=False, **kwargs)[source]

Plot distance ratio showing how close we are to convergence.

We plot _representation_error(metamer, iteration). For more details, see plenoptic.tools.display.plot_representation.

Parameters:
  • metamer (Metamer) – Metamer object whose synthesized metamer we want to display.

  • batch_idx (int) – Which index to take from the batch dimension

  • iteration (Optional[int]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.

  • ylim (Union[Tuple[float, float], None, Literal[False]]) – If ylim is None, we set the axes’ y-limits to be (-y_max, y_max), where y_max=np.abs(data).max(). If it’s False, we do nothing. If a tuple, we use that range.

  • ax (Optional[Axes]) – Pre-existing axes for plot. If None, we call plt.gca().

  • as_rgb (bool, optional) – The representation can be image-like with multiple channels, and we have no way to determine whether it should be represented as an RGB image or not, so the user must set this flag to tell us. It will be ignored if the response doesn’t look image-like or if the model has its own plot_representation_error() method. Else, it will be passed to po.imshow(); see that method’s docstring for details.

  • kwargs – Passed to metamer.model.forward

Returns:

List of created axes

Return type:

axes

plenoptic.synthesize.metamer.plot_synthesis_status(metamer, batch_idx=0, channel_idx=None, iteration=None, ylim=None, vrange='indep1', zoom=None, plot_representation_error_as_rgb=False, fig=None, axes_idx={}, figsize=None, included_plots=['display_metamer', 'plot_loss', 'plot_representation_error'], width_ratios={})[source]

Make a plot showing synthesis status.

We create several subplots to analyze this. By default, we create three subplots on a new figure: the first one contains the synthesized metamer, the second contains the loss, and the third contains the representation error.

There is an optional additional plot: plot_pixel_values, a histogram of pixel values of the metamer and target image.

The plots to include are specified by including their name in the included_plots list. All plots can be created separately using the method with the same name.

Parameters:
  • metamer (Metamer) – Metamer object whose status we want to plot.

  • batch_idx (int) – Which index to take from the batch dimension

  • channel_idx (Optional[int]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) image).

  • iteration (Optional[int]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.

  • ylim (Union[Tuple[float, float], None, Literal[False]]) – The ylimit to use for the representation_error plot. We pass this value directly to plot_representation_error

  • vrange (Union[Tuple[float, float], str]) – The vrange option to pass to display_metamer(). See docstring of imshow for possible values.

  • zoom (Optional[float]) – How much to zoom in / enlarge the metamer, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.

  • plot_representation_error_as_rgb (bool, optional) – The representation can be image-like with multiple channels, and we have no way to determine whether it should be represented as an RGB image or not, so the user must set this flag to tell us. It will be ignored if the response doesn’t look image-like or if the model has its own plot_representation_error() method. Else, it will be passed to po.imshow(); see that method’s docstring for details.

  • fig (Optional[Figure]) – if None, we create a new figure. otherwise we assume this is an empty figure that has the appropriate size and number of subplots

  • axes_idx (Dict[str, int]) – Dictionary specifying which axes contains which type of plot, allows for more fine-grained control of the resulting figure. Probably only helpful if fig is also defined. Possible keys: 'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values', 'misc'. Values should all be ints. If you tell this function to create a plot that doesn’t have a corresponding key, we find the lowest int that is not already in the dict, so if you have axes that you want unchanged, place their idx in 'misc'.

  • figsize (Optional[Tuple[float, float]]) – The size of the figure to create. It may take a little bit of playing around to find a reasonable value. If None, we attempt to make our best guess, aiming to have each axis be of size (5, 5)

  • included_plots (List[str]) – Which plots to include. Must be some subset of 'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values'.

  • width_ratios (Dict[str, float]) – By default, all plot axes will have the same width. To change that, specify their relative widths using the keys: 'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values' and floats specifying their relative width. Any not included will be assumed to be 1.

Return type:

Tuple[Figure, Dict[str, int]]

Returns:

  • fig – The figure containing this plot

  • axes_idx – Dictionary giving index of each plot.
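
A usage sketch showing how included_plots and width_ratios interact, assuming met is a Metamer object from the examples above (values are illustrative):

>>> fig, axes_idx = po.synth.metamer.plot_synthesis_status(
...     met,
...     included_plots=['display_metamer', 'plot_loss', 'plot_pixel_values'],
...     width_ratios={'display_metamer': 2},
... )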

plenoptic.synthesize.simple_metamer module

Simple Metamer Class

class plenoptic.synthesize.simple_metamer.SimpleMetamer(image, model)[source]

Bases: Synthesis

Simple version of metamer synthesis.

This doesn’t have any of the bells and whistles of the full Metamer class, but does perform basic metamer synthesis: given a target image and a model, synthesize a new image (initialized with uniform noise) that has the same model output.

This is meant as a demonstration of the basic logic of synthesis.

Parameters:
  • image (Tensor) – A 4d tensor, this is the image whose model representation we wish to match.

  • model (Module) – The visual model whose representation we wish to match.

Methods

load(file_path[, map_location])

Load all relevant attributes from a .pt file.

save(file_path)

Save all relevant (non-model) variables in .pt file.

synthesize([max_iter, optimizer])

Synthesize a simple metamer.

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

load(file_path, map_location=None)[source]

Load all relevant attributes from a .pt file.

Note this operates in place and so doesn’t return anything.

Parameters:

file_path (str) – The path to load the synthesis object from

save(file_path)[source]

Save all relevant (non-model) variables in .pt file.

Parameters:

file_path (str) – The path to save the SimpleMetamer object to.

synthesize(max_iter=100, optimizer=None)[source]

Synthesize a simple metamer.

If called multiple times, will continue where we left off.

Parameters:
  • max_iter (int) – Number of iterations to run synthesis for.

  • optimizer (Optional[Optimizer]) – The optimizer to use. If None and this is the first time calling synthesize, we use Adam(lr=.01, amsgrad=True); if synthesize has been called before, we reuse the previous optimizer.

Returns:

The synthesized metamer

Return type:

metamer
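
A usage sketch (img and model as in the Metamer example earlier; both are placeholders):

>>> from plenoptic.synthesize.simple_metamer import SimpleMetamer
>>> simple_met = SimpleMetamer(img, model)
>>> result = simple_met.synthesize(max_iter=200)
>>> result = simple_met.synthesize(max_iter=200)   # picks up where the previous call left off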

to(*args, **kwargs)[source]

Move and/or cast the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)

Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. When calling this method to move tensors to a CUDA device, items in attrs that start with “saved_” will not be moved.

Note

This method modifies the module in-place.

Args:

device (torch.device): the desired device of the parameters and buffers in this module

dtype (torch.dtype): the desired floating point type of the floating point parameters and buffers in this module

tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

attrs (list): list of strs containing the attributes of this object to move to the specified device/dtype

Returns:

Module: self

plenoptic.synthesize.synthesis module

abstract synthesis super-class.

class plenoptic.synthesize.synthesis.OptimizedSynthesis(range_penalty_lambda=0.1, allowed_range=(0, 1))[source]

Bases: Synthesis

Abstract super-class for synthesis objects that use optimization.

The primary difference between this and the generic Synthesis class is that these will use an optimizer object to iteratively update their output.

Attributes:
allowed_range
gradient_norm

Synthesis gradient’s L2 norm over iterations.

losses

Synthesis loss over iterations.

optimizer
pixel_change_norm

L2 norm change in pixel values over iterations.

range_penalty_lambda
store_progress

Methods

load(file_path[, map_location, ...])

Load all relevant attributes from a .pt file.

objective_function()

How good is the current synthesized object.

save(file_path[, attrs])

Save all relevant (non-model) variables in .pt file.

synthesize()

Synthesize something.

to(*args[, attrs])

Moves and/or casts the parameters and buffers.

property allowed_range
property gradient_norm

Synthesis gradient’s L2 norm over iterations.

property losses

Synthesis loss over iterations.

abstract objective_function()[source]

How good is the current synthesized object.

See plenoptic.tools.optim for some examples.

property optimizer
property pixel_change_norm

L2 norm change in pixel values over iterations.

property range_penalty_lambda
property store_progress
class plenoptic.synthesize.synthesis.Synthesis[source]

Bases: ABC

Abstract super-class for synthesis objects.

All synthesis objects share a variety of similarities and thus need to have similar methods. Some of these can be implemented here and simply inherited, some of them will need to be different for each sub-class and thus are marked as abstract methods here

Methods

load(file_path[, map_location, ...])

Load all relevant attributes from a .pt file.

save(file_path[, attrs])

Save all relevant (non-model) variables in .pt file.

synthesize()

Synthesize something.

to(*args[, attrs])

Moves and/or casts the parameters and buffers.

load(file_path, map_location=None, check_attributes=[], check_loss_functions=[], **pickle_load_args)[source]

Load all relevant attributes from a .pt file.

This should be called by an initialized Synthesis object – we will ensure that the attributes in the check_attributes arg all match in the current and loaded object.

Note this operates in place and so doesn’t return anything.

Parameters:
  • file_path (str) – The path to load the synthesis object from

  • map_location (Optional[str]) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you’ll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device

  • check_attributes (List[str]) – List of strings we ensure are identical in the current Synthesis object and the loaded one. Checking the model is generally not recommended, since it can be hard to do (checking callable objects is hard in Python) – instead, checking the base_representation should ensure the model hasn’t functionally changed.

  • check_loss_functions (List[str]) – Names of attributes that are loss functions and so must be checked specially – loss functions are callables, and it’s very difficult to check python callables for equality, so to get around that, we instead call the two versions on the same pair of tensors and compare the outputs.

  • pickle_load_args – any additional kwargs will be added to pickle_module.load via torch.load, see that function’s docstring for details.

save(file_path, attrs=None)[source]

Save all relevant (non-model) variables in .pt file.

If you leave attrs as None, we grab vars(self) and exclude ‘model’. This is probably correct, but the option is provided to override it just in case

Parameters:
  • file_path (str) – The path to save the synthesis object to

  • attrs (list or None, optional) – List of strs containing the names of the attributes of this object to save. See above for behavior if attrs is None.

abstract synthesize()[source]

Synthesize something.

abstract to(*args, attrs=[], **kwargs)[source]

Moves and/or casts the parameters and buffers. Similar to save, this is an abstract method only because you need to specify which attributes to call to() on.

This can be called as

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)

Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. When calling this method to move tensors to a CUDA device, items in attrs that start with “saved_” will not be moved.

Note

This method modifies the module in-place.

Args:

device (torch.device): the desired device of the parameters and buffers in this module

dtype (torch.dtype): the desired floating point type of the floating point parameters and buffers in this module

tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

attrs (list): list of strs containing the attributes of this object to move to the specified device/dtype

Module contents
plenoptic.tools package
Submodules
plenoptic.tools.conv module
plenoptic.tools.conv.blur_downsample(x, n_scales=1, filtname='binom5', scale_filter=True)[source]

Correlate with a binomial coefficient filter and downsample by 2

Parameters:
  • x (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. Channels are treated in the same way as batches.

  • n_scales (int, optional. Should be non-negative.) – Apply the blur and downsample procedure recursively n_scales times. Default to 1.

  • filtname (str, optional) – Name of the filter. See pt.named_filter for options. Default to “binom5”.

  • scale_filter (bool, optional) – If true (default), the filter sums to 1 (i.e., it does not affect the DC component of the signal). If false, the filter sums to 2.

plenoptic.tools.conv.correlate_downsample(image, filt, padding_mode='reflect')[source]

Correlate with a filter and downsample by 2

Parameters:
  • image (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. Channels are treated in the same way as batches.

  • filt (2-D torch.Tensor) – The filter to correlate with the input image

  • padding_mode (string, optional) – One of “constant”, “reflect”, “replicate”, “circular”. The option “constant” means padding with zeros.

plenoptic.tools.conv.same_padding(x, kernel_size, stride=(1, 1), dilation=(1, 1), pad_mode='circular')[source]

Pad a tensor so that 2D convolution will result in output with same dims.

Return type:

Tensor

plenoptic.tools.conv.upsample_blur(x, odd, filtname='binom5', scale_filter=True)[source]

Upsample by 2 and convolve with a binomial coefficient filter

Parameters:
  • x (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. Channels are treated in the same way as batches.

  • odd (tuple, list or numpy.ndarray) – This should contain two integers of value 0 or 1, which determines whether the output height and width should be even (0) or odd (1).

  • filtname (str, optional) – Name of the filter. See pt.named_filter for options. Default to “binom5”.

  • scale_filter (bool, optional) – If true (default), the filter sums to 4 (i.e., it multiplies the signal by 4 before the blurring operation). If false, the filter sums to 2.
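
A small sketch of how blur_downsample and upsample_blur compose; the shapes in the comments assume the input shown:

>>> import torch
>>> import plenoptic as po
>>> x = torch.rand(1, 1, 256, 256)
>>> small = po.tools.conv.blur_downsample(x, n_scales=2)      # shape (1, 1, 64, 64)
>>> back_up = po.tools.conv.upsample_blur(small, odd=(0, 0))  # shape (1, 1, 128, 128)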

plenoptic.tools.conv.upsample_convolve(image, odd, filt, padding_mode='reflect')[source]

Upsample by 2 and convolve with a filter

Parameters:
  • image (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. Channels are treated in the same way as batches.

  • odd (tuple, list or numpy.ndarray) – This should contain two integers of value 0 or 1, which determines whether the output height and width should be even (0) or odd (1).

  • filt (2-D torch.Tensor) – The filter to convolve with the upsampled image

  • padding_mode (string, optional) – One of “constant”, “reflect”, “replicate”, “circular”. The option “constant” means padding with zeros.

plenoptic.tools.convergence module

Functions that check for optimization convergence/stabilization.

The functions herein generally differ in what they are checking for convergence: loss, pixel change, etc.

They should probably be able to accept the following arguments, in this order (they can accept more):

  • synth: an OptimizedSynthesis object to check.

  • stop_criterion: the value used as criterion / tolerance that our convergence target is compared against.

  • stop_iters_to_check: how many iterations back to check for convergence.

They must return a single bool: True if we’ve reached convergence, False if not.

plenoptic.tools.convergence.coarse_to_fine_enough(synth, i, ctf_iters_to_check)[source]

Check whether we’ve synthesized all scales and done so for at least ctf_iters_to_check iterations

This is meant to be paired with another convergence check, such as loss_convergence.

Parameters:
  • synth (Metamer) – The Metamer object to check.

  • i (int) – The current iteration (0-indexed).

  • ctf_iters_to_check (int) – Minimum number of iterations coarse-to-fine must run at each scale. If self.coarse_to_fine is False, then this is ignored.

Returns:

Whether we’ve been doing coarse to fine synthesis for long enough.

Return type:

ctf_enough

plenoptic.tools.convergence.loss_convergence(synth, stop_criterion, stop_iters_to_check)[source]

Check whether the loss has stabilized and, if so, return True.

Specifically: have we been synthesizing for at least stop_iters_to_check iterations? If not, return False. If so, return True when abs(synth.losses[-1] - synth.losses[-stop_iters_to_check]) < stop_criterion, and False otherwise.

Parameters:
  • synth (OptimizedSynthesis) – The OptimizedSynthesis object to check.

  • stop_criterion (float) – If the loss over the past stop_iters_to_check has changed less than stop_criterion, we terminate synthesis.

  • stop_iters_to_check (int) – How many iterations back to check in order to see if the loss has stopped decreasing (for stop_criterion).

Returns:

Whether the loss has stabilized or not.

Return type:

loss_stabilized
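
The check amounts to a few lines; a sketch of the equivalent logic (not necessarily the library's exact implementation):

def loss_convergence_sketch(synth, stop_criterion, stop_iters_to_check):
    # too early to tell: we need at least stop_iters_to_check loss values
    if len(synth.losses) < stop_iters_to_check:
        return False
    # has the loss changed by less than the criterion over that window?
    return abs(synth.losses[-stop_iters_to_check] - synth.losses[-1]) < stop_criterion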

plenoptic.tools.convergence.pixel_change_convergence(synth, stop_criterion, stop_iters_to_check)[source]

Check whether the pixel change norm has stabilized and, if so, return True.

Specifically: have we been synthesizing for at least stop_iters_to_check iterations? If not, return False. If so, return True when (synth.pixel_change_norm[-stop_iters_to_check:] < stop_criterion).all(), and False otherwise.

Parameters:
  • synth (OptimizedSynthesis) – The OptimizedSynthesis object to check.

  • stop_criterion (float) – If the pixel change norm has been less than stop_criterion for all of the past stop_iters_to_check, we terminate synthesis.

  • stop_iters_to_check (int) – How many iterations back to check in order to see if the pixel change norm has stopped decreasing (for stop_criterion).

Returns:

Whether the pixel change norm has stabilized or not.

Return type:

loss_stabilized

plenoptic.tools.data module
plenoptic.tools.data.convert_float_to_int(im, dtype=<class 'numpy.uint8'>)[source]

Convert image from float to 8 or 16 bit image

We work with float images that lie between 0 and 1, but for saving them (either as png or in a numpy array), we want to convert them to 8 or 16 bit integers. This function does that by multiplying the image by the max value for the target dtype (255 for 8 bit, 65535 for 16 bit) and then converting it to the proper type.

We’ll raise an exception if the max is higher than 1, in which case we have no idea what to do.

Parameters:
  • im (ndarray) – The image to convert

  • dtype – The target data type. {np.uint8, np.uint16}

Returns:

The converted image, now with dtype=dtype

Return type:

im

plenoptic.tools.data.load_images(paths, as_gray=True)[source]

Correctly load in images

Our models and synthesis methods expect their inputs to be 4d float32 images: (batch, channel, height, width), where the batch dimension contains multiple images and channel contains something like RGB or color channel. This function helps you get your inputs into that format. It accepts either a single file, a list of files, or a single directory containing images, will load them in, normalize them to lie between 0 and 1, convert them to float32, optionally convert them to grayscale, make them tensors, and get them into the right shape.

Parameters:
  • paths (Union[str, List[str]]) – A str or list of strs. If a list, must contain paths of image files. If a str, can either be the path of a single image file or of a single directory. If a directory, we try to load every file it contains (using imageio.imread) and skip those we cannot (thus, for efficiency you should not point this to a directory with lots of non-image files). This is NOT recursive.

  • as_gray (bool) – Whether to convert the images into grayscale or not after loading them. If False, we do nothing. If True, we call skimage.color.rgb2gray on them.

Returns:

4d tensor containing the images.

Return type:

images
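
For example (file names are placeholders):

>>> import plenoptic as po
>>> imgs = po.tools.data.load_images(['img1.png', 'img2.png'], as_gray=True)
>>> imgs.shape   # torch.Size([2, 1, height, width]) for grayscale images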

plenoptic.tools.data.make_synthetic_stimuli(size=256, requires_grad=True)[source]

Make a set of basic stimuli, useful for developing and debugging models

Parameters:
  • size (int) – The stimuli will have torch.Size([size, size]).

  • requires_grad (bool) – Whether to initialize the stimuli with gradients.

Returns:

Tensor of shape [11, 1, size, size]. The set of basic stimuli: [impulse, step_edge, ramp, bar, curv_edge, sine_grating, square_grating, polar_angle, angular_sine, zone_plate, fractal]

Return type:

stimuli
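
For example:

>>> import plenoptic as po
>>> stim = po.tools.data.make_synthetic_stimuli(size=128)
>>> stim.shape   # torch.Size([11, 1, 128, 128])
>>> po.imshow(stim, col_wrap=4)   # quick look at all 11 stimuli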

plenoptic.tools.data.polar_angle(size, phase=0.0, origin=None, device=None)[source]

Make polar angle matrix (in radians).

Compute a matrix of given size containing samples of the polar angle (in radians, CW from the X-axis, ranging from -pi to pi), relative to given phase, about the given origin pixel.

Parameters:
  • size (Union[int, Tuple[int, int]]) – If an int, we assume the image should be of dimensions (size, size). if a tuple, must be a 2-tuple of ints specifying the dimensions

  • phase (float) – The phase of the polar angle function (in radians, clockwise from the X-axis)

  • origin (Union[int, Tuple[float, float], None]) – The center of the image. if an int, we assume the origin is at (origin, origin). if a tuple, must be a 2-tuple of ints specifying the origin (where (0, 0) is the upper left). if None, we assume the origin lies at the center of the matrix, (size+1)/2.

  • device (Optional[device]) – The device to create this tensor on.

Returns:

The polar angle matrix

Return type:

res

plenoptic.tools.data.polar_radius(size, exponent=1.0, origin=None, device=None)[source]

Make distance-from-origin (r) matrix

Compute a matrix of given size containing samples of a radial ramp function, raised to given exponent, centered at given origin.

Parameters:
  • size (Union[int, Tuple[int, int]]) – If an int, we assume the image should be of dimensions (size, size). if a tuple, must be a 2-tuple of ints specifying the dimensions.

  • exponent (float) – The exponent of the radial ramp function.

  • origin (Union[int, Tuple[int, int], None]) – The center of the image. if an int, we assume the origin is at (origin, origin). if a tuple, must be a 2-tuple of ints specifying the origin (where (0, 0) is the upper left). if None, we assume the origin lies at the center of the matrix, (size+1)/2.

  • device (Union[str, device, None]) – The device to create this tensor on.

Returns:

The polar radius matrix.

Return type:

res

plenoptic.tools.data.to_numpy(x, squeeze=False)[source]

cast tensor to numpy in the most conservative way possible

Parameters:
  • x (Union[Tensor, ndarray]) – Tensor to be converted to numpy.ndarray on CPU.

  • squeeze (bool) – Removes all dummy dimensions of the tensor

Return type:

Converted tensor as numpy.ndarray on CPU.

plenoptic.tools.display module

various helpful utilities for plotting or displaying information

plenoptic.tools.display.animshow(video, framerate=2.0, repeat=False, vrange='indep1', zoom=1, title='', col_wrap=None, ax=None, cmap=None, plot_complex='rectangular', batch_idx=None, channel_idx=None, as_rgb=False, **kwargs)[source]

Animate video(s) correctly.

This function animates videos correctly, making sure that each element in the tensor corresponds to a pixel or an integer number of pixels, to avoid aliasing (NOTE: this guarantee only holds for the saved animation (assuming video compression doesn’t interfere); it should generally hold in notebooks as well, but will fail if, e.g., your video is 2000 pixels wide on a monitor 1000 pixels wide; the notebook handles the rescaling in a way we can’t control).

This function returns the matplotlib FuncAnimation object. In order to view it in a Jupyter notebook, use the plenoptic.convert_anim_to_html(anim) function. In order to save, use anim.save(filename) (note for this that you’ll need the appropriate writer installed and on your path, e.g., ffmpeg, imagemagick, etc).

Parameters:
  • video (torch.Tensor or list) – The videos to display. Tensors should be 5d (batch, channel, time, height, width). List of tensors should be used for tensors of different height and width: all videos will automatically be rescaled so they’re displayed at the same height and width, thus, their heights and widths must be scalar multiples of each other. Videos must all have the same number of frames as well.

  • framerate (float) – Temporal resolution of the video, in Hz (frames per second).

  • repeat (bool) – whether to loop the animation or just play it once

  • vrange (tuple or str) –

    If a 2-tuple, specifies the image values vmin/vmax that are mapped to the minimum and maximum value of the colormap, respectively. If a string:

    • ’auto0’: all images have same vmin/vmax, which have the same absolute

      value, and come from the minimum or maximum across all images, whichever has the larger absolute value

    • ’auto/auto1’: all images have same vmin/vmax, which are the

      minimum/maximum values across all images

    • ’auto2’: all images have same vmin/vmax, which are the mean (across

      all images) minus/ plus 2 std dev (across all images)

    • ’auto3’: all images have same vmin/vmax, chosen so as to map the

      10th/90th percentile values to the 10th/90th percentile of the display intensity range. For example: vmin is the 10th percentile image value minus 1/8 times the difference between the 90th and 10th percentile

    • ’indep0’: each image has an independent vmin/vmax, which have the

      same absolute value, which comes from either their minimum or maximum value, whichever has the larger absolute value.

    • ’indep1’: each image has an independent vmin/vmax, which are their

      minimum/maximum values

    • ’indep2’: each image has an independent vmin/vmax, which is their

      mean minus/plus 2 std dev

    • ’indep3’: each image has an independent vmin/vmax, chosen so that

      the 10th/90th percentile values map to the 10th/90th percentile intensities.

  • zoom (float) – ratio of display pixels to image pixels. if >1, must be an integer. If <1, must be 1/d where d is a divisor of the size of the largest image.

  • title (str, list, or None, optional) –

    Title for the plot. In addition to the specified title, we add a subtitle giving the plotted range and dimensionality (with zoom).

    • if str, will put the same title on every plot.

    • if list, all values must be str, must be the same length as img, assigning each title to the corresponding image.

    • if None, no title will be printed (and subtitle will be removed).

  • col_wrap (int or None, optional) – number of axes to have in each row. If None, will fit all axes in a single row.

  • ax (matplotlib.pyplot.axis or None, optional) – if None, we make the appropriate figure. otherwise, we resize the axes so that it’s the appropriate number of pixels (done by shrinking the bbox - if the bbox is already too small, this will throw an Exception!, so first define a large enough figure using either pyrtools.make_figure or plt.figure)

  • cmap (matplotlib colormap, optional) – colormap to use when showing these images

  • plot_complex ({'rectangular', 'polar', 'logpolar'}) –

    specifies handling of complex values.

    • ’rectangular’: plot real and imaginary components as separate images

    • ’polar’: plot amplitude and phase as separate images

    • ’logpolar’: plot log_2 amplitude and phase as separate images

    for any other value, we raise a warning and default to rectangular.

  • batch_idx (int or None, optional) – Which element from the batch dimension to plot. If None, we plot all.

  • channel_idx (int or None, optional) – Which element from the channel dimension to plot. If None, we plot all. Note if this is an int, then as_rgb=True will fail, because we restrict the channels.

  • as_rgb (bool, optional) – Whether to consider the channels as encoding RGB(A) values. If True, we attempt to plot the image in color, so your tensor must have 3 (or 4 if you want the alpha channel) elements in the channel dimension, or this will raise an Exception. If False, we plot each channel as a separate grayscale image.

  • kwargs – Passed to ax.imshow

Returns:

anim – The animation object. In order to view, must convert to HTML or save.

Return type:

matplotlib.animation.FuncAnimation

Notes

By default, we use the ffmpeg backend, which requires that you have ffmpeg installed and on your path (https://ffmpeg.org/download.html). To use a different writer, set the matplotlib rcParam: matplotlib.rcParams['animation.writer'] = writer; see https://matplotlib.org/stable/api/animation_api.html#writer-classes for more details.

For displaying in a jupyter notebook, ffmpeg appears to be required.
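
A sketch of basic usage with a random video tensor:

>>> import torch
>>> import plenoptic as po
>>> video = torch.rand(1, 1, 20, 64, 64)   # (batch, channel, time, height, width)
>>> anim = po.tools.display.animshow(video, framerate=5, zoom=2)
>>> po.tools.display.convert_anim_to_html(anim)   # to view in a Jupyter notebook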

plenoptic.tools.display.clean_stem_plot(data, ax=None, title='', ylim=None, xvals=None, **kwargs)[source]

convenience wrapper for plotting stem plots

This plots the data, baseline, cleans up the axis, and sets the title

Should not be called by users directly, but is a helper function for the various plot_representation() functions

By default, stem plot would have a baseline that covers the entire range of the data. We want to be able to break that up visually (so there’s a line from 0 to 9, from 10 to 19, etc), and passing xvals separately allows us to do that. If you want the default stem plot behavior, leave xvals as None.

Parameters:
  • data (np.ndarray) – The data to plot (as a stem plot)

  • ax (matplotlib.pyplot.axis or None, optional) – The axis to plot the data on. If None, we plot on the current axis

  • title (str or None, optional) – The title to put on the axis if not None. If None, we don’t call ax.set_title (useful if you want to avoid changing the title on an existing plot)

  • ylim (tuple or None, optional) – If not None, the y-limits to use for this plot. If None, we use the default, slightly adjusted so that the minimum is 0. If False, do not change y-limits.

  • xvals (tuple or None, optional) – A 2-tuple of lists, containing the start (xvals[0]) and stop (xvals[1]) x values for plotting. If None, we use the default stem plot behavior.

  • kwargs – passed to ax.stem

Returns:

ax – The axis with the plot

Return type:

matplotlib.pyplot.axis

Examples

We allow for breaks in the baseline value if we want to visually break up the plot, as we see below.

import plenoptic as po
import numpy as np
import matplotlib.pyplot as plt
# if ylim=None, as in this example, the minimum y-value will get
# set to 0, so we want to make sure our values are all positive
y = np.abs(np.random.randn(55))
y[15:20] = np.nan
y[35:40] = np.nan
# we want to draw the baseline from 0 to 14, 20 to 34, and 40 to
# 54, everywhere that we have non-NaN values for y
xvals = ([0, 20, 40], [14, 34, 54])
po.tools.display.clean_stem_plot(y, xvals=xvals)
plt.show()

If we don’t care about breaking up the x-axis, you can simply use the default xvals (None). In this case, this function will just clean up the plot a little bit

import plenoptic as po
import numpy as np
import matplotlib.pyplot as plt
# if ylim=None, as in this example, the minimum y-value will get
# set to 0, so we want to make sure our values are all positive
y = np.abs(np.random.randn(55))
po.tools.display.clean_stem_plot(y)
plt.show()

plenoptic.tools.display.clean_up_axes(ax, ylim=None, spines_to_remove=['top', 'right', 'bottom'], axes_to_remove=['x'])[source]

Clean up an axis, as desired when making a stem plot of the representation

Parameters:
  • ax (matplotlib.pyplot.axis) – The axis to clean up.

  • ylim (tuple, False, or None) – If a tuple, the y-limits to use for this plot. If None, we use the default, slightly adjusted so that the minimum is 0. If False, we do nothing.

  • spines_to_remove (list) – Some combination of ‘top’, ‘right’, ‘bottom’, and ‘left’. The spines we remove from the axis.

  • axes_to_remove (list) – Some combination of ‘x’, ‘y’. The axes to set as invisible.

Returns:

ax – The cleaned-up axis

Return type:

matplotlib.pyplot.axis

plenoptic.tools.display.convert_anim_to_html(anim)[source]

convert a matplotlib animation object to HTML (for display)

This is a simple little wrapper function that allows the animation to be displayed in a Jupyter notebook

Parameters:

anim (matplotlib.animation.FuncAnimation) – The animation object to convert to HTML

plenoptic.tools.display.imshow(image, vrange='indep1', zoom=None, title='', col_wrap=None, ax=None, cmap=None, plot_complex='rectangular', batch_idx=None, channel_idx=None, as_rgb=False, **kwargs)[source]

Show image(s) correctly.

This function shows images correctly, making sure that each element in the tensor corresponds to a pixel or an integer number of pixels, to avoid aliasing (NOTE: this guarantee only holds for the saved image; it should generally hold in notebooks as well, but will fail if, e.g., you plot an image that’s 2000 pixels wide on a monitor 1000 pixels wide; the notebook handles the rescaling in a way we can’t control).

Parameters:
  • image (torch.Tensor or list) – The images to display. Tensors should be 4d (batch, channel, height, width). List of tensors should be used for tensors of different height and width: all images will automatically be rescaled so they’re displayed at the same height and width, thus, their heights and widths must be scalar multiples of each other.

  • vrange (tuple or str) –

    If a 2-tuple, specifies the image values vmin/vmax that are mapped to the minimum and maximum value of the colormap, respectively. If a string:

    • ’auto0’: all images have same vmin/vmax, which have the same absolute

      value, and come from the minimum or maximum across all images, whichever has the larger absolute value

    • ’auto/auto1’: all images have same vmin/vmax, which are the

      minimum/maximum values across all images

    • ’auto2’: all images have same vmin/vmax, which are the mean (across

      all images) minus/ plus 2 std dev (across all images)

    • ’auto3’: all images have same vmin/vmax, chosen so as to map the

      10th/90th percentile values to the 10th/90th percentile of the display intensity range. For example: vmin is the 10th percentile image value minus 1/8 times the difference between the 90th and 10th percentile

    • ’indep0’: each image has an independent vmin/vmax, which have the

      same absolute value, which comes from either their minimum or maximum value, whichever has the larger absolute value.

    • ’indep1’: each image has an independent vmin/vmax, which are their

      minimum/maximum values

    • ’indep2’: each image has an independent vmin/vmax, which is their

      mean minus/plus 2 std dev

    • ’indep3’: each image has an independent vmin/vmax, chosen so that

      the 10th/90th percentile values map to the 10th/90th percentile intensities.

  • zoom (float or None) – ratio of display pixels to image pixels. if >1, must be an integer. If <1, must be 1/d where d is a divisor of the size of the largest image. If None, we try to determine the best zoom.

  • title (str, list, or None, optional) –

    Title for the plot. In addition to the specified title, we add a subtitle giving the plotted range and dimensionality (with zoom).

    • if str, will put the same title on every plot.

    • if list, all values must be str, must be the same length as img, assigning each title to the corresponding image.

    • if None, no title will be printed (and subtitle will be removed).

  • col_wrap (int or None, optional) – number of axes to have in each row. If None, will fit all axes in a single row.

  • ax (matplotlib.pyplot.axis or None, optional) – if None, we make the appropriate figure. otherwise, we resize the axes so that it’s the appropriate number of pixels (done by shrinking the bbox - if the bbox is already too small, this will throw an Exception!, so first define a large enough figure using either make_figure or plt.figure)

  • cmap (matplotlib colormap, optional) – colormap to use when showing these images

  • plot_complex ({'rectangular', 'polar', 'logpolar'}) –

    specifies handling of complex values.

    • ’rectangular’: plot real and imaginary components as separate images

    • ’polar’: plot amplitude and phase as separate images

    • ’logpolar’: plot log_2 amplitude and phase as separate images

    for any other value, we raise a warning and default to rectangular.

  • batch_idx (int or None, optional) – Which element from the batch dimension to plot. If None, we plot all.

  • channel_idx (int or None, optional) – Which element from the channel dimension to plot. If None, we plot all. Note if this is an int, then as_rgb=True will fail, because we restrict the channels.

  • as_rgb (bool, optional) – Whether to consider the channels as encoding RGB(A) values. If True, we attempt to plot the image in color, so your tensor must have 3 (or 4 if you want the alpha channel) elements in the channel dimension, or this will raise an Exception. If False, we plot each channel as a separate grayscale image.

  • kwargs – Passed to ax.imshow

Returns:

fig – figure containing the plotted images

Return type:

PyrFigure
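
For example, to show two random images side by side with a shared color range (a sketch):

>>> import torch
>>> import plenoptic as po
>>> imgs = torch.rand(2, 1, 256, 256)
>>> fig = po.imshow(imgs, vrange='auto1', title=['first', 'second'])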

plenoptic.tools.display.plot_representation(model=None, data=None, ax=None, figsize=(5, 5), ylim=False, batch_idx=0, title='', as_rgb=False)[source]

Helper function for plotting model representation

We are trying to plot data on ax, using model.plot_representation method, if it has it, and otherwise default to a function that makes sense based on the shape of data.

All of these arguments are optional, but at least some of them need to be set:

  • If model is None, we fall-back to a type of plot based on the shape of data. If it looks image-like, we’ll use plenoptic.imshow and if it looks vector-like, we’ll use plenoptic.clean_stem_plot. If it’s a dictionary, we’ll assume each key, value pair gives the title and data to plot on a separate sub-plot.

  • If data is None, we can only do something if model.plot_representation has some default behavior when data=None; this is probably to plot its own representation attribute. Thus, this will raise an Exception if both model and data are None, because we have no idea what to plot then.

  • If ax is None, we create a one-subplot figure using figsize. If ax is not None, we therefore ignore figsize.

  • If ylim is None, we call rescale_ylim, which sets the axes’ y-limits to be (-y_max, y_max), where y_max=np.abs(data).max(). If it’s False, we do nothing.

Parameters:
  • model (torch.nn.Module or None, optional) – A differentiable model that tells us how to plot data. See above for behavior if None.

  • data (array_like, dict, or None, optional) – The data to plot. See above for behavior if None.

  • ax (matplotlib.pyplot.axis or None, optional) – The axis to plot on. See above for behavior if None.

  • figsize (tuple, optional) – The size of the figure to create. Ignored if ax is not None.

  • ylim (tuple, None, or False, optional) – If not None, the y-limits to use for this plot. See above for behavior if None. If False, we do nothing.

  • batch_idx (int, optional) – Which index to take from the batch dimension

  • title (str, optional) – The title to put above this axis. If you want no title, pass the empty string ('')

  • as_rgb (bool, optional) – The representation can be image-like with multiple channels, and we have no way to determine whether it should be represented as an RGB image or not, so the user must set this flag to tell us. It will be ignored if the representation doesn’t look image-like or if the model has its own plot_representation_error() method. Else, it will be passed to po.imshow(); see that method’s docstring for details.

Returns:

axes – List of created axes.

Return type:

list

plenoptic.tools.display.pyrshow(pyr_coeffs, vrange='indep1', zoom=1, show_residuals=True, cmap=None, plot_complex='rectangular', batch_idx=0, channel_idx=0, **kwargs)[source]

Display steerable pyramid coefficients in orderly fashion.

This function uses imshow to show the coefficients of the steerable pyramid, such that each scale shows up on a single row, with each orientation in a given column.

Note that unlike imshow, we can only show one batch or channel at a time

Parameters:
  • pyr_coeffs (dict) – pyramid coefficients in the standard dictionary format as returned by SteerablePyramidFreq.forward()

  • vrange (tuple or str) –

    If a 2-tuple, specifies the image values vmin/vmax that are mapped to the minimum and maximum value of the colormap, respectively. If a string:

    • ’auto0’: all images have same vmin/vmax, which have the same absolute value, and come from the minimum or maximum across all images, whichever has the larger absolute value

    • ’auto/auto1’: all images have same vmin/vmax, which are the minimum/maximum values across all images

    • ’auto2’: all images have same vmin/vmax, which are the mean (across all images) minus/plus 2 std dev (across all images)

    • ’auto3’: all images have same vmin/vmax, chosen so as to map the 10th/90th percentile values to the 10th/90th percentile of the display intensity range. For example: vmin is the 10th percentile image value minus 1/8 times the difference between the 90th and 10th percentile

    • ’indep0’: each image has an independent vmin/vmax, which have the same absolute value, which comes from either their minimum or maximum value, whichever has the larger absolute value.

    • ’indep1’: each image has an independent vmin/vmax, which are their minimum/maximum values

    • ’indep2’: each image has an independent vmin/vmax, which is their mean minus/plus 2 std dev

    • ’indep3’: each image has an independent vmin/vmax, chosen so that the 10th/90th percentile values map to the 10th/90th percentile intensities.

  • zoom (float) – ratio of display pixels to image pixels. if >1, must be an integer. If <1, must be 1/d where d is a divisor of the size of the largest image.

  • show_residuals (bool) – whether to display the residual bands (lowpass, highpass depending on the pyramid type)

  • cmap (matplotlib colormap, optional) – colormap to use when showing these images

  • plot_complex ({'rectangular', 'polar', 'logpolar'}) –

    specifies handling of complex values.

    • ’rectangular’: plot real and imaginary components as separate images

    • ’polar’: plot amplitude and phase as separate images

    • ’logpolar’: plot log_2 amplitude and phase as separate images

    for any other value, we raise a warning and default to rectangular.

  • batch_idx (int, optional) – Which element from the batch dimension to plot.

  • channel_idx (int, optional) – Which element from the channel dimension to plot.

  • kwargs – Passed on to pyrtools.pyrshow

Returns:

fig – the figure displaying the coefficients.

Return type:

PyrFigure
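For example, a minimal sketch of displaying the coefficients of a small pyramid (the image and pyramid parameters here are arbitrary, chosen only for illustration):

import plenoptic as po

img = po.data.einstein()
pyr = po.simul.SteerablePyramidFreq(img.shape[-2:], height=3, order=3)
# coefficients are displayed with one row per scale and one column per orientation
fig = po.tools.display.pyrshow(pyr(img))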

plenoptic.tools.display.rescale_ylim(axes, data)[source]

rescale y-limits nicely

We take the axes and set their limits to be (-y_max, y_max), where y_max=np.abs(data).max()

Parameters:
  • axes (list) – A list of matplotlib axes to rescale

  • data (array_like or dict) – The data to use when rescaling (or a dictionary of such values)

plenoptic.tools.display.update_plot(axes, data, model=None, batch_idx=0)[source]

Update the information in some axes.

This is used for creating an animation over time. In order to create the animation, we need to know how to update the matplotlib Artists, and this provides a simple way of doing that. It assumes the plot has been created by something like plot_representation, which initializes all the artists.

We can update stem plots, lines (as returned by plt.plot), scatter plots, or images (RGB, RGBA, or grayscale).

There are two modes for this:

  • single axis: axes is a single axis, which may contain multiple artists (all of the same type) to update. data should be a Tensor with multiple channels (one per artist in the same order) or be a dictionary whose keys give the label(s) of the corresponding artist(s) and whose values are Tensors.

  • multiple axes: axes is a list of axes, each of which contains a single artist to update (artists can be different types). data should be a Tensor with multiple channels (one per axis in the same order) or a dictionary with the same number of keys as axes, which we can iterate through in order, and whose values are Tensors.

In all cases, data Tensors should be 3d (if the plot we’re updating is a line or stem plot) or 4d (if it’s an image or scatter plot).

RGB(A) images are special, since we store that info along the channel dimension, so they only work with single-axis mode (which will only have a single artist, because that’s how imshow works).

If you have multiple axes, each with multiple artists you want to update, that’s too complicated for us, and so you should write a model.update_plot() function which handles that.

If model is set, we try to call model.update_plot() (which must also return artists). If model doesn’t have an update_plot method, then we try to figure out how to update the axes ourselves, based on the shape of the data.

Parameters:
  • axes (list or matplotlib.pyplot.axis) – The axis or list of axes to update. We assume that these are the axes created by plot_representation and so contain stem plots in the correct order.

  • data (torch.Tensor or dict) – The new data to plot.

  • model (torch.nn.Module or None, optional) – A differentiable model that tells us how to plot data. See above for behavior if None.

  • batch_idx (int, optional) – Which index to take from the batch dimension

Returns:

artists – A list of the artists used to update the information on the plots

Return type:

list
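For example, a minimal sketch (the random data stands in for a model representation; in practice, update_plot is typically called from an animation callback):

import torch
import plenoptic as po

# plot a 3d "representation" (random data here, standing in for a model output)
data = torch.rand(1, 1, 20)
axes = po.tools.display.plot_representation(data=data)
# later, swap in new data of the same shape without recreating the figure
artists = po.tools.display.update_plot(axes, torch.rand(1, 1, 20))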

plenoptic.tools.display.update_stem(stem_container, ydata)[source]

Update the information in a stem plot

We update the information in a single stem plot to match that given by ydata. We update the position of the markers and the lines connecting them to the baseline, but we don’t change the baseline at all and assume that the xdata shouldn’t change.

Parameters:
  • stem_container (matplotlib.container.StemContainer) – Single container for the artists created in a plt.stem plot. It can be treated like a namedtuple (markerline, stemlines, baseline). In order to get this from an axis ax, try ax.containers[0] (obviously if you have more than one container in that axis, it may not be the first one).

  • ydata (array_like) – The new y-data to show on the plot. Importantly, must be the same length as the existing y-data.

Returns:

stem_container – The StemContainer containing the updated artists.

Return type:

matplotlib.container.StemContainer

plenoptic.tools.external module

tools to deal with data from outside plenoptic

For example, pre-existing synthesized images

plenoptic.tools.external.plot_MAD_results(original_image, noise_levels=None, results_dir=None, ssim_images_dir=None, zoom=3, vrange='indep1', **kwargs)[source]

plot original MAD results, provided by Zhou Wang

Plot the results of original MAD Competition, as provided in .mat files. The figure created shows the results for one reference image and multiple noise levels. The reference image is plotted on the first row, followed by a separate row for each noise level, which will show the initial (noisy) image and the four synthesized images, with their respective losses for the two metrics (MSE and SSIM).

We also return a DataFrame that contains the losses, noise levels, and original image name for each plotted noise level.

This code can probably be adapted to other uses, but requires that all images are the same size and assumes they’re all 64 x 64 pixels.

Parameters:
  • original_image ({samp1, samp2, samp3, samp4, samp5, samp6, samp7, samp8, samp9, samp10}) – which of the sample images to plot

  • noise_levels (list or None, optional) – which noise levels to plot. if None, will plot all. If a list, elements must be 2**i where i is in [1, 10]

  • results_dir (None or str, optional) – path to the results directory containing the results.mat files. If None, we call po.data.fetch_data to download (requires optional dependency pooch).

  • ssim_images_dir (None or str, optional) – path to the directory containing the .tif images used in SSIM paper. If None, we call po.data.fetch_data to download (requires optional dependency pooch).

  • zoom (int, optional) – amount to zoom each image, passed to pyrtools.imshow

  • vrange (str, optional) – in addition to the values accepted by pyrtools.imshow, we also accept ‘row0/1/2/3’, which is the same as ‘auto0/1/2/3’, except that we do it on a per-row basis (all images with same noise level)

  • kwargs – passed to pyrtools.imshow. Note that we call imshow separately on each image and so any argument that relies on imshow having access to all images will probably not work as expected

Returns:

  • fig (pyrtools.tools.display.Figure) – figure containing the images

  • results (dict) – dictionary containing the errors for each noise level. To convert to a well-structured pandas DataFrame, run pd.DataFrame(results).T

plenoptic.tools.optim module

Tools related to optimization such as more objective functions.

plenoptic.tools.optim.l2_norm(synth_rep, ref_rep, **kwargs)[source]

l2-norm of the difference between ref_rep and synth_rep

Parameters:
  • synth_rep (Tensor) – The first tensor to compare, model representation of the synthesized image.

  • ref_rep (Tensor) – The second tensor to compare, model representation of the reference image. must be same size as synth_rep.

  • kwargs – Ignored, only present to absorb extra arguments.

Returns:

The L2-norm of the difference between ref_rep and synth_rep.

Return type:

loss

plenoptic.tools.optim.mse(synth_rep, ref_rep, **kwargs)[source]

return the MSE between synth_rep and ref_rep

For two tensors, \(x\) and \(y\), with \(n\) values each:

\[MSE = \frac{1}{n}\sum_{i=1}^n (x_i - y_i)^2\]

The two images must have a float dtype

Parameters:
  • synth_rep (Tensor) – The first tensor to compare, model representation of the synthesized image

  • ref_rep (Tensor) – The second tensor to compare, model representation of the reference image. must be same size as synth_rep,

  • kwargs – Ignored, only present to absorb extra arguments

Returns:

The mean-squared error between synth_rep and ref_rep

Return type:

loss
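For example, a minimal illustration of the formula above, using two small float tensors:

import torch
import plenoptic as po

x = torch.tensor([0., 1., 2.])
y = torch.zeros(3)
po.tools.optim.mse(x, y)      # (0 + 1 + 4) / 3, i.e. 1.6667
po.tools.optim.l2_norm(x, y)  # sqrt(0 + 1 + 4), i.e. 2.2361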

plenoptic.tools.optim.penalize_range(synth_img, allowed_range=(0.0, 1.0), **kwargs)[source]

penalize values outside of allowed_range

instead of clamping values to exactly fall in a range, this provides a ‘softer’ way of doing it, by imposing a quadratic penalty on any values outside the allowed_range. All values within the allowed_range have a penalty of 0

Parameters:
  • synth_img (Tensor) – The tensor to penalize. the synthesized image.

  • allowed_range (Tuple[float, float]) – 2-tuple of values giving the (min, max) allowed values

  • kwargs – Ignored, only present to absorb extra arguments

Returns:

Penalty for values outside range

Return type:

penalty
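For example (a minimal sketch; the 1d tensor here is just for illustration), only values outside the allowed range contribute to the penalty:

import torch
import plenoptic as po

img = torch.tensor([-0.5, 0.25, 1.5])
# -0.5 and 1.5 fall outside (0, 1) and are penalized quadratically; 0.25 contributes 0
penalty = po.tools.optim.penalize_range(img, allowed_range=(0.0, 1.0))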

plenoptic.tools.optim.relative_MSE(synth_rep, ref_rep, **kwargs)[source]

Squared l2-norm of the difference between reference representation and synthesized representation relative to the squared l2-norm of the reference representation:

\[\frac{\|x - \hat{x}\|_2^2}{\|x\|_2^2}\]

Parameters:
  • synth_rep (Tensor) – The first tensor to compare, model representation of the synthesized image.

  • ref_rep (Tensor) – The second tensor to compare, model representation of the reference image. must be same size as synth_rep.

  • kwargs – Ignored, only present to absorb extra arguments

Returns:

Ratio of the squared l2-norm of the difference between ref_rep and synth_rep to the squared l2-norm of ref_rep

Return type:

loss

plenoptic.tools.optim.set_seed(seed=None)[source]

Set the seed.

We call both torch.manual_seed() and np.random.seed().

Parameters:

seed (Optional[int]) – The seed to set. If None, do nothing.

Return type:

None

plenoptic.tools.signal module
plenoptic.tools.signal.add_noise(img, noise_mse)[source]

Add normally distributed noise to an image

This adds normally-distributed noise to an image so that the resulting noisy version has the specified mean-squared error.

Parameters:
  • img (Tensor) – The image to make noisy.

  • noise_mse (Union[float, List[float]]) – The target MSE value / variance of the noise. More than one value is allowed.

Returns:

The noisy image. If noise_mse contains only one element, this will be the same size as img. Else, each separate value from noise_mse will be along the batch dimension.

Return type:

noisy_img
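For example, a minimal sketch adding two different noise levels to a single image:

import plenoptic as po

img = po.data.einstein()  # shape (1, 1, 256, 256)
noisy = po.tools.signal.add_noise(img, [1e-2, 1e-1])
# the two noise levels are stacked along the batch dimension, giving shape (2, 1, 256, 256)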

plenoptic.tools.signal.autocorrelation(x)[source]

Compute the autocorrelation of x.

Parameters:

x (Tensor) – N-dimensional tensor. We assume the last two dimensions are height and width and compute the autocorrelation over these dimensions (independently on each other dimension).

Returns:

Autocorrelation of x

Return type:

ac

Notes

  • By the Einstein-Wiener-Khinchin theorem: The autocorrelation of a wide sense stationary (WSS) process is the inverse Fourier transform of its energy spectral density (ESD) - which itself is the product of FT(x(t)) and FT(x(-t)). In other words, the auto-correlation is the convolution of the signal x with itself, which corresponds to squaring in the frequency domain. This approach is computationally more efficient than brute force (n log(n) vs n^2).

  • By Cauchy-Schwarz, the autocorrelation attains its maximum at the center location (i.e., no shift) - that maximum value is the signal’s variance (assuming that the input signal is mean centered).
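The relation in the first note can be sketched directly with torch's FFT routines (this is an illustration of the theorem, not necessarily plenoptic's exact implementation):

import torch

x = torch.randn(64, 64, dtype=torch.float64)
x = x - x.mean()  # mean-center the signal
spectrum = torch.fft.fft2(x)
# the inverse FFT of the power spectrum is the (circular) autocorrelation
ac = torch.fft.ifft2(spectrum * spectrum.conj()).real
# the zero-shift value equals the sum of squares, i.e. N times the uncorrected variance
print(torch.allclose(ac[0, 0], (x ** 2).sum()))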

plenoptic.tools.signal.center_crop(x, output_size)[source]

Crop out the center of a signal.

If x has an even number of elements on either of those final two dimensions, we round up.

Parameters:
  • x (Tensor) – N-dimensional tensor, we assume the last two dimensions are height and width.

  • output_size (int) – The size of the output. Note that we only support a single number, so both dimensions are cropped identically

Returns:

Tensor whose last two dimensions have each been cropped to output_size

Return type:

cropped

plenoptic.tools.signal.expand(x, factor)[source]

Expand a signal by a factor.

We do this in the frequency domain: pasting the Fourier contents of x in the center of a larger empty tensor, and then taking the inverse FFT.

Parameters:
  • x (Tensor) – The signal for expansion.

  • factor (float) – Factor by which to resize image. Must be larger than 1 and factor * x.shape[-2:] must give integer values

Returns:

The expanded signal

Return type:

expanded

See also

shrink

The inverse operation

plenoptic.tools.signal.interpolate1d(x_new, Y, X)[source]

One-dimensional linear interpolation.

Returns the one-dimensional piecewise linear interpolant to a function with given discrete data points (X, Y), evaluated at x_new.

Note: this function is just a wrapper around np.interp().

Parameters:
  • x_new (Tensor) – The x-coordinates at which to evaluate the interpolated values.

  • Y (Union[Tensor, ndarray]) – The y-coordinates of the data points.

  • X (Union[Tensor, ndarray]) – The x-coordinates of the data points, same length as Y.

Return type:

Interpolated values of shape identical to x_new.

plenoptic.tools.signal.make_disk(img_size, outer_radius=None, inner_radius=None)[source]

Create a circular mask with softened edges, sized to an image.

All values within inner_radius will be 1, and all values from inner_radius to outer_radius will decay smoothly to 0.

Parameters:
  • img_size (Union[int, Tuple[int, int], Size]) – Size of image in pixels.

  • outer_radius (Optional[float]) – Total radius of disk. Values from inner_radius to outer_radius will decay smoothly to zero.

  • inner_radius (Optional[float]) – Radius of inner disk. All elements from the origin to inner_radius will be set to 1.

Returns:

Tensor mask with torch.Size(img_size).

Return type:

mask

plenoptic.tools.signal.maximum(x, dim=None, keepdim=False)[source]

Compute maximum in torch over any dim or combination of axes in tensor.

Parameters:
  • x (Tensor) – Input tensor

  • dim (Optional[List[int]]) – Dimensions over which you would like to compute the maximum

  • keepdim (bool) – Keep original dimensions of tensor when returning result

Returns:

Maximum value of x.

Return type:

max_x

plenoptic.tools.signal.minimum(x, dim=None, keepdim=False)[source]

Compute minimum in torch over any axis or combination of axes in tensor.

Parameters:
  • x (Tensor) – Input tensor.

  • dim (Optional[List[int]]) – Dimensions over which you would like to compute the minimum.

  • keepdim (bool) – Keep original dimensions of tensor when returning result.

Returns:

Minimum value of x.

Return type:

min_x

plenoptic.tools.signal.modulate_phase(x, phase_factor=2.0)[source]

Modulate the phase of a complex signal.

Doubling the phase of a complex signal allows you to, for example, take the correlation between steerable pyramid coefficients at two adjacent spatial scales.

Parameters:
  • x (Tensor) – Complex tensor whose phase will be modulated.

  • phase_factor (float) – Multiplicative factor to change phase by.

Returns:

Phase-modulated complex tensor.

Return type:

x_mod

plenoptic.tools.signal.polar_to_rectangular(amplitude, phase)[source]

Polar to rectangular coordinate transform

Parameters:
  • amplitude (Tensor) – Tensor containing the amplitude (aka. complex modulus). Must be > 0.

  • phase (Tensor) – Tensor containing the phase

Return type:

Complex tensor.

plenoptic.tools.signal.raised_cosine(width=1, position=0, values=(0, 1))[source]

Return a lookup table containing a “raised cosine” soft threshold function.

Y = VALUES(1) + (VALUES(2)-VALUES(1)) * cos^2( PI/2 * (X - POSITION + WIDTH)/WIDTH )

This lookup table is suitable for use by interpolate1d

Parameters:
  • width (float) – The width of the region over which the transition occurs.

  • position (float) – The location of the center of the threshold.

  • values (Tuple[float, float]) – 2-tuple specifying the values to the left and right of the transition.

Return type:

Tuple[ndarray, ndarray]

Returns:

  • X – The x values of this raised cosine.

  • Y – The y values of this raised cosine.
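For example, a minimal sketch pairing the lookup table with interpolate1d, as suggested above:

import torch
import plenoptic as po

# lookup table for a soft threshold that rises from 0 to 1 over a transition of width 1
X, Y = po.tools.signal.raised_cosine(width=1, position=0, values=(0, 1))
# evaluate the soft threshold at new points
y_new = po.tools.signal.interpolate1d(torch.linspace(-1, 1, 9), Y, X)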

plenoptic.tools.signal.rectangular_to_polar(x)[source]

Rectangular to polar coordinate transform

Parameters:

x (Tensor) – Complex tensor.

Return type:

Tuple[Tensor, Tensor]

Returns:

  • amplitude – Tensor containing the amplitude (aka. complex modulus).

  • phase – Tensor containing the phase.

plenoptic.tools.signal.rescale(x, a=0.0, b=1.0)[source]

Linearly rescale the dynamic range of the input x to [a,b].

Return type:

Tensor

plenoptic.tools.signal.shrink(x, factor)[source]

Shrink a signal by a factor.

We do this in the frequency domain: cropping out the center of the Fourier transform of x, putting it in a new tensor, and taking the IFFT.

Parameters:
  • x (Tensor) – The signal to shrink.

  • factor (int) – Factor by which to shrink the image. Must be larger than 1 and x.shape[-2:] / factor must give integer values

Returns:

The shrunk signal

Return type:

shrunk

See also

expand

The inverse operation

plenoptic.tools.signal.steer(basis, angle, harmonics=None, steermtx=None, return_weights=False, even_phase=True)[source]

Steer BASIS to the specified ANGLE.

Parameters:
  • basis (Tensor) – Array whose columns are vectorized rotated copies of a steerable function, or the responses of a set of steerable filters.

  • angle (Union[ndarray, Tensor, float]) – Scalar or column vector the size of the basis. Specifies the angle(s) (in radians) to steer to.

  • harmonics (Optional[List[int]]) – A list of harmonic numbers indicating the angular harmonic content of the basis. if None (default), N even or odd low frequencies, as for derivative filters

  • steermtx (Union[Tensor, ndarray, None]) – Matrix which maps the filters onto Fourier series components (ordered [cos0 cos1 sin1 cos2 sin2 … sinN]). See steer_to_harmonics_mtx function for more details. If None (default), assumes cosine phase harmonic components, and filter positions at 2pi*n/N.

  • return_weights (bool) – Whether to return the weights or not.

  • even_phase (bool) – Specifies whether the harmonics are cosine or sine phase aligned about those positions.

Returns:

  • res – The resteered basis.

  • steervect – The weights used to resteer the basis. only returned if return_weights is True.

plenoptic.tools.stats module
plenoptic.tools.stats.kurtosis(x, mean=None, var=None, dim=None, keepdim=False)[source]

Sample estimate of x's tailedness (presence of outliers).

The kurtosis of a univariate normal distribution is 3.

Smaller than 3: platykurtic (e.g., uniform distribution).

Greater than 3: leptokurtic (e.g., Laplace distribution).

Parameters:
  • x (Tensor) – The input tensor.

  • mean (Union[float, Tensor, None]) – Reuse a precomputed mean.

  • var (Union[float, Tensor, None]) – Reuse a precomputed variance.

  • dim (Union[int, List[int], None]) – The dimension or dimensions to reduce.

  • keepdim (bool) – Whether the output tensor has dim retained or not.

Returns:

The kurtosis tensor.

Return type:

out

plenoptic.tools.stats.skew(x, mean=None, var=None, dim=None, keepdim=False)[source]

Sample estimate of x's asymmetry about its mean.

Parameters:
  • x (Tensor) – The input tensor

  • mean (Union[float, Tensor, None]) – Reuse a precomputed mean

  • var (Union[float, Tensor, None]) – Reuse a precomputed variance

  • dim (Union[int, List[int], None]) – The dimension or dimensions to reduce.

  • keepdim (bool) – Whether the output tensor has dim retained or not.

Returns:

The skewness tensor.

Return type:

out

plenoptic.tools.stats.variance(x, mean=None, dim=None, keepdim=False)[source]

Calculate sample variance.

Note that this is the uncorrected, or sample, variance, corresponding to torch.var(*, correction=0)

Parameters:
  • x (Tensor) – The input tensor

  • mean (Union[float, Tensor, None]) – Reuse a precomputed mean

  • dim (Union[int, List[int], None]) – The dimension or dimensions to reduce.

  • keepdim (bool) – Whether the output tensor has dim retained or not.

Returns:

The variance tensor.

Return type:

out
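For example, computing all three statistics over the spatial dimensions of a 4d tensor of Gaussian noise (a minimal sketch):

import torch
import plenoptic as po

x = torch.randn(1, 1, 256, 256)
po.tools.stats.variance(x, dim=[-2, -1])  # close to 1 for standard normal samples
po.tools.stats.skew(x, dim=[-2, -1])      # close to 0 (symmetric distribution)
po.tools.stats.kurtosis(x, dim=[-2, -1])  # close to 3 (Gaussian)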

plenoptic.tools.straightness module
plenoptic.tools.straightness.deviation_from_line(sequence, normalize=True)[source]

Compute the deviation of sequence from the straight line between its endpoints.

Project each point of the path sequence onto the line defined by the anchor points, and measure the two sides of a right triangle:

  • from the projected point to the first anchor point (aka. distance along line)

  • from the projected point to the corresponding point on the path sequence (aka. distance from line).

Parameters:
  • sequence (Tensor) – sequence of signals of shape (T, channel, height, width)

  • normalize (bool) – use the distance between the anchor points as a unit of measurement

Return type:

Tuple[Tensor, Tensor]

Returns:

  • dist_along_line – sequence of T Euclidean distances along the line

  • dist_from_line – sequence of T Euclidean distances to the line

plenoptic.tools.straightness.make_straight_line(start, stop, n_steps)[source]

make a straight line between start and stop with n_steps transitions.

Parameters:
  • start (Tensor) – Images of shape (1, channel, height, width), the anchor points between which a line will be made.

  • stop (Tensor) – Images of shape (1, channel, height, width), the anchor points between which a line will be made.

  • n_steps (int) – Number of steps (i.e., transitions) to create between the two anchor points. Must be positive.

Returns:

Tensor of shape (n_steps+1, channel, height, width)

Return type:

straight

plenoptic.tools.straightness.sample_brownian_bridge(start, stop, n_steps, max_norm=1)[source]

Sample a Brownian bridge between start and stop made up of n_steps steps.

Parameters:
  • start (Tensor) – signal of shape (1, channel, height, width), the anchor points between which a random path will be sampled (like pylons on which the bridge will rest)

  • stop (Tensor) – signal of shape (1, channel, height, width), the anchor points between which a random path will be sampled (like pylons on which the bridge will rest)

  • n_steps (int) – number of steps on the bridge

  • max_norm (float) – controls variability of the bridge by setting how far (in l2 norm) it veers from the straight line interpolation at the midpoint between pylons. each component of the bridge will reach a maximal variability with std = max_norm / sqrt(d), where d is the dimension of the signal. (ie. d = C*H*W). Must be non-negative.

Returns:

sequence of shape (n_steps+1, channel, height, width) a brownian bridge across the two pylons

Return type:

bridge

plenoptic.tools.straightness.translation_sequence(image, n_steps=10)[source]

make a horizontal translation sequence on image

Parameters:
  • image (Tensor) – Base image of shape, (1, channel, height, width)

  • n_steps (int) – Number of steps in the sequence. The length of the sequence is n_steps + 1. Must be positive.

Returns:

translation sequence of shape (n_steps+1, channel, height, width)

Return type:

sequence
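For example, a minimal sketch combining these functions: build a translation sequence, then measure how far it deviates from the straight line between its endpoints:

import plenoptic as po

img = po.data.einstein()
seq = po.tools.straightness.translation_sequence(img, n_steps=10)  # shape (11, 1, 256, 256)
dist_along, dist_from = po.tools.straightness.deviation_from_line(seq)
# for reference, the straight line between the same endpoints deviates by zero everywhere
line = po.tools.straightness.make_straight_line(seq[:1], seq[-1:], n_steps=10)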

plenoptic.tools.validate module

Functions to validate synthesis inputs.

plenoptic.tools.validate.remove_grad(model)[source]

Detach all parameters and buffers of model (in place).

plenoptic.tools.validate.validate_coarse_to_fine(model, image_shape=None, device='cpu')[source]

Determine whether a model can be used for coarse-to-fine synthesis.

In particular, this function checks the following (with associated errors):

  • Whether model has a scales attribute (AttributeError).

  • Whether model.forward accepts a scales keyword argument (TypeError).

  • Whether the output of model.forward changes shape when the scales keyword argument is set (ValueError).

Parameters:
  • model (Module) – The model to validate.

  • image_shape (Optional[Tuple[int, int, int, int]]) – Some models (e.g., the steerable pyramid) can only accept inputs of a certain shape. If that’s the case for model, use this to specify the expected shape. If None, we use an image of shape (1,1,16,16)

  • device (Union[str, device]) – Which device to place the test image on.

plenoptic.tools.validate.validate_input(input_tensor, no_batch=False, allowed_range=None)[source]

Determine whether input_tensor tensor can be used for synthesis.

In particular, this function:

  • Checks if input_tensor has a float or complex dtype

  • Checks if input_tensor is 4d.

  • If no_batch is True, check whether input_tensor.shape[0] != 1

  • If allowed_range is not None, check whether all values of input_tensor lie within the specified range.

If any of the above fail, a ValueError is raised.

Parameters:
  • input_tensor (Tensor) – The tensor to validate.

  • no_batch (bool) – If True, raise a ValueError if the batch dimension of input_tensor is greater than 1.

  • allowed_range (Optional[Tuple[float, float]]) – If not None, ensure that all values of input_tensor lie within allowed_range.

plenoptic.tools.validate.validate_metric(metric, image_shape=None, image_dtype=torch.float32, device='cpu')[source]

Determines whether a metric can be used for MADCompetition synthesis.

In particular, this function checks the following (with associated exceptions):

  • Whether metric is callable and accepts two 4d tensors as input (TypeError).

  • Whether metric returns a scalar when called with two 4d tensors as input (ValueError).

  • Whether metric returns a value less than 5e-7 when called with two identical 4d tensors as input (ValueError). (This threshold was chosen because 1-SSIM of two identical images is 5e-8 on GPU).

Parameters:
  • metric (Union[Module, Callable[[Tensor, Tensor], Tensor]]) – The metric to validate.

  • image_shape (Optional[Tuple[int, int, int, int]]) – Some models (e.g., the steerable pyramid) can only accept inputs of a certain shape. If that’s the case for model, use this to specify the expected shape. If None, we use an image of shape (1,1,16,16)

  • image_dtype (dtype) – What dtype to validate against.

  • device (Union[str, device]) – What device to place the test images on.

plenoptic.tools.validate.validate_model(model, image_shape=None, image_dtype=torch.float32, device='cpu')[source]

Determine whether model can be used for synthesis.

In particular, this function checks the following (with their associated errors raised):

  • If model adds a gradient to an input tensor, which implies that some of it is learnable (ValueError).

  • If model returns a tensor when given a tensor, failure implies that not all computations are done using torch (ValueError).

  • If model strips gradient from an input with gradient attached (ValueError).

  • If model casts the input tensor to something else and converts it back to a tensor before returning it (ValueError).

  • If model changes the precision of the input tensor (TypeError).

  • If model fails to return a 3d or 4d output when given a 4d input (ValueError).

  • If model changes the device of the input (RuntimeError).

Finally, we check if model is in training mode and raise a warning if so. Note that this is different from having learnable parameters; see the pytorch docs: https://pytorch.org/docs/stable/notes/autograd.html#locally-disable-grad-doc

Parameters:
  • model (Module) – The model to validate.

  • image_shape (Optional[Tuple[int, int, int, int]]) – Some models (e.g., the steerable pyramid) can only accept inputs of a certain shape. If that’s the case for model, use this to specify the expected shape. If None, we use an image of shape (1,1,16,16)

  • image_dtype (dtype) – What dtype to validate against.

  • device (Union[str, device]) – What device to place test image on.

See also

remove_grad

Helper function for detaching all parameters (in place).
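For example, a minimal sketch of the intended workflow before handing a model to a synthesis method (the OnOff model and image shape are just illustrative choices):

import plenoptic as po

model = po.simul.OnOff((7, 7))
po.tools.remove_grad(model)  # detach parameters so the model itself isn't optimized
model.eval()                 # avoid the training-mode warning mentioned above
po.tools.validate.validate_model(model, image_shape=(1, 1, 256, 256))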


Display and animate functions

plenoptic contains a variety of code for visualizing the outputs and the process of synthesis. This notebook details how to make use of that code, which has largely been written with the following goals:

1. If you follow the model API (and that of Synthesis, if creating a new synthesis method), display code should plot something reasonably useful automatically.

2. The code is flexible enough to allow for customization for more useful visualizations.

3. If the plotting code works, the animate code should also work.

[1]:
import plenoptic as po
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
import torch
import numpy as np

%load_ext autoreload
%autoreload 2
%matplotlib inline
/mnt/home/wbroderick/miniconda3/envs/plenoptic/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[2]:
plt.rcParams['figure.dpi'] = 72

General

We include two wrappers of display code from pyrtools, adapting them for use with tensors. These are imshow and animshow, which accept tensors of real- or complex-valued images or videos (respectively) and properly convert them to arrays for display purposes. These are not the most flexible functions (for example, imshow requires that real-valued tensors be 4d) but, assuming you follow our API, should work relatively painlessly. The main reason for using them (over the image-display code from matplotlib) is that we guarantee fidelity to image size: a value in the tensor corresponds to a pixel or an integer number of pixels in the image (if upsampling); if downsampling, we can only down-sample by factors of two. This way, you can be sure that any strange appearance of the image is not due to aliasing in the plotting.

For imshow, we require that real-valued tensors be 4d: (batch, channel, height, width). If you’re showing images, they’re likely to be grayscale (in which case there’s only 1 channel) or RGB(A) (in which case there’s 3 or 4, depending on whether it includes the alpha channel). We plot grayscale images without a problem:

[3]:
img = torch.cat([po.data.einstein(), po.data.curie()], axis=0)
print(img.shape)
fig = po.imshow(img)
torch.Size([2, 1, 256, 256])
_images/tutorials_advanced_Display_4_1.png

We need to tell imshow that the image(s) are RGB in order for them to be plotted correctly.

[4]:
rgb = torch.rand(2, 3, 256, 256)
print(rgb.shape)
fig = po.imshow(rgb, as_rgb=True)
torch.Size([2, 3, 256, 256])
_images/tutorials_advanced_Display_6_1.png

This is because we don’t want to assume that a tensor with 3 or 4 channels is always RGB. To pick a somewhat-contrived example, imagine the following steerable pyramid:

[5]:
pyr = po.simul.SteerablePyramidFreq(img.shape[-2:], downsample=False, height=1, order=2)
[6]:
coeffs, _ = pyr.convert_pyr_to_tensor(pyr(img),split_complex=False)

print(coeffs.shape)
torch.Size([2, 5, 256, 256])

The first and last channels are residuals, so if we only wanted to look at the coefficients, we’d do the following:

[7]:
po.imshow(coeffs[:, 1:-1], batch_idx=0)
po.imshow(coeffs[:, 1:-1], batch_idx=1);
_images/tutorials_advanced_Display_11_0.png
_images/tutorials_advanced_Display_11_1.png

We really don’t want to interpret those values as RGB.

Note that in the above imshow calls, we had to specify the batch_idx. This function expects a 4d tensor, but if it has more than one channel and more than one batch (and it’s not RGB), we can’t display everything. The user must therefore specify either batch_idx or channel_idx.

[8]:
po.imshow(coeffs[:, 1:-1], channel_idx=0);
_images/tutorials_advanced_Display_13_0.png

animshow works analogously to imshow, wrapping around the pyrtools version but expecting a 5d tensor: (batch, channel, time, height, width). It returns a matplotlib.animation.FuncAnimation object, which can be saved as an mp4 or converted to an html object for display in a Jupyter notebook

[9]:
pyr = po.simul.SteerablePyramidFreq(img.shape[-2:], downsample=False, height='auto', order=3, is_complex=True, tight_frame=False)
coeffs, _ = pyr.convert_pyr_to_tensor(pyr(img), split_complex=False)
print(coeffs.shape)
# because coeffs is 4d, we add a dummy dimension for the channel in order to make animshow happy
po.tools.convert_anim_to_html(po.animshow(coeffs.unsqueeze(1), batch_idx=0,vrange='indep1'))
torch.Size([2, 26, 256, 256])
[9]:

Synthesis-specific

Each synthesis method has a variety of display code to visualize the state and progress of synthesis, as well as to ease understanding of the process and look for ways to improve. For example, in metamer synthesis, it can be useful to determine what component of the model has the largest error.

[10]:
img = po.data.einstein()
model = po.simul.OnOff((7, 7))
rep = model(img)
/mnt/home/wbroderick/miniconda3/envs/plenoptic/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

As long as your model returns a 3d or 4d tensor (first two dimensions corresponding to batch and channel), our plotting code should work automatically. If it returns a 3d representation, we create a stem plot; if it’s 4d, an image.

[11]:
po.tools.display.plot_representation(data=rep, figsize=(11, 5));
_images/tutorials_advanced_Display_19_0.png

This also gets used in the plotting code built into our synthesis methods.

[12]:
po.tools.remove_grad(model)
met = po.synth.Metamer(img, model)
met.synthesize(max_iter=100, store_progress=True,);
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  warnings.warn(
 58%|█████▊    | 58/100 [00:01<00:01, 40.93it/s, loss=6.0124e-06, learning_rate=0.01, gradient_norm=7.5704e-04, pixel_change_norm=2.7111e-01]/mnt/home/wbroderick/plenoptic/src/plenoptic/synthesize/metamer.py:195: UserWarning: Loss has converged, stopping synthesis
  warnings.warn("Loss has converged, stopping synthesis")
 61%|██████    | 61/100 [00:01<00:00, 39.43it/s, loss=6.0124e-06, learning_rate=0.01, gradient_norm=7.5704e-04, pixel_change_norm=2.7111e-01]

After we’ve run synthesis for a while, we want to investigate how close we are. We can examine the numbers printed out above, but it’s probably more useful to plot something. We provide the plot_synthesis_status() function for doing this. By default, it includes the synthesized image, the loss, and the representation error. That last plot is the same as the one above, except it plots data = base_representation - synthesized_representation.

[13]:
# we have two image plots for representation error, so that bit should be 2x wider
fig = po.synth.metamer.plot_synthesis_status(met, width_ratios={'plot_representation_error': 2.1})
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/display.py:950: UserWarning: ax is not None, so we're ignoring figsize...
  warnings.warn("ax is not None, so we're ignoring figsize...")
_images/tutorials_advanced_Display_23_1.png

You can also create this plot at different iterations, in order to try and better understand what’s happening

[14]:
fig = po.synth.metamer.plot_synthesis_status(met, iteration=10, width_ratios={'plot_representation_error': 2.1})
_images/tutorials_advanced_Display_25_0.png

The appearance of this figure is very customizable. There are several additional plots that can be included, and all plots are optional. The additional plot below is two histograms comparing the pixel values of the synthesized and base signal.

[15]:
fig = po.synth.metamer.plot_synthesis_status(met, included_plots=['display_metamer', 'plot_loss',
                                                                  'plot_representation_error', 'plot_pixel_values'],
                                             width_ratios={'plot_representation_error': 2.1})
_images/tutorials_advanced_Display_27_0.png

In addition to being able to customize which plots to include, you can also pre-create the figure (with axes, if you’d like) and pass it in. By default, we try and create an appropriate-looking figure, with appropriately-sized plots, but this allows for more flexibility:

[16]:
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
fig = po.synth.metamer.plot_synthesis_status(met, included_plots=['display_metamer', 'plot_loss',
                                                                  'plot_pixel_values'],
                                             fig=fig)
_images/tutorials_advanced_Display_29_0.png

For even more flexibility, you can specify which plot should go in which axes by creating an axes_idx dictionary. You can create keys for every plot or only a subset (in which case each remaining plot gets added to the next available axes, as above when axes_idx is unset; see the docstring for key names):

[17]:
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
axes_idx = {'display_metamer': 3, 'plot_pixel_values': 0}
fig = po.synth.metamer.plot_synthesis_status(met, included_plots=['display_metamer', 'plot_loss',
                                                                  'plot_pixel_values'],
                                             fig=fig, axes_idx=axes_idx)
_images/tutorials_advanced_Display_31_0.png

This enables you to create more complicated figures, with axes containing other plots, arrows and other annotations, etc.

[18]:
fig, axes = plt.subplots(2, 3, figsize=(17, 12))
# to tell plot_synthesis_status to ignore plots, add them to the misc keys
axes_idx = {'display_metamer': 5, 'misc': [0, 4]}
axes[0, 0].text(.5, .5, 'SUPER COOL TEXT', color='r')
axes[1, 0].arrow(0, 0, .25, .25, )
axes[0, 0].plot(np.linspace(0, 1), np.random.rand(50))
fig = po.synth.metamer.plot_synthesis_status(met, included_plots=['display_metamer', 'plot_loss',
                                                                  'plot_pixel_values'],
                                             fig=fig, axes_idx=axes_idx)
_images/tutorials_advanced_Display_33_0.png

We similarly have an animate function, which animates the above plots over time, and everything that I said above also holds for them. Note that animate will take a fair amount of time to run and requires ffmpeg on your system for most file formats (see matplotlib docs for more details).

[19]:
fig, axes = plt.subplots(2, 3, figsize=(17, 12))
# to tell plot_synthesis_status to ignore plots, add them to the misc keys
axes_idx = {'display_metamer': 5, 'misc': [0, 4]}
axes[0, 0].text(.5, .5, 'SUPER COOL TEXT', color='r')
axes[1, 0].arrow(0, 0, .25, .25, )
axes[0, 0].plot(np.linspace(0, 1), np.random.rand(50))
anim = po.synth.metamer.animate(met, included_plots=['display_metamer', 'plot_loss',
                                                                  'plot_pixel_values'],
                                fig=fig, axes_idx=axes_idx,)

This anim object is not viewable by itself: it either needs to be converted to html for display in the notebook, or saved as an .mp4 file (by calling anim.save(filename))

[20]:
po.tools.convert_anim_to_html(anim)
[20]:

More complicated model representation plots

While this provides a starting point, it’s not always super useful. In the example above, the OnOff model returns the output of several convolutional kernels across the image, and so plotting as a series of images is pretty decent. The representation of the PortillaSimoncelli model below, however, has several distinct components at multiple spatial scales and orientations. That structure is lost in a single stem plot:

[21]:
img = po.data.reptile_skin()
ps = po.simul.PortillaSimoncelli(img.shape[-2:])
rep = ps(img)
po.tools.display.plot_representation(data=rep);
_images/tutorials_advanced_Display_40_0.png

Trying to guess this advanced structure would be impossible for our generic plotting functions. However, if your model has a plot_representation() method, we can make use of it:

[22]:
ps.plot_representation(data=rep, ylim=False);
_images/tutorials_advanced_Display_42_0.png

Our display.plot_representation function can make use of this method if you pass it the model; note how the plot below is identical to the one above. This might not seem very useful, but we make use of this in the different plotting methods used by our synthesis classes explained above.

[23]:
po.tools.display.plot_representation(ps, rep, figsize=(15, 15));
_images/tutorials_advanced_Display_44_0.png
[24]:
met = po.synth.MetamerCTF(img, ps, loss_function=po.tools.optim.l2_norm, coarse_to_fine='together')
met.synthesize(max_iter=400, store_progress=10,
               change_scale_criterion=None, ctf_iters_to_check=10);
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
  warnings.warn(
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:211: UserWarning: Validating whether model can work with coarse-to-fine synthesis -- this can take a while!
  warnings.warn("Validating whether model can work with coarse-to-fine synthesis -- this can take a while!")
100%|██████████| 400/400 [00:38<00:00, 10.35it/s, loss=2.5739e-01, learning_rate=0.01, gradient_norm=1.2590e+00, pixel_change_norm=2.4271e-01, current_scale=all, current_scale_loss=2.5739e-01]
[25]:
fig, _ = po.synth.metamer.plot_synthesis_status(met)
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/display.py:950: UserWarning: ax is not None, so we're ignoring figsize...
  warnings.warn("ax is not None, so we're ignoring figsize...")
_images/tutorials_advanced_Display_46_1.png

And again, we can animate this over time:

[26]:
po.tools.convert_anim_to_html(po.synth.metamer.animate(met))
[26]:


Extending existing synthesis objects

Once you are familiar with the existing synthesis objects included in plenoptic, you may wish to change some aspect of their function. For example, you may wish to change how po.synth.MADCompetition initializes the MAD image or alter the objective function of po.synth.Metamer. While you could certainly start from scratch or copy the source code of the object and alter it directly, an easier way is to create a new sub-class: an object that inherits from the synthesis object you wish to modify and overwrites some of its existing methods.

For example, you could create a version of po.synth.MADCompetition that starts with a different natural image (rather than with image argument plus normally-distributed noise) by creating the following object:

[1]:
import plenoptic as po
from torch import Tensor
import torch
import matplotlib.pyplot as plt
import warnings
from typing import Union, Callable, Tuple, Optional
from typing_extensions import Literal

# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72

%load_ext autoreload
%autoreload 2
[2]:
class MADCompetitionVariant(po.synth.MADCompetition):
    """Initialize MADCompetition with an image instead!"""
    def __init__(self, image: Tensor,
                optimized_metric: Union[torch.nn.Module, Callable[[Tensor, Tensor], Tensor]],
                reference_metric: Union[torch.nn.Module, Callable[[Tensor, Tensor], Tensor]],
                minmax: Literal['min', 'max'],
                initial_image: Tensor = None,
                metric_tradeoff_lambda: Optional[float] = None,
                range_penalty_lambda: float = .1,
                allowed_range: Tuple[float, float] = (0, 1)):
        if initial_image is None:
            initial_image = torch.rand_like(image)
        super().__init__(image, optimized_metric, reference_metric,
                         minmax, initial_image, metric_tradeoff_lambda,
                         range_penalty_lambda, allowed_range)

    def _initialize(self, initial_image: Tensor):
        mad_image = initial_image.clamp(*self.allowed_range)
        self._initial_image = mad_image.clone()
        mad_image.requires_grad_()
        self._mad_image = mad_image
        self._reference_metric_target = self.reference_metric(self.image,
                                                              self.mad_image).item()
        self._reference_metric_loss.append(self._reference_metric_target)
        self._optimized_metric_loss.append(self.optimized_metric(self.image,
                                                                 self.mad_image).item())

We can then interact with this new object in the same way as the original MADCompetition object, the only difference being how it’s initialized:

[3]:
image = po.data.einstein()
curie = po.data.curie()

new_mad = MADCompetitionVariant(image, po.metric.mse, lambda *args: 1-po.metric.ssim(*args),
                                'min', curie)
old_mad = po.synth.MADCompetition(image, po.metric.mse, lambda *args: 1-po.metric.ssim(*args),
                                  'min', .1)
/home/billbrod/Documents/plenoptic/plenoptic/tools/data.py:126: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
  images = torch.tensor(images, dtype=torch.float32)
/home/billbrod/miniconda3/envs/plenoptic/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/billbrod/Documents/plenoptic/plenoptic/synthesize/mad_competition.py:130: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  loss_ratio = torch.tensor(self.optimized_metric_loss[-1] / self.reference_metric_loss[-1],
/home/billbrod/Documents/plenoptic/plenoptic/synthesize/mad_competition.py:134: UserWarning: Since metric_tradeoff_lamda was None, automatically set to 0.10000000149011612 to roughly balance metrics.
  warnings.warn("Since metric_tradeoff_lamda was None, automatically set"
/home/billbrod/Documents/plenoptic/plenoptic/synthesize/mad_competition.py:134: UserWarning: Since metric_tradeoff_lamda was None, automatically set to 0.009999999776482582 to roughly balance metrics.
  warnings.warn("Since metric_tradeoff_lamda was None, automatically set"

We can see below that the two versions have the same image whose representation they’re trying to match, but very different initial images.

[4]:
po.imshow([old_mad.image, old_mad.initial_image, new_mad.image, new_mad.initial_image],
          col_wrap=2);
_images/tutorials_advanced_Synthesis_extensions_6_0.png

We call synthesize in the same way and can even make use of the original plot_synthesis_status function to see what synthesis looks like

[5]:
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    old_mad.synthesize(store_progress=True)
po.synth.mad_competition.plot_synthesis_status(old_mad, included_plots=['display_mad_image', 'plot_loss']);
_images/tutorials_advanced_Synthesis_extensions_8_1.png
[6]:
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    new_mad.synthesize(store_progress=True)
po.synth.mad_competition.plot_synthesis_status(new_mad, included_plots=['display_mad_image', 'plot_loss']);
_images/tutorials_advanced_Synthesis_extensions_9_1.png

For the version initialized with the image of Marie Curie, let’s also examine the MAD image shortly after synthesis started, since the final version doesn’t look that different:

[7]:
po.synth.mad_competition.display_mad_image(new_mad, iteration=10);
_images/tutorials_advanced_Synthesis_extensions_11_0.png

See the documentation for a fuller description of how the synthesis objects are structured, to get ideas for how else to modify them. Some good methods to overwrite include _initialize, _check_convergence, and objective_function (note that not every object uses each of these methods; for more serious changes to initialization, it’s probably best to start with _initialize). For a more serious change, you could also overwrite synthesize and _optimizer_step (and possibly _closure) to really change how synthesis works. See po.synth.MetamerCTF for an example of how to do this.
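For instance, here is a hypothetical sketch of overriding objective_function in a Metamer subclass to swap in a different loss. Note that the exact signature of objective_function (assumed here to take the synthesized and reference representations) can differ between plenoptic versions, so check your installed version's docs before copying this:

import plenoptic as po

class L2Metamer(po.synth.Metamer):
    """Metamer variant that always uses the l2-norm loss (hypothetical sketch)."""

    # NOTE: the signature below is an assumption -- verify it against
    # Metamer.objective_function in your installed version of plenoptic.
    def objective_function(self, synth_rep, ref_rep):
        return po.tools.optim.l2_norm(synth_rep, ref_rep)

(For this particular change you could equivalently pass loss_function=po.tools.optim.l2_norm when constructing the object, as done with MetamerCTF earlier in this notebook; the subclass only illustrates the pattern.)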

[Portilla2000]

Portilla, J., & Simoncelli, E. P. (2000). A parametric texture model based on joint statistics of complex wavelet coefficients. International journal of computer vision, 40(1), 49–70. https://www.cns.nyu.edu/~lcv/texture/. https://www.cns.nyu.edu/pub/eero/portilla99-reprint.pdf

[Freeman2011]

Freeman, J., & Simoncelli, E. P. (2011). Metamers of the ventral stream. Nature Neuroscience, 14(9), 1195–1201. http://www.cns.nyu.edu/pub/eero/freeman10-reprint.pdf

[Deza2019]

Deza, A., Jonnalagadda, A., & Eckstein, M. P. (2019). Towards metamerism via foveated style transfer. In International Conference on Learning Representations.

[Feather2019]

Feather, J., Durango, A., Gonzalez, R., & McDermott, J. (2019). Metamers of neural networks reveal divergence from human perceptual systems. In NeurIPS (pp. 10078–10089).

[Wallis2019]

Wallis, T. S., Funke, C. M., Ecker, A. S., Gatys, L. A., Wichmann, F. A., & Bethge, M. (2019). Image content is more important than bouma’s law for scene metamers. eLife. http://dx.doi.org/10.7554/elife.42512

[Berardino2017]

Berardino, A., Laparra, V., Ballé, J., & Simoncelli, E. P. (2017). Eigen-distortions of hierarchical representations. In I. Guyon, U. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett, Adv. Neural Information Processing Systems (NIPS*17) (pp. 1–10). Curran Associates, Inc. https://www.cns.nyu.edu/~lcv/eigendistortions/ http://www.cns.nyu.edu/pub/lcv/berardino17c-final.pdf

[Wang2008]

Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. https://ece.uwaterloo.ca/~z70wang/research/mad/ http://www.cns.nyu.edu/pub/lcv/wang08-preprint.pdf

[Henaff2016]

Hénaff, O. J., & Simoncelli, E. P. (2016). Geodesics of learned representations. ICLR. http://www.cns.nyu.edu/pub/lcv/henaff16b-reprint.pdf

[Henaff2020]

O Hénaff, Y Bai, J Charlton, I Nauhaus, E P Simoncelli and R L T Goris. Primary visual cortex straightens natural video trajectories Nature Communications, vol.12(5982), Oct 2021. https://www.cns.nyu.edu/pub/lcv/henaff20-reprint.pdf

[Simoncelli1992]

Simoncelli, E. P., Freeman, W. T., Adelson, E. H., & Heeger, D. J. (1992). Shiftable Multi-Scale Transforms. IEEE Trans. Information Theory, 38(2), 587–607. http://dx.doi.org/10.1109/18.119725

[Simoncelli1995]

Simoncelli, E. P., & Freeman, W. T. (1995). The steerable pyramid: A flexible architecture for multi-scale derivative computation. In , Proc 2nd IEEE Int’l Conf on Image Proc (ICIP) (pp. 444–447). Washington, DC: IEEE Sig Proc Society. http://www.cns.nyu.edu/pub/eero/simoncelli95b.pdf

[Wang2004]

Wang, Z., Bovik, A., Sheikh, H., & Simoncelli, E. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612. https://www.cns.nyu.edu/~lcv/ssim/. http://www.cns.nyu.edu/pub/lcv/wang03-reprint.pdf

[Wang2003]

Z Wang, E P Simoncelli and A C Bovik. Multiscale structural similarity for image quality assessment Proc 37th Asilomar Conf on Signals, Systems and Computers, vol.2 pp. 1398–1402, Nov 2003. http://www.cns.nyu.edu/pub/eero/wang03b.pdf

[Laparra2017]

Laparra, V., Berardino, A., Ballé, J., & Simoncelli, E. P. (2017). Perceptually Optimized Image Rendering. Journal of the Optical Society of America A, 34(9), 1511. http://www.cns.nyu.edu/pub/lcv/laparra17a.pdf

[Laparra2016]

Laparra, V., Ballé, J., Berardino, A. and Simoncelli, E.P., 2016. Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16), pp.1-6. http://www.cns.nyu.edu/pub/lcv/laparra16a-reprint.pdf

[Ziemba2021]

Ziemba, C.M., and Simoncelli, E.P. (2021). Opposing effects of selectivity and invariance in peripheral vision. Nature Communications, vol.12(4597). https://dx.doi.org/10.1038/s41467-021-24880-5

This package is supported by the Center for Computational Neuroscience (https://www.simonsfoundation.org/flatiron/center-for-computational-neuroscience/), in the Flatiron Institute of the Simons Foundation.

Flatiron Institute Center for Computational Neuroscience logo