plenoptic
is a python library for model-based synthesis of perceptual stimuli. For plenoptic
, models are those of visual [1] information processing: they accept an image as input, perform some computations, and return some output, which can be mapped to neuronal firing rate, fMRI BOLD response, behavior on some task, image category, etc. The intended audience is researchers in neuroscience, psychology, and machine learning. The generated stimuli enable interpretation of model properties through examination of features that are enhanced, suppressed, or discarded. More importantly, they can facilitate the scientific process, through use in further perceptual or neural experiments aimed at validating or falsifying model predictions.
Getting started
If you are unfamiliar with stimulus synthesis, see the Conceptual Introduction for an in-depth introduction.
Otherwise, see the Quickstart tutorial.
Installation
The best way to install plenoptic
is via pip
:
$ pip install plenoptic
See the Installation page for more details, including how to set up an isolated virtual environment (recommended).
ffmpeg and videos
Some methods in this package generate videos. There are several backends
available for saving the animations to file (see matplotlib documentation
).
To convert them to HTML5 for viewing (for example, in a
jupyter notebook), you’ll need ffmpeg
installed. Depending on your system, this might already
be installed, but if not, the easiest way is probably through conda: conda install -c conda-forge
ffmpeg
.
To change the backend, run matplotlib.rcParams['animation.writer'] = writer
before calling any of the animate functions. If you try to set that rcParam
with a random string, matplotlib
will list the available choices.
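For example (assuming matplotlib is installed; the writer names below are just illustrations of what might be available on your system):
import matplotlib
from matplotlib import animation
# see which movie writers matplotlib can find on this system
print(animation.writers.list())
# then pick one of them, e.g. 'ffmpeg' (HTML5-compatible video) or 'pillow' (GIFs)
matplotlib.rcParams['animation.writer'] = 'ffmpeg'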
Contents
Synthesis methods
Metamers: given a model and a reference image, stochastically generate a new image whose model representation is identical to that of the reference image (a “metamer”, as originally defined in the literature on Trichromacy). This method makes explicit those features that the model retains/discards.
Example papers: [Portilla2000], [Freeman2011], [Deza2019], [Feather2019], [Wallis2019], [Ziemba2021]
Eigendistortions: given a model and a reference image, compute the image perturbations that produce the smallest/largest change in the model response space. These are the image changes to which the model is least/most sensitive, respectively.
Example papers: [Berardino2017]
Maximal differentiation (MAD) competition: given a reference image and two models that measure distance between images, generate pairs of images that optimally differentiate the models. Specifically, synthesize a pair of images that are equi-distant from the reference image according to model-1, but maximally/minimally distant according to model-2. Synthesize a second pair with the roles of the two models reversed. This method allows for efficient comparison of two metrics, highlighting the aspects in which their sensitivities most differ.
Example papers: [Wang2008]
Geodesics: given a model and two images, synthesize a sequence of images that lie on the shortest (“geodesic”) path in the model’s representation space. This method allows examination of the larger-scale geometric properties of model representation (as opposed to the local properties captured by the eigendistortions).
Example papers: [Henaff2016], [Henaff2020]
Models, Metrics, and Model Components
Steerable pyramid, [Simoncelli1992] and [Simoncelli1995], a multi-scale oriented image decomposition. Images are decomposed with a family of oriented filters, localized in space and frequency, similar to the “Gabor functions” commonly used to model receptive fields in primary visual cortex. The critical difference is that the pyramid organizes these filters so as to efficiently cover the 4D space of (x,y) positions, orientations, and scales, enabling efficient interpolation and interpretation (further info). See the pyrtools documentation for more details on python tools for image pyramids in general and the steerable pyramid in particular.
Portilla-Simoncelli texture model, [Portilla2000], which computes a set of image statistics that capture the appearance of visual textures (further info).
Structural Similarity Index (SSIM), [Wang2004], is a perceptual similarity metric that takes two images and returns a value between -1 (totally different) and 1 (identical) reflecting their similarity (further info).
Multiscale Structural Similarity Index (MS-SSIM), [Wang2003], is an extension of SSIM that operates jointly over multiple scales.
Normalized Laplacian distance, [Laparra2016] and [Laparra2017], is a perceptual distance metric based on transformations associated with the early visual system: local luminance subtraction and local contrast gain control, at six scales (further info).
Getting help
We communicate via several channels on Github:
To report a bug, open an issue.
To send suggestions for extensions or enhancements, please post in the ideas section of discussions first. We’ll discuss it there and, if we decide to pursue it, open an issue to track progress.
To ask usage questions, discuss broad issues, or show off what you’ve made with plenoptic, go to Discussions.
To contribute to the project, see the contributing guide.
In all cases, we request that you respect our code of conduct.
Citing us
If you use plenoptic
in a published academic article or presentation, please
cite us! See the Citation Guide for more details.
Installation
plenoptic
should work on Windows, Linux, or Mac. If you have a problem with installation, please open a bug report!
The easiest way to install plenoptic
is from PyPI (the Python Package Index) using pip within a new virtual environment. The instructions on this page use conda, which we recommend if you are unfamiliar with python environment management, but other virtual environment systems should work. If you wish to follow these instructions and do not have conda
installed on your machine, we recommend starting with miniconda:
$ conda create --name plenoptic pip python=3.9
$ conda activate plenoptic
$ pip install plenoptic
Our dependencies include pytorch and pyrtools. Installation should take care of them (along with our other dependencies) automatically, but if you have an installation problem (especially on a non-Linux operating system), it is likely that the problem lies with one of those packages. Open an issue and we’ll try to help you figure out the problem!
You can also install it directly from source to have a local editable copy. This is most useful for developing (for more info, see our contributing guide) or if you want to use the most cutting-edge version:
$ conda create --name plenoptic pip python=3.9
$ conda activate plenoptic
$ # clone the repository
$ git clone https://github.com/LabForComputationalVision/plenoptic.git
$ cd plenoptic
$ # install in editable mode with `-e` or, equivalently, `--editable`
$ pip install -e .
With an editable copy, any local changes will automatically be reflected in your installation (under the hood, this command uses symlinks).
Attention
To install plenoptic
in editable mode, you need pip >= 21.3
(see pip’s changelog). If you run into an error after running the pip install -e .
command, try updating your pip version with pip install --upgrade pip
.
Optional dependencies
The above instructions will install plenoptic and its core dependencies. You may also wish to install some additional optional dependencies. These dependencies are specified using square brackets during the pip install command and can be installed for either a local, editable install or one directly from PyPI:
If you would like to run the jupyter notebooks locally: pip install plenoptic[nb] or pip install -e .[nb]. This includes pooch (for downloading some extra data), torchvision (which has some models we’d like to use), jupyter, and related libraries. See the jupyter section for more details on how to handle jupyter and python virtual environments. Note that you can run our notebooks in the cloud using Binder, no installation required!
If you would like to locally build the documentation: pip install -e .[docs]. This includes sphinx and related libraries. (This probably only makes sense if you have a local installation.)
If you would like to run the tests: pip install -e .[dev]. This includes pytest and related libraries. (This probably only makes sense if you have a local installation.)
These optional dependencies can be joined with a comma: pip install -e .[docs,dev]
ffmpeg and videos
Several methods in this package generate videos. A number of backends are available for saving the animations to file; see the matplotlib documentation for more details. In order to convert them to HTML5 for viewing (and thus, to view in a
jupyter notebook), you’ll need ffmpeg
installed and on your path as well. Depending on your system, this might already
be installed, but if not, the easiest way is probably through conda: conda install -c conda-forge
ffmpeg
.
To change the backend, run matplotlib.rcParams['animation.writer'] = writer
before calling any of the animate functions. If you try to set that rcParam
with a random string, matplotlib
will tell you the available choices.
Running notebooks locally
Tip
You can run the notebooks in the cloud using Binder, no installation required!
Installing jupyter and setting up the kernel
If you wish to locally run the notebooks, you will need to install jupyter
,
ipywidgets
, and (for some of the notebooks) torchvision
and pooch
.
There are three possible ways of getting a local jupyter install working with this
package, depending on how you wish to handle virtual environments.
Hint
If plenoptic
is the only environment that you want to run notebooks from and/or you are unfamiliar with virtual environments, go with option 1 below.
Install jupyter in the same environment as plenoptic. This is the easiest but, if you have multiple virtual environments and want to use Jupyter notebooks in each of them, it will take up a lot of space. If you followed the instructions above to create a conda environment named plenoptic, do the following:
$ conda activate plenoptic
$ conda install -c conda-forge jupyterlab ipywidgets torchvision pooch
With this setup, when you have another virtual environment that you wish to run jupyter notebooks from, you must reinstall jupyter into that separate virtual environment, which is wasteful.
Install jupyter in your base environment and use nb_conda_kernels to automatically manage kernels in all your conda environments. This is a bit more complicated, but means you only have one installation of jupyter lab on your machine. Again, if you followed the instructions to create a conda environment named plenoptic:
$ # activate your 'base' environment, the default one created by conda/miniconda
$ conda activate base
$ # install jupyter lab and nb_conda_kernels in your base environment
$ conda install -c conda-forge jupyterlab ipywidgets
$ conda install nb_conda_kernels
$ # install ipykernel, torchvision, and pooch in the plenoptic environment
$ conda install -n plenoptic ipykernel torchvision pooch
With this setup, you have a single jupyter install that can run kernels from any of your conda environments. All you have to do is install ipykernel (and restart jupyter) and you should see the new kernel!
Attention
This method only works with conda environments. If you are using another method to manage your python virtual environments, you’ll have to use one of the other methods.
Install jupyter in your base environment and manually install the kernel in your virtual environment. This requires only a single jupyter install and is the most general solution (it will work with conda or any other way of managing virtual environments), but requires you to be a bit more comfortable with handling environments. Again, if you followed the instructions to create a conda environment named plenoptic:
$ # activate your 'base' environment, the default one created by conda/miniconda
$ conda activate base
$ # install jupyter lab and ipywidgets in your base environment
$ conda install -c conda-forge jupyterlab ipywidgets
$ # install ipykernel, torchvision, and pooch in the plenoptic environment
$ conda install -n plenoptic ipykernel torchvision pooch
$ conda activate plenoptic
$ python -m ipykernel install --prefix=/path/to/jupyter/env --name 'plenoptic'
/path/to/jupyter/env is the path to your base conda environment, and depends on the options set during your initial installation. It’s probably something like ~/conda or ~/miniconda. See the ipython docs for more details.
With this setup, similar to option 2, you have a single jupyter install that can run kernels from any virtual environment. The differences are that this approach works with any way of managing virtual environments (not just conda!) and leaves fewer packages installed in your base environment, but you have to run an additional line after installing ipykernel into each new environment (python -m ipykernel install ...).
Note
If you’re not using conda to manage your environments, the key idea is to install jupyter and ipywidgets in one environment, then install ipykernel, torchvision, and pooch in the same environment as plenoptic, and then run the ipykernel install command using the plenoptic environment’s python.
The following table summarizes the advantages and disadvantages of these three choices:
Method | Advantages | Disadvantages
---|---|---
1. jupyter installed in the plenoptic environment | ✅ Simple | ❌ Requires lots of hard drive space
2. jupyter installed in base, with nb_conda_kernels | ✅ Set up once ✅ Requires only one jupyter installation ✅ Automatically finds new environments with ipykernel installed | ❌ Initial setup more complicated
3. jupyter installed in base, kernels installed manually | ✅ Flexible: works with any virtual environment setup ✅ Requires only one jupyter installation | ❌ More complicated ❌ Extra step for each new environment
You can install all of the extra required packages using pip install -e .[nb]
(if you have a local copy of the source code) or pip install plenoptic[nb]
(if you are installing from PyPI). This includes jupyter, and so is equivalent to method 1 above. See the optional dependencies section for more details.
Running the notebooks
Once you have jupyter installed and the kernel set up, navigate to plenoptic’s examples/
directory on your terminal and activate the environment you installed jupyter into (conda activate plenoptic
for method 1, conda activate base
for methods 2 or 3), then run jupyter
and open up the notebooks. If you followed the second or third method, you should be prompted to select your kernel the first time you open a notebook: select the one named “plenoptic”.
Attention
If you installed plenoptic
from PyPI, then you will not have the notebooks on your machine and will need to download them directly from our GitHub repo. If you have a local install (and thus ran git clone
), then the notebooks can be found in the examples/
directory.
Conceptual Introduction
plenoptic
is a python library for “model-based synthesis of perceptual
stimuli”. If you’ve never heard this phrase before, it may seem mysterious: what
is stimulus synthesis and what types of scientific investigation does it
facilitate?
Synthesis is a framework for exploring models by using them to create new
stimuli, rather than examining their responses to existing ones. plenoptic
focuses on models of visual [1] information processing, which take an image as
input, perform some computations based on parameters, and return some
vector-valued abstract representation as output. This output can be mapped to
neuronal firing rate, fMRI BOLD response, behavior on some task, image category,
etc., depending on the researchers’ intended question.
Schematic describing relationship between simulate, fit, and synthesize.
That is, computational models transform a stimulus \(s\) to a response \(r\) (we often refer to \(r\) as “the model’s representation of \(s\)”), based on some model parameters \(\theta\). For example, a trained neural network that classifies images has specific weights \(\theta\), accepts an image \(s\) and returns a one-hot vector \(r\) that specifies the image class. Another example is a linear-nonlinear oriented filter model of a simple cell in primary visual cortex, where \(\theta\) defines the filter’s orientation, size, and spatial frequency; the model accepts an image \(s\) and returns a scalar \(r\) that represents the neuron’s firing rate.
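As a toy sketch of that second example (written for this introduction, not taken from plenoptic; the Gabor filter itself is assumed to be given), the linear-nonlinear simple cell can be expressed in a few lines:
import torch

def simple_cell_response(s, gabor_filter):
    # linear stage: project the image s onto the oriented filter determined by theta
    drive = (s * gabor_filter).sum()
    # nonlinear stage: half-wave rectification gives a non-negative "firing rate" r
    return torch.relu(drive)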
The most common scientific uses for a model are to simulate responses or to fit parameters, as illustrated in Fig. 1. For simulation, we hold the parameters constant while presenting the model with inputs (e.g., photographs of dogs, or a set of sine-wave gratings) and we run the model to compute responses. For fitting, we use optimization to find the parameter values that best account for the observed responses to a set of training stimuli. In both of these cases, we are holding two of the three variables (\(r\), \(s\), \(\theta\)) constant while computing or estimating the third. We can do the same thing to generate novel stimuli, \(s\), while holding the parameters and responses constant. We refer to this process as synthesis and it facilitates the exploration of input space to improve our understanding of a model’s representations.
This is related to a long and fruitful thread of research in vision science that focuses on what humans cannot see, that is, the information they are insensitive to. Perceptual metamers — images that are physically distinct but perceptually indistinguishable — provide direct evidence of such information loss in visual representations. Color metamers were instrumental in the development of the Young-Helmholtz theory of trichromacy [Helmholtz1852]. In this context, metamers demonstrate that the human visual system projects the infinite dimensionality of the physical signal to three dimensions.
To make this more concrete, let’s walk through an example. Humans can see visible light, which is electromagnetic radiation with wavelengths between 400 and 700 nanometers (nm). We often want to be able to recreate the colors in a natural scene, such as when we take a picture. In order to do so, we can ask: what information do we need to record in order to do so? Let’s start with a solid patch of uniform color. If we wanted to recreate the complete energy spectra of the color, we would need to record a lot of numbers: even if we subsampled the wavelengths so that we only recorded the energy every 5 nm, we would need 61 numbers per color! But we know that most modern electronic screens only use three numbers, often called RGB (red, green, and blue) — why can we get away with throwing away so much information? Trichromacy and color metamers can help explain.
Researchers studying color perception arrived at a standard procedure – the bipartite color-matching experiment – for constraining a model of trichromatic metamers, illustrated in Fig. 2. An observer matches a monochromatic test color (i.e., a light with energy at only a single wavelength) with a physical mixture of three different monochromatic stimuli, called primaries. Thus, the goal is to create two perceptually-indistinguishable stimuli (metamers). Perhaps surprisingly, not only is this possible for any test color, it is also possible for just about any selection of primaries (as long as they’re within the visible light spectrum and sufficiently different from each other). For most human observers, three primaries are required: there are many colors that cannot be matched with only two primaries, and four yield non-unique matches. However, there are some people for whom two primaries are sufficient.
Color matching experiment
The fact that most people require three primaries, but some require only two, provided a hint regarding the underlying mechanisms: most people have cone photoreceptors from three distinct classes (generally referred to as S, M, and L, for “short”, “medium”, and “long”), but some forms of color blindness arise from genetic deviations in which only two classes are present. Color metamers are created when cone responses have been matched. Human cones transform colors from a high-dimensional space (i.e., a vector describing the energy at each wavelength) to a three-dimensional one (i.e., a vector describing how active each cone class is). This means a large amount of wavelength information is discarded.
A worked example may help demonstrate this point more clearly. Let’s match the random light shown on the left below using the primaries shown on the right.
Left: Random light whose appearance we will match. Right: primaries.
The only way we can change the matching light is to multiply those primaries by different numbers, moving them up and down. You might look at them and wonder how we can match the light shown on the left, with all its random wiggles. The important point is that we will not match those wiggles. We will instead match the cone activation levels, which we get by matrix multiplying our light by the cone fundamentals, shown below.
Left: the cone sensitivity curves. Right: the response of each cone class to the random light shown in the previous figure.
With some linear algebra, we can compute another light that has very different amounts of energy at each wavelength but identical cone responses, shown below.
If we look at the plot on the left, we can see that the two lights are very different physically, but we can see on the right that they generate the same cone responses and thus would be perceived identically.
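The computation behind these figures can be sketched as follows; the cone fundamentals and test light below are random placeholders rather than the real data, but the null-space logic is the same:
import numpy as np

rng = np.random.default_rng(0)
n_wavelengths = 61                                        # e.g., 400-700 nm sampled every 5 nm
cones = np.abs(rng.standard_normal((3, n_wavelengths)))   # placeholder for the 3 x N cone fundamentals
light = np.abs(rng.standard_normal(n_wavelengths))        # placeholder for the random test light
responses = cones @ light                                 # the three cone activations
# any component in the null space of `cones` leaves those activations unchanged
null_basis = np.linalg.svd(cones)[2][3:]                  # N-3 directions the cones cannot see
metamer_light = light + null_basis.T @ rng.standard_normal(n_wavelengths - 3)
# (a physically realizable light would additionally need non-negative energy everywhere)
assert np.allclose(cones @ metamer_light, responses)      # identical cone responses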
In this example, the model was a simple linear system of cone responses, so we could generate a metamer, a physically different input with identical output, via some simple linear algebra. Metamers can be useful for understanding other systems as well, because discarding information is a common and useful property of such systems: the human visual system discards information at every stage of processing, not just at the cones’ absorption of light, and any computational system that seeks to classify images must discard a lot of information about unnecessary differences between images in the same class. However, generating metamers for other systems gets complicated: once a system becomes more complex, linear algebra no longer suffices.
Let’s consider a slightly more complex example. Human vision is very finely detailed at the center of gaze, but gradually discards this detailed spatial information as distance to the center of gaze increases. This phenomenon is known as foveation, and can be easily seen by the difficulty in reading a paragraph of text or recognizing a face out of the corner of your eye (see [Lettvin1976] for an accessible discussion with examples). The simplest possible model of foveation would be to average pixel intensities in windows whose width grows linearly with distance from the center of an image, as shown in Fig. 7:
The foveated pixel intensity model averages pixel values in elliptical windows that grow in size as you move away from the center of the image. It only cares about the average in these regions, not the fine details.
This model cares about the average pixel intensity in a given area, but doesn’t care how that average is reached. If the pixels in one of the ellipses above all have a value of 0.5, if they’re half 0s and half 1s, if they’re randomly distributed around 0.5 — those are all identical, as far as the model is concerned. A more concrete example is shown in Fig. 8:
Three images that the foveated pixel intensity model considers identical. They all have the same average pixel values within the foveated elliptical regions (the red ellipse shows an example averaging region at that location), but differ greatly in their fine details.
These three images are all identical for the foveated pixel intensity model described above (the red ellipse shows the size of the averaging region at that location): they have identical average pixel intensities in small regions whose size grows with distance from the center of the image. However, like the color metamers discussed earlier, they are physically very different: the leftmost image is a natural image, the rightmost has lots of high-frequency noise, and the center one looks somewhat blurry. You might think that, because the model only cares about average pixel intensities, you can throw away all the fine details and the model won’t notice. And you can! But you can also add whatever kind of fine details you’d like, including random noise — the model is completely insensitive to them.
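To make the model concrete, here is a rough sketch of this kind of pooling, simplified to circular rings rather than the oriented elliptical windows shown in the figures (written for this introduction, not taken from plenoptic):
import torch

def foveated_average(image, n_rings=6):
    """Average pixel intensities in annular rings whose width grows with distance from the center."""
    h, w = image.shape[-2:]
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    dist = torch.sqrt((ys - (h - 1) / 2) ** 2 + (xs - (w - 1) / 2) ** 2)
    # squaring the spacing makes outer rings wider than inner ones
    edges = torch.linspace(0, 1, n_rings + 1) ** 2 * (dist.max() + 1)
    means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (dist >= lo) & (dist < hi)
        means.append(image[..., mask].mean(dim=-1))   # one number per ring, per batch and channel
    return torch.stack(means, dim=-1)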
With relatively simple linear models like human trichromacy and the foveated pixel intensity model, this way of thinking about models may seem unnecessary. But for more complex models, it is very difficult to predict how they will perform on unexpected or out-of-distribution data! The burgeoning literature on adversarial examples and robustness in machine learning provides many examples of this, such as the addition of a small amount of noise (invisible to humans) changing the predicted category [Szegedy2013] or the addition of a small elephant to a picture completely changing detected objects’ identities and boundaries [Rosenfeld2018]. Exploring model behavior on all possible inputs is impossible — the space of all possible images is far too vast — but image synthesis provides one mechanism for exploration in a targeted manner.
Furthermore, image synthesis provides a method of comparing models that complements the standard procedure. Generally, scientific models are evaluated on their ability to fit data or perform a task, such as how well a model performs on ImageNet or how closely a model tracks firing rate in some collected data. However, many models can perform a task equally or comparably well [2]. By using image synthesis to explore models’ representational spaces, we can gain a fuller understanding of how models succeed and how they fail to capture the phenomena under study.
Beyond Metamers
plenoptic
contains more than just metamers — it provides a set of methods
for performing image synthesis. Each method allows for different exploration of
a model’s representational space:
Metamers investigate what features the model disregards entirely.
Eigendistortions investigate which features the model considers least important and which it considers most important.
Maximal differentiation (MAD) competition enables efficient comparison of two metrics, highlighting the aspects in which their sensitivities differ.
Geodesics investigate how a model represents motion and what changes to an image it considers reasonable.
The goal of this package is to facilitate model exploration and understanding.
We hope that providing these tools helps tighten the model-experiment loop: when
a model is proposed, whether by importing from a related field or
earlier experiments, plenoptic
enables scientists to make targeted
exploration of the model’s representational space, generating stimuli that will
provide the most information. We hope to help theorists become more active
participants in directing future experiments by efficiently finding new
predictions to test.
Helmholtz, H. (1852). LXXXI. on the theory of compound colours. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 4(28), 519–534. http://dx.doi.org/10.1080/14786445208647175
Lettvin, J. Y. (1976). On Seeing Sidelong. The Sciences, 16(4), 10–20. http://jerome.lettvin.com/jerome/OnSeeingSidelong.pdf
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. https://arxiv.org/abs/1312.6199
Rosenfeld, A., Zemel, R., & Tsotsos, J. K. (2018). The elephant in the room. https://arxiv.org/abs/1808.03305
Model requirements
plenoptic
provides a model-based synthesis framework, and therefore we
require several things of the models used with the package (the
plenoptic.tools.validate.validate_model()
function provides a convenient
way to check whether your model meets the following requirements, and see
plenoptic.simulate.models
for some examples). Your model:
should inherit torch.nn.Module (this is not strictly necessary, but will make meeting the other requirements easier).
must be callable, be able to accept a 4d torch.Tensor as input, and return a 3d or 4d torch.Tensor as output. If you inherit torch.nn.Module, implementing the forward() method will make your model callable.
the above transformation must be differentiable by torch. In practice, this generally means you perform all computations using torch functions (unless you want to write a custom .backward() method).
must not have any learnable parameters. This is largely to save time by avoiding calculation of unnecessary gradients, but synthesis is performed with a fixed model — we are optimizing the input, not the model parameters. You can use the helper function plenoptic.tools.validate.remove_grad() to detach all parameters. Similarly, your model should probably be in evaluation mode (i.e., call model.eval()), though this is not strictly required. See the pytorch documentation for the difference between evaluation mode and disabling gradient computation.
Additionally, your model inputs and outputs should be real- or complex-valued and should be interpretable for all possible values (within some range). The intention of stimulus synthesis is to facilitate model understanding — if the synthesized stimuli are meaningless, this defeats the purpose. (Note that domain restrictions, such as requiring integer-valued inputs, can probably be accomplished by adding a penalty to an objective function, but will make your life harder.)
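For concreteness, here is a minimal sketch of a model that meets the requirements above. The model itself is made up for illustration; remove_grad and validate_model are the plenoptic helpers mentioned above, and we assume validate_model can be called with only the model as its argument:
import torch
import plenoptic as po

class ToyBlur(torch.nn.Module):
    # callable, maps a 4d tensor to a 4d tensor, differentiable, and has no learnable parameters
    def forward(self, image):
        return torch.nn.functional.avg_pool2d(image, kernel_size=2)

model = ToyBlur().eval()                 # put the model in evaluation mode
po.tools.remove_grad(model)              # a no-op here, but required for models with parameters
po.tools.validate.validate_model(model)  # raises an error if a requirement is violated (assumed callable with just the model)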
plenoptic.synthesize.mad_competition.MADCompetition
uses metrics,
rather than models, which have the following requirements (use the
plenoptic.tools.validate.validate_metric()
function to check whether your
metric meets the following requirements and see plenoptic.metric
for
some examples):
a metric must be callable, accept two 4d torch.Tensor objects as inputs, and return a scalar as output. This can be a torch.nn.Module object, like models, but the example metrics are all functions.
when called on two identical inputs, the metric must return a value of 0.
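A minimal example of a valid metric (plain mean squared error, shown only to illustrate the requirements; it is not one of plenoptic's metrics):
import torch

def mse(img_a, img_b):
    # accepts two 4d tensors, returns a scalar tensor, and is exactly 0 for identical inputs
    return ((img_a - img_b) ** 2).mean()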
Finally, plenoptic.synthesize.metamer.Metamer
supports coarse-to-fine
synthesis, as described in [PS]. To make use of coarse-to-fine synthesis, your
model must meet the following additional requirements (use the
plenoptic.tools.validate.validate_coarse_to_fine()
function to check and
see plenoptic.simulate.models.portilla_simoncelli.PortillaSimoncelli
for an example):
the model must have a scales attribute.
in addition to a torch.Tensor, the forward() method must also be able to accept an optional scales keyword argument (equivalently, when calling the model, if the model does not inherit torch.nn.Module).
that argument should be a list containing one or more values from model.scales, and the shape of the output should change when scales is a strict subset of all possible values.
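The following skeleton shows the shape of a model that satisfies these coarse-to-fine requirements; it is a hypothetical toy, and the real reference is the PortillaSimoncelli model mentioned above:
import torch

class MultiScaleToy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scales = ["coarse", "mid", "fine"]   # required attribute

    def forward(self, image, scales=None):
        # `scales` restricts which scales contribute to the output
        scales = self.scales if scales is None else scales
        stats = []
        for i, scale in enumerate(self.scales):
            if scale in scales:
                # toy per-scale statistic: spatial mean of a progressively downsampled copy
                pooled = torch.nn.functional.avg_pool2d(image, kernel_size=2 ** (i + 1))
                stats.append(pooled.mean(dim=(-2, -1)))
        # requesting a strict subset of scales yields a smaller output, as required
        return torch.stack(stats, dim=-1)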
Quickstart
The following tutorial is intended to show you how to create a simple plenoptic
-compliant model and use it with our synthesis methods, with a brief explanation of how to interpret the outputs. See the other tutorials for more details.
[1]:
import plenoptic as po
import torch
import pyrtools as pt
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
%matplotlib inline
All plenoptic
methods require a “reference” or “target” image — for Metamer synthesis, for example, this is the image whose representation we will match. Let’s load in an image of Einstein to serve as our reference here:
[2]:
im = po.data.einstein()
fig = po.imshow(im)

Models can be really simple, as we'll demonstrate here. A model needs to inherit torch.nn.Module and define just two methods: __init__ (so the object can be constructed) and forward (so it can take an image). See the Models page of the documentation for more details.
For this notebook, we’ll initialize a simple plenoptic-compatible model and call its forward method. This model just convolves a 2d gaussian filter across an image, so it’s a low-pass model, preserving low frequency information while discarding the high frequencies.
[3]:
# this is a convenience function for creating a simple Gaussian kernel
from plenoptic.simulate.canonical_computations.filters import circular_gaussian2d
# Simple Gaussian convolutional model (no rectification; just a low-pass filter)
class SimpleModel(torch.nn.Module):
# in __init__, we create the object, initializing the convolutional weights
def __init__(self, kernel_size=(7, 7)):
super().__init__()
self.kernel_size = kernel_size
self.conv = torch.nn.Conv2d(1, 1, kernel_size=kernel_size, padding=(0, 0), bias=False)
self.conv.weight.data[0, 0] = circular_gaussian2d(kernel_size, 3.)
# the forward pass of the model defines how to get from an image to the representation
def forward(self, x):
# use circular padding so our output is the same size as our input
x = po.tools.conv.same_padding(x, self.kernel_size, pad_mode='circular')
return self.conv(x)
model = SimpleModel()
rep = model(im)
/home/billbrod/miniconda3/envs/plenoptic/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
To work with our synthesis methods, a model must accept a 4d tensor as input and return a 3d or 4d tensor as output. 4d inputs are commonly used for pytorch models, and the dimensions are batch (often, multiple images), channel (often, RGB or outputs of different convolutional filters), height, and width. The model should then return either a 1d vector or a 2d image per batch and channel element. If your model operates across channels or batches, that's no problem; for example, if the model transforms RGB to grayscale, your input would have 3 channels and your output would have 1.
We can see that our Gaussian
model satisfies this constraint:
[4]:
print(im.shape)
print(rep.shape)
torch.Size([1, 1, 256, 256])
torch.Size([1, 1, 256, 256])
There are also several more abstract constraints (e.g., model must accept real-valued inputs and return real-valued outputs), so it’s recommended that you read the Models page of the documentation before creating your own model.
The following shows the image and the model output. We can see that output is a blurred version of the input, as we would expect from a low-pass model.
[5]:
fig = po.imshow(torch.cat([im, rep]), title=['Original image', 'Model output'])

Before moving forward, let’s think about this model. It’s a simple Gaussian convolution which throws out high-frequency information, as we can see in the representation above. Metamers provide a tool for exploring a model’s insensitivities, so any metamers we synthesize should capitalize on this: they should differ from the original image in the high frequencies.
There’s one final step before we’re ready for synthesis. Most pytorch
models will have learnable parameters, such as the weight on the convolution filter we created above, because the focus is generally on training the model to best perform some task. In plenoptic
, models are fixed because we take the opposite approach: generating a new stimulus to better understand a given model. Thus, all synthesis methods will raise a ValueError
if given a model with any learnable
parameters. We provide a helper function to remove these gradients:
[6]:
po.tools.remove_grad(model)
Okay, now we’re ready to start with metamer synthesis. To initialize, we only need the model and the image (there are some additional options, but the defaults are fine in this case; see the Metamer notebook if you’re interested). In general, you’ll probably need to play with these options to find a good solution. It’s also probably a good idea, while getting started, to set store_progress
to True
(to store every iteration) or some int
(to store every
int
iterations) so you can examine synthesis progress.
[7]:
metamer = po.synth.Metamer(im, model)
matched_im = metamer.synthesize(store_progress=True, max_iter=20)
# if we call synthesize again, we resume where we left off
matched_im = metamer.synthesize(store_progress=True, max_iter=150)
/home/billbrod/Documents/plenoptic/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
warnings.warn(
/home/billbrod/Documents/plenoptic/plenoptic/synthesize/metamer.py:195: UserWarning: Loss has converged, stopping synthesis
warnings.warn("Loss has converged, stopping synthesis")
We can then examine the loss over time. There’s a convenience function for this, but you could also call plt.semilogy(metamer.losses)
to create it yourself.
[8]:
po.synth.metamer.plot_loss(metamer)
[8]:
<AxesSubplot: xlabel='Synthesis iteration', ylabel='Loss'>

The loss decreases steadily and has reached a very low value. In fact, based on our convergence criterion (one of the optional arguments), it looks as though we’ve converged (we could change this argument to continue synthesis).
We can then look at the reference and metamer images, as well as the model’s outputs on the two images:
[9]:
fig = po.imshow([im, rep, metamer.metamer, model(metamer.metamer)],
col_wrap=2, vrange='auto1',
title=['Original image', 'Model representation\nof original image',
'Synthesized metamer', 'Model representation\nof synthesized metamer']);

We can see that, even though the target and synthesized images look very different, the two model outputs look basically identical (which matches the exceedingly low loss value we see above). (The left column shows the images and the right column the model outputs; top row shows the original image and bottom the synthesized metamer.)
It may seem strange that the synthesized image looks like it has high-frequency noise in it — a Gaussian is a low-pass filter, so why isn’t the model metamer just a blurred version of the original image? Indeed, such a blurred image would be a model metamer, but it’s only one of many. Remember what we mentioned earlier: Gaussians are insensitive to high-frequency information, which not only means that their response doesn’t change when you remove that information, but that you can put any amount of high frequency information into an image without affecting the model’s output. Put another way, you can randomize the contents of the model’s null space without affecting its response, and the goal of metamer synthesis is to generate different images that do just that.
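We can check this intuition directly with a quick, ad hoc comparison (not part of the original notebook): perturb the image with equal-amplitude low- and high-frequency patterns and see how much the model's output moves in each case.
# a constant (zero-frequency) perturbation vs. a checkerboard at the highest representable frequency
low_freq = torch.ones_like(im)
high_freq = torch.ones_like(im)
high_freq[..., ::2, :] *= -1
high_freq[..., :, ::2] *= -1
print(torch.linalg.vector_norm(model(im + 0.1 * low_freq) - rep))   # output changes noticeably
print(torch.linalg.vector_norm(model(im + 0.1 * high_freq) - rep))  # output barely changes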
We can also view a movie of our progress so far.
[10]:
po.tools.convert_anim_to_html(po.synth.metamer.animate(metamer, included_plots=['display_metamer', 'plot_loss'], figsize=(12, 5)))
[10]:
We can see the model’s insensitivity to high frequencies more dramatically by initializing our metamer synthesis with a different image. By default, we initialize with a patch of white noise, but we can initialize with any image of the same size. Let’s try with a different natural image, a picture of Marie Curie.
[11]:
curie = po.data.curie()
po.imshow([curie]);

[12]:
metamer = po.synthesize.Metamer(im, model, initial_image=curie)
# we increase the length of time we run synthesis and decrease the
# stop_criterion, which determines when we think loss has converged
# for stopping synthesis early.
synth_image = metamer.synthesize(max_iter=500, stop_criterion=1e-6)
/home/billbrod/Documents/plenoptic/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
warnings.warn(
/home/billbrod/Documents/plenoptic/plenoptic/synthesize/metamer.py:195: UserWarning: Loss has converged, stopping synthesis
warnings.warn("Loss has converged, stopping synthesis")
Let’s double-check that our synthesis looks like it’s reached a good solution by checking the loss curve:
[13]:
po.synth.metamer.plot_loss(metamer)
[13]:
<AxesSubplot: xlabel='Synthesis iteration', ylabel='Loss'>

Good, now let’s examine our synthesized metamer and the model output, as before:
[14]:
fig = po.imshow([im, rep, metamer.metamer, model(metamer.metamer)],
col_wrap=2, vrange='auto1',
title=['Original image', 'Model representation\nof original image',
'Synthesized metamer', 'Model representation\nof synthesized metamer']);

We see that the synthesized metamer here looks quite different from both the original and from our previous metamer, while the model outputs look very similar. Here, our synthesized model metamer looks like a blurry picture of Einstein with a high-frequency “shadow” of Curie added on top. Again, this is because the Gaussian model is insensitive to high frequencies, and thus a model metamer can include any high frequency information.
By generating model metamers, we’ve gained a better understanding of the information our model is invariant to, but what if we want a better understanding of what our model is sensitive to? We can use Eigendistortion
for that.
Like Metamer
, Eigendistortion
accepts an image and a model as its inputs. By default, it synthesizes the top and bottom eigendistortion, that is, the changes to the input image that the model finds most and least noticeable.
[15]:
eig = po.synthesize.Eigendistortion(im, model)
eig.synthesize();
Initializing Eigendistortion -- Input dim: 65536 | Output dim: 65536
Let’s examine those distortions:
[16]:
po.imshow(eig.eigendistortions, title=['Maximum eigendistortion',
'Minimum eigendistortion']);

We can see they make sense: the most noticeable distortion is a very low-frequency modification to the image, with a period of about half the image. The least noticeable, on the other hand, is very high-frequency, which matches our understanding from the metamer example above.
This brief introduction hopefully demonstrates how you can use plenoptic to better understand your model representations! There’s much more that can be done with both these methods, as well as two additional methods, MADCompetition
and Geodesic
, to explore.
Citation Guide
If you use plenoptic
in a published academic article or presentation, please
cite both the code (via its DOI) and the paper [VSS2023]. You can use the following:
Paper:
@article{duong2023plenoptic,
  title={Plenoptic: A platform for synthesizing model-optimized visual stimuli},
  author={Duong, Lyndon and Bonnen, Kathryn and Broderick, William and Fiquet, Pierre-{\'E}tienne and Parthasarathy, Nikhil and Yerxa, Thomas and Zhao, Xinyuan and Simoncelli, Eero},
  journal={Journal of Vision},
  volume={23},
  number={9},
  pages={5822--5822},
  year={2023},
  publisher={The Association for Research in Vision and Ophthalmology}
}
Additionally, please cite the following paper(s) depending on which component you use:
plenoptic.synthesize.metamer.Metamer or plenoptic.synthesize.metamer.MetamerCTF: [Portilla2000].
plenoptic.synthesize.mad_competition.MADCompetition: [Wang2008].
plenoptic.synthesize.eigendistortion.Eigendistortion: [Berardino2017].
plenoptic.simulate.canonical_computations.steerable_pyramid_freq.SteerablePyramidFreq: [Simoncelli1995] ([Simoncelli1992] contains a longer discussion about the motivation and the logic, while [Simoncelli1995] describes the implementation that is used here).
plenoptic.simulate.models.portilla_simoncelli.PortillaSimoncelli: [Portilla2000].
plenoptic.simulate.models.frontend (any model): [Berardino2017].
plenoptic.metric.perceptual_distance.ssim or plenoptic.metric.perceptual_distance.ssim_map: [Wang2004] if weighted=False, [Wang2008] if weighted=True.
Note that the citations given above define the application of the relevant idea
(“metamers”) to computational models of the visual system that are instantiated
in the algorithms found in plenoptic
, but that, for the most part, these
general concepts were not developed by the developers of plenoptic
or the
Simoncelli lab and are, in general, much older – the idea of metamers goes all
the way back to [Helmholtz1852]! The papers above generally provide some
discussion of this history and can point you to further reading, if you are
interested.
Lyndon Duong, Kathryn Bonnen, William Broderick, Pierre-Étienne Fiquet, Nikhil Parthasarathy, Thomas Yerxa, Xinyuan Zhao, Eero Simoncelli; Plenoptic: A platform for synthesizing model-optimized visual stimuli. Journal of Vision 2023;23(9):5822. https://doi.org/10.1167/jov.23.9.5822.
Eigendistortions
Run notebook online with Binder:
In this tutorial we will cover:
theory behind eigendistortions
how to use the plenoptic.synthesize.eigendistortion.Eigendistortion object
computing eigendistortions using a simple input and linear model
computing extremal eigendistortions for different layers of ResNet18
Introduction
How can we assess whether a model sees like we do? One way is to test whether it “notices” image distortions the same way we do. For a model, a noticeable distortion would be an image perturbation that elicits a change in its response. If our goal is to create models with human-like vision, then an image distortion that is (not) noticeable to a human should also (not) be noticeable to our models. Eigendistortions provide a framework with which to compare models to human visual perception of distortions.
Berardino, A., Laparra, V., Ballé, J. and Simoncelli, E., 2017. Eigen-distortions of hierarchical representations. In Advances in neural information processing systems (pp. 3530-3539).
http://www.cns.nyu.edu/pub/lcv/berardino17c-final.pdf
http://www.cns.nyu.edu/~lcv/eigendistortions/
See the last section of this notebook for more mathematical detail
[1]:
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
import torch
from plenoptic.synthesize.eigendistortion import Eigendistortion
from torch import nn
# this notebook uses torchvision, which is an optional dependency.
# if this fails, install torchvision in your plenoptic environment
# and restart the notebook kernel.
try:
from torchvision import models
except ModuleNotFoundError:
raise ModuleNotFoundError("optional dependency torchvision not found!"
" please install it in your plenoptic environment "
"and restart the notebook kernel")
import os.path as op
import plenoptic as po
Example 1: Linear model, small 1D input “image”
1.1) Creating the model
The fundamental goal of computing eigendistortions is to understand how small changes (distortions) in inputs affect model outputs. Any model can be thought of as a black box mapping an input to an output, \(f(x): x \in \mathbb{R}^n \mapsto y \in \mathbb{R}^m\), i.e. a function takes as input an n-dimensional vector \(x\) and outputs an m-dimensional vector \(y\).
The simplest model that achieves this is linear,
\[
y = f(x) = Mx, \qquad M \in \mathbb{R}^{m \times n}.
\]
In this linear case, the Jacobian is fixed \(J= \frac{\partial f}{\partial x}=M\) for all possible inputs \(x\). Can we synthesize a distortion \(\epsilon\) such that \(f(x+\epsilon)\) is maximally/minimally perturbed from the original \(f(x)\)? Yes! This would amount to finding the first and last eigenvectors of the Fisher information matrix, i.e. \(J^TJ v = \lambda v\).
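To make that concrete, here is a quick stand-alone check (illustrative only, not using plenoptic): for a random linear map, the most and least noticeable distortions are the eigenvectors of \(M^TM\) with the largest and smallest eigenvalues.
import torch

M = torch.randn(10, 25)                      # a random linear model y = M x
F = M.T @ M                                  # Fisher information matrix J^T J (here J = M)
eigvals, eigvecs = torch.linalg.eigh(F)      # eigenvalues returned in ascending order
most_noticeable = eigvecs[:, -1]             # direction that changes M x the most
least_noticeable = eigvecs[:, 0]             # eigenvalue ~0: changes M x hardly at all
print((M @ most_noticeable).norm(), (M @ least_noticeable).norm())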
A few things to note:
The input image should always be a 4D tensor with dimensions torch.Size([batch=1, channel, height, width]).
We don't allow batch synthesis of eigendistortions, so the batch dimension should always be 1.
We’ll be working with the Eigendistortion
object and its instance method, synthesize()
.
Let’s make a linear PyTorch model and compute eigendistortions for a given input.
[2]:
class LinearModel(nn.Module):
"""The simplest model we can make.
Its Jacobian should be the weight matrix M, and the eigenvectors of the Fisher matrix are therefore the
eigenvectors of M.T @ M"""
def __init__(self, n, m):
super(LinearModel, self).__init__()
torch.manual_seed(0)
self.M = nn.Linear(n, m, bias=False)
def forward(self, x):
y = self.M(x) # this computes y = x @ M.T
return y
n = 25 # input vector dim (can you predict what the eigenvec/vals would be when n<m or n=m? Feel free to try!)
m = 10 # output vector dim
mdl_linear = LinearModel(n, m)
po.tools.remove_grad(mdl_linear)
x0 = torch.ones((1, 1, 1, n))  # input must be torch.Size([batch=1, n_chan, img_height, img_width])
y0 = mdl_linear(x0)
fig, ax = plt.subplots(2, 1, sharex='all', sharey='all')
ax[0].stem(x0.squeeze())
ax[0].set(title=f'{n:d}D Input')
ax[1].stem(y0.squeeze().detach(), markerfmt='C1o')
ax[1].set(title=f'{m:d}D Output')
fig.tight_layout()

1.2 - Synthesizing eigendistortions of linear model
To compute the eigendistortions of this model, we can instantiate an Eigendistortion
object with a 4D input image with dims torch.Size([batch=1, n_channels, img_height, img_width])
, and any PyTorch model with valid forward
and backward
methods. After that, we simply call the instance method synthesize()
and choose the appropriate synthesis method. Normally our input has thousands of entries, but our input in this case is small (only n=25 entries), so we can compute the full
\(m \times n\) Jacobian, and all the eigenvectors of the \(n \times n\) Fisher matrix, \(F=J^TJ\). The synthesize
method does this for us and stores the results of the synthesis in the eigendistortions, eigenvalues, and eigenindex attributes of the object, respectively.
[3]:
help(Eigendistortion.synthesize) # fully documented
eig_jac = Eigendistortion(x0, mdl_linear) # instantiate Eigendistortion object using an input and model
eig_jac.synthesize(method='exact') # compute the entire Jacobian exactly
Help on function synthesize in module plenoptic.synthesize.eigendistortion:
synthesize(self, method: Literal['exact', 'power', 'randomized_svd'] = 'power', k: int = 1, max_iter: int = 1000, p: int = 5, q: int = 2, stop_criterion: float = 1e-07)
Compute eigendistortions of Fisher Information Matrix with given input image.
Parameters
----------
method
Eigensolver method. 'exact' tries to do eigendecomposition directly (
not recommended for very large inputs). 'power' (default) uses the power method to compute first and
last eigendistortions, with maximum number of iterations dictated by n_steps. 'randomized_svd' uses
randomized SVD to approximate the top k eigendistortions and their corresponding eigenvalues.
k
How many vectors to return using block power method or svd.
max_iter
Maximum number of steps to run for ``method='power'`` in eigenvalue computation. Ignored
for other methods.
p
Oversampling parameter for randomized SVD. k+p vectors will be sampled, and k will be returned. See
docstring of ``_synthesize_randomized_svd`` for more details including algorithm reference.
q
Matrix power parameter for randomized SVD. This is an effective trick for the algorithm to converge to
the correct eigenvectors when the eigenspectrum does not decay quickly. See
``_synthesize_randomized_svd`` for more details including algorithm reference.
stop_criterion
Used if ``method='power'`` to check for convergence. If the L2-norm
of the eigenvalues has changed by less than this value from one
iteration to the next, we terminate synthesis.
Initializing Eigendistortion -- Input dim: 25 | Output dim: 10
Computing all eigendistortions
/home/billbrod/Documents/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
warnings.warn(
1.3 - Comparing our synthesis to ground-truth
The Jacobian is in general a rectangular (not necessarily square) matrix \(J\in \mathbb{R}^{m\times n}\). Since this is a linear model, let’s check if the computed Jacobian (stored as an attribute in the Eigendistortion
object) matches the weight matrix \(M\).
Since the eigendistortions are each 1D (vectors) in this example, we can display them all as an image where each column is an eigendistortion, each pixel is an entry of the eigendistortion, and the intensity is proportional to its value.
[4]:
fig, ax = plt.subplots(1, 2, sharex='all', sharey='all')
ax[0].imshow(eig_jac.jacobian)
ax[1].imshow(mdl_linear.M.weight.data, vmin=eig_jac.jacobian.min(), vmax=eig_jac.jacobian.max())
ax[0].set(xticks=[], yticks=[], title='Solved Jacobian')
ax[1].set(title='Linear model weight matrix')
fig.tight_layout()
print("Jacobian == weight matrix M?:", eig_jac.jacobian.allclose(mdl_linear.M.weight.data))
# Eigenvectors (aka eigendistortions) and their associated eigenvalues are stored in the eigendistortions and eigenvalues attributes
fig, ax = plt.subplots(1, 2, sharex='all')
ax[0].imshow(eig_jac.eigendistortions.squeeze(), vmin=-1, vmax=1, cmap='coolwarm')
ax[0].set(title='Eigendistortions', xlabel='Eigenvector index', ylabel='Entry')
ax[1].plot(eig_jac.eigenvalues, '.')
ax[1].set(title='Eigenvalues', xlabel='Eigenvector index', ylabel='Eigenvalue')
fig.tight_layout()
Jacobian == weight matrix M?: True


1.4 - What do these eigendistortions mean?
The first eigenvector (with the largest eigenvalue) is the direction in which we can distort our input \(x\) and change the response of the model the most, i.e. its most noticeable distortion. For the last eigenvector, since its associated eigenvalue is 0, then no change in response occurs when we distort the input in that direction, i.e. \(f(x+\epsilon)=f(x)\). So this distortion would be imperceptible to the model.
In most cases, our input would be much larger. An \(n\times n\) image has \(n^2\) entries, meaning the Fisher matrix is \(n^2 \times n^2\) and has \(n^2\) possible eigendistortions – certainly too many to store in memory. We instead resort to numerical methods to compute the eigendistortions. To do this, we can just set our synthesis method='power'
to estimate the first eigenvector (most noticeable distortion) and last eigenvector (least noticeable
distortion) for the image.
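For intuition, the power method amounts to repeatedly applying the Fisher matrix to a random vector and renormalizing; the generic sketch below is illustrative only (plenoptic's implementation works with Jacobian-vector products rather than an explicit matrix, and also estimates the last eigenvector).
import torch

def power_iteration(fisher_matvec, dim, n_iter=1000):
    # estimate the top eigenvector of F using only matrix-vector products v -> F v
    v = torch.randn(dim)
    v = v / v.norm()
    for _ in range(n_iter):
        v = fisher_matvec(v)
        v = v / v.norm()
    eigenvalue = v @ fisher_matvec(v)
    return eigenvalue, v

# usage: power_iteration(lambda v: F @ v, dim=F.shape[0]) for any symmetric matrix F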
[5]:
eig_pow = Eigendistortion(x0, mdl_linear)
eig_pow.synthesize(method='power', max_iter=1000)
eigdist_pow = eig_pow.eigendistortions.squeeze() # squeeze out singleton channel dimension (these are grayscale)
eigdist_jac = eig_jac.eigendistortions.squeeze()
print(f'Indices of computed eigenvectors: {eig_pow.eigenindex}\n')
fig, ax = plt.subplots(1,1)
ax.plot(eig_pow.eigenindex, eig_pow.eigenvalues, '.', markersize=15, label='Power')
ax.plot(eig_jac.eigenvalues, '.-', label='Jacobian')
ax.set(title='Power method vs Jacobian', xlabel='Eigenvector index', ylabel='Eigenvalue')
ax.legend(title='Synth. method')
fig, ax = plt.subplots(1, 2, sharex='all', sharey='all', figsize=(8,3))
ax[0].plot(eigdist_pow[0] - eigdist_jac[0])
ax[0].set(title='Difference in first eigendists')
ax[1].stem(eigdist_pow[-1] - eigdist_jac[-1])
ax[1].set(title='Difference in last eigendists')
fig, ax = plt.subplots(1,1)
ax.stem(eigdist_jac @ eigdist_pow[-1])
ax.set(title="Power method's last eigenvec projected on all Jacobian method's eigenvec",
xlabel='Eigenvector index', ylabel='Projection')
print('Are the first eigendistortions the same?', eigdist_pow[0].allclose(eigdist_jac[0], atol=1e-3))
print('Are the last eigendistortions the same?', eigdist_pow[-1].allclose(eigdist_jac[-1], atol=1e-3))
# find eigendistortions of Jacobian-method whose eigenvalues are zero
ind_zero = eig_jac.eigenvalues.isclose(torch.zeros(1), atol=1e-4)
Initializing Eigendistortion -- Input dim: 25 | Output dim: 10
Top k=1 eigendists computed | Stop criterion 1.00E-07 reached.
Bottom k=1 eigendists computed | Stop criterion 1.00E-07 reached.
Indices of computed eigenvectors: tensor([ 0, 24])
Are the first eigendistortions the same? True
Are the last eigendistortions the same? False



The power method’s first eigendistortion matches the ground-truth first eigendistortion obtained via the Jacobian solve. And while the last eigendistortions don’t match, the last power method eigendistortion lies in the span of all the eigendistortions whose eigenvalues are zero. All of these zero-eigenvalue eigendistortions are equivalent: any distortion of \(x\) in their span results in no change in the model output, and would therefore be imperceptible to the model.
1.5 - The Fisher information matrix is locally adaptive
Different inputs should in general have different sets of eigendistortions – a noticeable distortion in one image would not necessarily be noticeable in a different image. The only case where they should be the same regardless of input is when the model is fully linear, as in this simple example. So let’s check if the Jacobian at a different input still equals the weight matrix \(M\).
[6]:
x1 = torch.randn_like(x0) # generate some random input
eig_jac2 = Eigendistortion(x1, model=mdl_linear)
eig_jac2.synthesize(method='exact') # since the model is linear, the Jacobian should be the exact same as before
print(f'Does the jacobian at x1 still equal the model weight matrix?'
f' {eig_jac2.jacobian.allclose(mdl_linear.M.weight.data)}')
Initializing Eigendistortion -- Input dim: 25 | Output dim: 10
Computing all eigendistortions
Does the jacobian at x1 still equal the model weight matrix? True
Example 2: Which layer of ResNet is a better model of human visual distortion perception?
Now that we understand what eigendistortions are and how the Eigendistortion class works, let's compute them on real images using a more complex model like ResNet18. The response vector \(y\) doesn't have to be the output of the model's last layer; we can also compute eigendistortions for intermediate layers. Let's synthesize distortions for an image using different layers of ResNet18 to see which layer produces extremal eigendistortions that better align with human perception.
2.1 - Load an example image
[10]:
n = 128 # this will be the img_height and width of the input, you can change this to accommodate your machine
img = po.data.color_wheel()
# center crop the image to nxn
img = po.tools.center_crop(img, n)
po.imshow(img, as_rgb=True, zoom=3);

2.2 - Instantiate models and Eigendistortion objects
Let's make a wrapper class that returns the nth layer output of a torchvision model (it works for both VGG16 and ResNet18). We're going to use this to compare eigendistortions synthesized using different layers of ResNet18 as models for distortion perception.
[11]:
# Create a class that takes the nth layer output of a given model
class NthLayer(torch.nn.Module):
    """Wrap any model to get the response of an intermediate layer

    Works for Resnet18 or VGG16.
    """
    def __init__(self, model, layer=None):
        """
        Parameters
        ----------
        model: PyTorch model
        layer: int
            Which model response layer to output
        """
        super().__init__()
        try:
            # then this is VGG16
            features = list(model.features)
        except AttributeError:
            # then it's resnet18
            features = ([model.conv1, model.bn1, model.relu, model.maxpool] + [l for l in model.layer1] +
                        [l for l in model.layer2] + [l for l in model.layer3] + [l for l in model.layer4] +
                        [model.avgpool, model.fc])
        self.features = nn.ModuleList(features).eval()
        if layer is None:
            layer = len(self.features)
        self.layer = layer

    def forward(self, x):
        for ii, mdl in enumerate(self.features):
            x = mdl(x)
            if ii == self.layer:
                return x
# different potential models of human visual perception of distortions
resnet18_a = NthLayer(models.resnet18(pretrained=True), layer=3)
po.tools.remove_grad(resnet18_a)
resnet18_b = NthLayer(models.resnet18(pretrained=True), layer=6)
po.tools.remove_grad(resnet18_b)
ed_resneta = Eigendistortion(img, resnet18_a)
ed_resnetb = Eigendistortion(img, resnet18_b)
/home/billbrod/micromamba/envs/plenoptic/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/billbrod/micromamba/envs/plenoptic/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Initializing Eigendistortion -- Input dim: 49152 | Output dim: 65536
Initializing Eigendistortion -- Input dim: 49152 | Output dim: 32768
2.3 - Synthesizing distortions
The input dimensionality in this example is huge compared to our linear model example: the input has \(\textrm{n\_chan} \times \textrm{img\_height} \times \textrm{img\_width}\) entries, so the Fisher matrix has \((\textrm{n\_chan} \times \textrm{img\_height} \times \textrm{img\_width})^2\) entries – far too massive to compute exactly. We must turn to iterative methods. Let's synthesize the extremal eigendistortions for this image using the different layers of ResNet18 as defined above.
[12]:
# Bump up max_iter if you wish
ed_resneta.synthesize(method='power', max_iter=400)
ed_resnetb.synthesize(method='power', max_iter=400);
Top k=1 eigendists computed | Stop criterion 1.00E-07 reached.
Top k=1 eigendists computed | Stop criterion 1.00E-07 reached.
2.4 - Visualizing eigendistortions
Let's display the eigendistortions. Eigendistortion has a display helper that will show a 2x3 subplot figure of images. The top row shows the original image on the left, the synthesized maximal eigendistortion on the right, and some constant \(\alpha\) times the eigendistortion added to the image in the middle panel. The bottom row has a similar layout, but displays the minimal eigendistortion. Let's display the eigendistortions.
[13]:
po.synth.eigendistortion.display_eigendistortion(ed_resneta, 0, as_rgb=True, zoom=3);
po.synth.eigendistortion.display_eigendistortion(ed_resneta, -1, as_rgb=True, zoom=3);
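The middle panels above are, roughly, the original image plus a scaled copy of the eigendistortion. If you want to build that view by hand, a minimal sketch looks like the following (the \(\alpha\) value here is an arbitrary choice for visibility, not the constant display_eigendistortion uses):

alpha = 3.                                  # arbitrary scaling, chosen for visibility
maxdist = ed_resneta.eigendistortions[0]    # most-noticeable direction
mindist = ed_resneta.eigendistortions[-1]   # least-noticeable direction
po.imshow([img, img + alpha * maxdist, img + alpha * mindist], as_rgb=True, zoom=3,
          title=['original', 'img + alpha * maxdist', 'img + alpha * mindist']);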


2.5 - Which synthesized extremal eigendistortions better characterize human perception?
Let's compare eigendistortions within a model first. One thing we immediately notice is that the first eigendistortion (labeled maxdist) is indeed more noticeable than mindist: maxdist is localized to a single portion of the image, and has lower, more prominent spatial frequency content than mindist, which looks more like high-frequency noise distributed across the image.
But how do the distortions compare between models – which model better characterizes human visual perception of distortions? The only way to truly know is to run an experiment and ask human observers which distortions are most/least noticeable to them. The best model should produce a maximally noticeable distortion that is more noticeable than other models' maximally noticeable distortions, and its minimally noticeable distortion should be less noticeable than other models' minimally noticeable distortions.
See Berardino et al. 2017 for more details.
2.6 - Synthesizing distortions for other images
Remember that the Fisher matrix is locally adaptive, meaning that a different image should have a different set of eigendistortions. Let's finish off this notebook with another set of extremal eigendistortions for these two ResNet18 layers, computed on a different image.
[14]:
img = po.data.curie()
# center crop the image to nxn
img = po.tools.center_crop(img, n)
# because this is a grayscale image but ResNet expects a color image,
# need to duplicate along the color dimension
img3 = torch.repeat_interleave(img, 3, dim=1)
ed_resneta = Eigendistortion(img3, resnet18_a)
ed_resnetb = Eigendistortion(img3, resnet18_b)
ed_resneta.synthesize(method='power', max_iter=400)
ed_resnetb.synthesize(method='power', max_iter=400)
po.imshow(img, zoom=2, title="Original");
Initializing Eigendistortion -- Input dim: 49152 | Output dim: 65536
Initializing Eigendistortion -- Input dim: 49152 | Output dim: 32768
Top k=1 eigendists computed | Stop criterion 1.00E-07 reached.
Top k=1 eigendists computed | Stop criterion 1.00E-07 reached.

[15]:
po.synth.eigendistortion.display_eigendistortion(ed_resneta, 0, as_rgb=True, zoom=2, title="top eigendist");
po.synth.eigendistortion.display_eigendistortion(ed_resneta, -1, as_rgb=True, zoom=2, title="bottom eigendist");
po.synth.eigendistortion.display_eigendistortion(ed_resnetb, 0, as_rgb=True, zoom=2, title="top eigendist");
po.synth.eigendistortion.display_eigendistortion(ed_resnetb, -1, as_rgb=True, zoom=2, title="bottom eigendist");




Appendix: More mathematical detail
If we have a model that takes an N-dimensional input and returns an M-dimensional response, then its Jacobian, \(J=\frac{\partial f}{\partial x}\), is an \(M\times N\) matrix of partial derivatives that tells us how much a change in each entry of the input changes each entry of the output. Under the assumption of additive Gaussian noise in the output space, the Fisher Information Matrix, \(F\), is a symmetric positive semi-definite \(N\times N\) matrix computed from the Jacobian, \(F=J^TJ\). If you are familiar with linear algebra, you might notice that the eigenvectors of \(F\) are the right singular vectors of the Jacobian. Thus, an eigendecomposition \(F=V\Lambda V^T\) yields directions of the input space (columns of \(V\)) along which changes in the output space are rank-ordered by the entries of the diagonal matrix \(\Lambda\).
Given some input image \(x_0\), an eigendistortion is an additive perturbation, \(\epsilon\), in the input domain that changes the response in a model's output domain of interest (e.g. an intermediate layer of a neural net, the output of a nonlinear model, etc.). These perturbations are named eigendistortions because they push \(x_0\) along eigenvectors of the Fisher Information Matrix. Distortions of \(x_0\) along the eigenvector with the maximum eigenvalue change the representation the most, distortions along the eigenvector with the minimum eigenvalue change it the least, and pushing along intermediate eigenvectors changes the representation by intermediate amounts.
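To make this concrete, here is a small self-contained sketch (not part of plenoptic; the toy model and variable names are made up for illustration) that builds \(F=J^TJ\) with autograd for a tiny linear model and eigendecomposes it:

import torch

torch.manual_seed(0)
toy_model = torch.nn.Linear(6, 4, bias=False)          # toy model: R^6 -> R^4
x0 = torch.randn(6)

J = torch.autograd.functional.jacobian(toy_model, x0)  # (4, 6) matrix of partial derivatives
F = J.T @ J                                            # Fisher matrix under additive Gaussian output noise
eigvals, eigvecs = torch.linalg.eigh(F)                # ascending eigenvalues; columns of eigvecs are eigendistortions
print(eigvals)
# eigvecs[:, -1] (largest eigenvalue) is the most-noticeable distortion direction,
# eigvecs[:, 0] (smallest eigenvalue) the least noticeable.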
Representational Geodesic
NOTE: This notebook and the geodesic method are still under construction and subject to change. They will run, but might not find the most informative geodesic.
[1]:
import numpy as np
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
%matplotlib inline
import pyrtools as pt
import plenoptic as po
from plenoptic.tools import to_numpy
%load_ext autoreload
%autoreload 2
import torch
import torch.nn as nn
# this notebook uses torchvision, which is an optional dependency.
# if this fails, install torchvision in your plenoptic environment
# and restart the notebook kernel.
try:
    import torchvision
except ModuleNotFoundError:
    raise ModuleNotFoundError("optional dependency torchvision not found!"
                              " please install it in your plenoptic environment "
                              "and restart the notebook kernel")
import torchvision.transforms as transforms
from torchvision import models
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float32
torch.__version__
/home/billbrod/miniconda3/envs/plen_3.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
[1]:
'2.0.1+cu117'
Translation
[2]:
image_size = 64
einstein = po.data.einstein()
einstein = po.tools.conv.blur_downsample(einstein, n_scales=2)
vid = po.tools.translation_sequence(einstein, n_steps=20)
vid = po.tools.center_crop(vid, image_size // 2)
vid = po.tools.rescale(vid, 0, 1)
imgA = vid[0:1]
imgB = vid[-1:]
pt.image_stats(to_numpy(imgA))
pt.image_stats(to_numpy(imgB))
print(imgA.shape)
print(vid.shape)
# convention: full name for numpy arrays, short hands for torch tensors
video = to_numpy(vid).squeeze()
print(video.shape)
pt.imshow(list(video.squeeze()), zoom=4, col_wrap=6);
Image statistics:
Range: [0.079997, 1.000000]
Mean: 0.488417, Stdev: 0.149090, Kurtosis: 3.337172
Image statistics:
Range: [0.000000, 0.741736]
Mean: 0.354389, Stdev: 0.212748, Kurtosis: 1.725743
torch.Size([1, 1, 32, 32])
torch.Size([21, 1, 32, 32])
(21, 32, 32)

Spectral models
Computing a geodesic to reveal excess invariance of the global Fourier magnitude representation.
[3]:
import torch.fft

class Fourier(nn.Module):
    def __init__(self, representation='amp'):
        super().__init__()
        self.representation = representation

    def spectrum(self, x):
        return torch.fft.rfftn(x, dim=(2, 3))

    def forward(self, x):
        if self.representation == 'amp':
            return torch.abs(self.spectrum(x))
        elif self.representation == 'phase':
            return torch.angle(self.spectrum(x))
        elif self.representation == 'rectangular':
            return self.spectrum(x)
        elif self.representation == 'polar':
            return torch.cat((torch.abs(self.spectrum(x)),
                              torch.angle(self.spectrum(x))),
                             dim=1)

model = Fourier('amp')
# model = Fourier('polar')  # note: need pytorch>=1.8 to take gradients through torch.angle
[4]:
n_steps = len(video)-1
moog = po.synth.Geodesic(imgA, imgB, model, n_steps, initial_sequence='bridge')
optim = torch.optim.Adam([moog._geodesic], lr=.01, amsgrad=True)
moog.synthesize(max_iter=500, optimizer=optim, store_progress=True)
/home/billbrod/Documents/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
warnings.warn(
Stop criterion for pixel_change_norm = 1.07149e-02
24%|████▎ | 119/500 [00:00<00:01, 202.78it/s, loss=8.3254e+01, gradient norm=1.0443e-01, pixel change norm=1.08467e-03]/home/billbrod/Documents/plenoptic/src/plenoptic/synthesize/geodesic.py:193: UserWarning: Pixel change norm has converged, stopping synthesis
warnings.warn("Pixel change norm has converged, stopping synthesis")
26%|████▋ | 130/500 [00:00<00:01, 204.98it/s, loss=8.3254e+01, gradient norm=1.0443e-01, pixel change norm=1.08467e-03]
[5]:
fig, axes = plt.subplots(2, 1, figsize=(5, 8))
po.synth.geodesic.plot_loss(moog, ax=axes[0]);
po.synth.geodesic.plot_deviation_from_line(moog, vid, ax=axes[1]);

[6]:
plt.plot(po.to_numpy(moog.step_energy), alpha=.2);
plt.plot(moog.step_energy.mean(1), 'r-', label='path energy')
plt.axhline(torch.linalg.vector_norm(moog.model(moog.image_a) - moog.model(moog.image_b), ord=2) ** 2 / moog.n_steps ** 2)
plt.legend()
plt.title('evolution of representation step energy')
plt.ylabel('step energy')
plt.xlabel('iteration')
plt.yscale('log')
plt.show()

[7]:
plt.plot(moog.calculate_jerkiness().detach())
plt.title('final representation step jerkiness')
[7]:
Text(0.5, 1.0, 'final representation step jerkiness')

[8]:
plt.plot(po.to_numpy(moog.dev_from_line[..., 1]));
plt.title('evolution of distance from representation line')
plt.ylabel('distance from representation line')
plt.xlabel('iteration step')
plt.show()

[9]:
pixelfade = to_numpy(moog.pixelfade.squeeze())
geodesic = to_numpy(moog.geodesic.squeeze())
fig = pt.imshow([video[5], pixelfade[5], geodesic[5]],
title=['video', 'pixelfade', 'geodesic'],
col_wrap=3, zoom=4);
size = geodesic.shape[-1]
h, m , l = (size//2 + size//4, size//2, size//2 - size//4)
# for a in fig.get_axes()[0]:
a = fig.get_axes()[0]
for line in (h, m, l):
    a.axhline(line, lw=2)
pt.imshow([video[:,l], pixelfade[:,l], geodesic[:,l]],
title=None, col_wrap=3, zoom=4);
pt.imshow([video[:,m], pixelfade[:,m], geodesic[:,m]],
title=None, col_wrap=3, zoom=4);
pt.imshow([video[:,h], pixelfade[:,h], geodesic[:,h]],
title=None, col_wrap=3, zoom=4);




Physiologically inspired models
[10]:
model = po.simul.OnOff(kernel_size=(31,31), pretrained=True)
po.tools.remove_grad(model)
po.imshow(model(imgA), zoom=8);
/home/billbrod/Documents/plenoptic/src/plenoptic/simulate/models/frontend.py:388: UserWarning: pretrained is True but cache_filt is False. Set cache_filt to True for efficiency unless you are fine-tuning.
warn("pretrained is True but cache_filt is False. Set cache_filt to "
/home/billbrod/miniconda3/envs/plen_3.10/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]

[11]:
n_steps = 10
moog = po.synth.Geodesic(imgA, imgB, model, n_steps, initial_sequence='bridge')
[12]:
moog.synthesize(store_progress=True)
Stop criterion for pixel_change_norm = 7.76675e-03
18%|███▏ | 178/1000 [00:19<01:31, 9.00it/s, loss=6.3326e-03, gradient norm=3.1056e-05, pixel change norm=5.54276e-03]
[13]:
fig, axes = plt.subplots(2, 1, figsize=(5, 8))
po.synth.geodesic.plot_loss(moog, ax=axes[0]);
po.synth.geodesic.plot_deviation_from_line(moog, ax=axes[1]);

[14]:
plt.plot(po.to_numpy(moog.dev_from_line[...,0]))
plt.title('evolution of distance from representation line')
plt.ylabel('distance from representation line')
plt.xlabel('iteration step')
plt.yscale('log')
plt.show()

[15]:
plt.plot(po.to_numpy(moog.step_energy), alpha=.2);
plt.plot(moog.step_energy.mean(1), 'r-', label='path energy')
plt.axhline(torch.linalg.vector_norm(moog.model(moog.image_a) - moog.model(moog.image_b), ord=2) ** 2 / moog.n_steps ** 2)
plt.legend()
plt.title('evolution of representation step energy')
plt.ylabel('step energy')
plt.xlabel('iteration')
plt.yscale('log')
plt.show()

[16]:
plt.plot(moog.calculate_jerkiness().detach())
plt.title('final representation step jerkiness')
[16]:
Text(0.5, 1.0, 'final representation step jerkiness')

[17]:
geodesic = po.to_numpy(moog.geodesic).squeeze()
pixelfade = po.to_numpy(moog.pixelfade).squeeze()
assert geodesic.shape == pixelfade.shape
geodesic.shape
[17]:
(11, 32, 32)
[18]:
print('geodesic')
pt.imshow(list(geodesic), vrange='auto1', title=None, zoom=4);
print('diff')
pt.imshow(list(geodesic - pixelfade), vrange='auto1', title=None, zoom=4);
print('pixelfade')
pt.imshow(list(pixelfade), vrange='auto1', title=None, zoom=4);
geodesic
diff
pixelfade



[19]:
# checking that the range constraint is met
plt.hist(video.flatten(), histtype='step', density=True, label='video')
plt.hist(pixelfade.flatten(), histtype='step', density=True, label='pixelfade')
plt.hist(geodesic.flatten(), histtype='step', density=True, label='geodesic');
plt.title('signal value histogram')
plt.legend(loc=1)
plt.show()

vgg16 translation / rotation / scaling
[20]:
# We have some optional example images that we'll download for this. In order to do so,
# we use an optional dependency, pooch. If the following raises an ImportError or ModuleNotFoundError for you,
# then install pooch in your plenoptic environment and restart your kernel.
sample_image_dir = po.data.fetch_data('sample_images.tar.gz')
imgA = po.load_images(sample_image_dir / 'frontwindow_affine.jpeg', as_gray=False)
imgB = po.load_images(sample_image_dir / 'frontwindow.jpeg', as_gray=False)
u = 300
l = 90
imgA = imgA[..., u:u+224, l:l+224]
imgB = imgB[..., u:u+224, l:l+224]
po.imshow([imgA, imgB], as_rgb=True);
diff = imgA - imgB
po.imshow(diff);
pt.image_compare(po.to_numpy(imgA, True), po.to_numpy(imgB, True));
Difference statistics:
Range: [0, 0]
Mean: -0.012635, Stdev (rmse): 0.208685, SNR (dB): 0.856129


[21]:
from torchvision import models
# Create a class that takes the nth layer output of a given model
class NthLayer(torch.nn.Module):
    """Wrap any model to get the response of an intermediate layer

    Works for Resnet18 or VGG16.
    """
    def __init__(self, model, layer=None):
        """
        Parameters
        ----------
        model: PyTorch model
        layer: int
            Which model response layer to output
        """
        super().__init__()
        # TODO: is centering appropriate???
        self.normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                              std=[0.229, 0.224, 0.225])
        try:
            # then this is VGG16
            features = list(model.features)
        except AttributeError:
            # then it's resnet18
            features = ([model.conv1, model.bn1, model.relu, model.maxpool] + [l for l in model.layer1] +
                        [l for l in model.layer2] + [l for l in model.layer3] + [l for l in model.layer4] +
                        [model.avgpool, model.fc])
        self.features = nn.ModuleList(features).eval()
        if layer is None:
            layer = len(self.features)
        self.layer = layer

    def forward(self, x):
        x = self.normalize(x)
        for ii, mdl in enumerate(self.features):
            x = mdl(x)
            if ii == self.layer:
                return x
# different potential models of human visual perception of distortions
# resnet18 = NthLayer(models.resnet18(pretrained=True), layer=3)
# choosing what layer representation to study
# for l in range(len(models.vgg16().features)):
# print(f'({l}) ', models.vgg16().features[l])
# y = NthLayer(models.vgg16(pretrained=True), layer=l)(imgA)
# print("dim", torch.numel(y), "shape ", y.shape,)
vgg_pool1 = NthLayer(models.vgg16(pretrained=True), layer=4)
po.tools.remove_grad(vgg_pool1)
vgg_pool2 = NthLayer(models.vgg16(pretrained=True), layer=9)
po.tools.remove_grad(vgg_pool2)
vgg_pool3 = NthLayer(models.vgg16(pretrained=True), layer=17)
po.tools.remove_grad(vgg_pool3)
/home/billbrod/miniconda3/envs/plen_3.10/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/billbrod/miniconda3/envs/plen_3.10/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
[22]:
predA = po.to_numpy(models.vgg16(pretrained=True)(imgA))[0]
predB = po.to_numpy(models.vgg16(pretrained=True)(imgB))[0]
plt.plot(predA);
plt.plot(predB);

The following block runs curl (which should already be installed on your system) to download a txt file containing the ImageNet class labels. If it doesn't run for some reason, you can download the file yourself from here and place it at ../data/imagenet1000_clsidx_to_labels.txt.
[23]:
!curl https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt -o ../data/imagenet1000_clsidx_to_labels.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 30564 100 30564 0 0 81532 0 --:--:-- --:--:-- --:--:-- 81721
[24]:
with open("../data/imagenet1000_clsidx_to_labels.txt") as f:
    idx2label = eval(f.read())

for idx in np.argsort(predA)[-5:]:
    print(idx2label[idx])
for idx in np.argsort(predB)[-5:]:
    print(idx2label[idx])
African elephant, Loxodonta africana
dam, dike, dyke
lakeside, lakeshore
water buffalo, water ox, Asiatic buffalo, Bubalus bubalis
valley, vale
alp
American black bear, black bear, Ursus americanus, Euarctos americanus
water buffalo, water ox, Asiatic buffalo, Bubalus bubalis
valley, vale
lakeside, lakeshore
[25]:
moog = po.synth.Geodesic(imgA, imgB, vgg_pool3)
[26]:
# this should be run for longer on a GPU
moog.synthesize(max_iter=25)
Stop criterion for pixel_change_norm = 1.23674e-01
100%|█████████████████████| 25/25 [01:29<00:00, 3.57s/it, loss=3.9520e+05, gradient norm=2.8781e+04, pixel change norm=3.15347e-01]
[27]:
fig, axes = plt.subplots(2, 1, figsize=(5, 8))
po.synth.geodesic.plot_loss(moog, ax=axes[0]);
po.synth.geodesic.plot_deviation_from_line(moog, ax=axes[1]);

[28]:
plt.plot(moog.calculate_jerkiness().detach())
plt.title('final representation step jerkiness')
[28]:
Text(0.5, 1.0, 'final representation step jerkiness')

[29]:
po.imshow(moog.geodesic, as_rgb=True, zoom=2, title=None, vrange='auto0');
po.imshow(moog.pixelfade, as_rgb=True, zoom=2, title=None, vrange='auto0');
# per channel difference
po.imshow([(moog.geodesic - moog.pixelfade)[1:-1, 0:1]], zoom=2, title=None, vrange='auto1');
po.imshow([(moog.geodesic - moog.pixelfade)[1:-1, 1:2]], zoom=2, title=None, vrange='auto1');
po.imshow([(moog.geodesic - moog.pixelfade)[1:-1, 2:]], zoom=2, title=None, vrange='auto1');
# exaggerated color difference
po.imshow([po.tools.rescale((moog.geodesic - moog.pixelfade)[1:-1])], as_rgb=True, zoom=2, title=None);
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).






Metamers
Metamers are an old concept in the study of perception, dating back to the color-matching experiments in the 18th century that first provided support for the existence of three cone types (though it would be another two hundred years before anatomical evidence was found). These color-matching experiments demonstrated that, by combining three colored lights in different proportions, you could generate a color that humans perceive as identical to any other color, even though their physical spectra differ. Perceptual metamers, then, are two images that are physically different but perceived as identical.
For the purposes of plenoptic, wherever we say "metamers", we mean "model metamers": images that are physically different but have identical representations for a given model, i.e., that the model "perceives" as identical. Like all synthesis methods, metamer generation is model-specific, and one potential experiment is to determine whether a model's metamers also serve as human perceptual metamers, which would provide support for the model as an accurate representation of the human visual system.
In the Lab for Computational Vision, this goes back to Portilla and Simoncelli, 2000, where the authors created a parametric model of textures and synthesized novel images as a way of demonstrating the cases where the model succeeded and failed. That paper did not purport to have anything to do with human vision, and the authors did not refer to their images as "metamers"; that term did not appear until Freeman and Simoncelli, 2011, where the authors pooled the Portilla and Simoncelli texture statistics in windows laid out in a log-polar fashion to generate putative human perceptual metamers.
This notebook demonstrates how to use the Metamer
class to generate model metamers.
[1]:
import plenoptic as po
from plenoptic.tools import to_numpy
import imageio
import torch
import pyrtools as pt
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
import numpy as np
%load_ext autoreload
%autoreload 2
/mnt/home/wbroderick/miniconda3/envs/plenoptic/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Basic usage
As with all our synthesis methods, we start by grabbing a target image and initializing our model.
[2]:
img = po.data.curie()
po.imshow(img);

For the model, we'll use a simple On-Off model of visual neurons.
[3]:
model = po.simul.OnOff((7, 7))
po.tools.remove_grad(model)
Like all of our models, when this is called on the image, it returns a 3d or 4d tensor (in this case, 4d). This representation is what the Metamer
class will try to match.
[4]:
print(model(img))
tensor([[[[0.6907, 0.6915, 0.6930, ..., 0.6935, 0.6935, 0.6937],
[0.6917, 0.6923, 0.6934, ..., 0.6934, 0.6936, 0.6939],
[0.6935, 0.6937, 0.6939, ..., 0.6933, 0.6939, 0.6942],
...,
[0.6927, 0.6928, 0.6934, ..., 0.6928, 0.6939, 0.6943],
[0.6944, 0.6943, 0.6941, ..., 0.6930, 0.6936, 0.6938],
[0.6950, 0.6948, 0.6942, ..., 0.6929, 0.6932, 0.6933]],
[[0.6951, 0.6944, 0.6933, ..., 0.6928, 0.6928, 0.6926],
[0.6943, 0.6938, 0.6929, ..., 0.6929, 0.6927, 0.6925],
[0.6929, 0.6927, 0.6926, ..., 0.6930, 0.6924, 0.6922],
...,
[0.6935, 0.6934, 0.6930, ..., 0.6934, 0.6926, 0.6923],
[0.6921, 0.6922, 0.6924, ..., 0.6933, 0.6928, 0.6926],
[0.6917, 0.6918, 0.6923, ..., 0.6933, 0.6931, 0.6930]]]])
/mnt/home/wbroderick/miniconda3/envs/plenoptic/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
In order to visualize this, we can use the helper function plot_representation (see the Display notebook for more details). In this case, the representation looks like two images, and so we plot it as such:
[5]:
po.tools.display.plot_representation(data=model(img), figsize=(11, 5));

At its simplest, to use Metamer, initialize it with the target image and the model, then call .synthesize(). By setting store_progress=True, we update a variety of attributes (all of which start with saved_) on each iteration, so we can later examine, for example, the synthesized image over time.
[6]:
met = po.synth.Metamer(img, model)
met.synthesize(store_progress=True, max_iter=50)
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
warnings.warn(
100%|██████████| 50/50 [00:00<00:00, 50.57it/s, loss=9.4442e-06, learning_rate=0.01, gradient_norm=7.7735e-04, pixel_change_norm=3.8657e-01]
We then call the plot_synthesis_status function to see how things are doing. The image on the left shows the metamer at this moment, the center plot shows the loss over time (with the red dot marking the current loss), and the rightmost plot shows the representation error. The representation-error plot is more informative if the model has a plot_representation method, but it can always be created.
[7]:
# model response error plot has two subplots, so we increase its relative width
po.synth.metamer.plot_synthesis_status(met, width_ratios={'plot_representation_error': 2});
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/display.py:950: UserWarning: ax is not None, so we're ignoring figsize...
warnings.warn("ax is not None, so we're ignoring figsize...")

plot_synthesis_status()
is a helper function to show all of this at once, but the individual components can be created separately:
[8]:
fig, axes = plt.subplots(1, 3, figsize=(25, 5), gridspec_kw={'width_ratios': [1, 1, 2]})
po.synth.metamer.display_metamer(met, ax=axes[0])
po.synth.metamer.plot_loss(met, ax=axes[1])
po.synth.metamer.plot_representation_error(met, ax=axes[2]);

The loss is decreasing, but clearly there’s much more to go. So let’s continue.
You can resume synthesis as long as you pass the same argument to store_progress
on each run (several other arguments, such as optimizer
and scheduler
, must be None on any run except the first).
Everything that stores the progress of the optimization (loss
, saved_model_response
, saved_signal
) will persist between calls and so potentially get very large.
[9]:
met.synthesize(store_progress=True, max_iter=100)
12%|█▏ | 12/100 [00:00<00:01, 52.64it/s, loss=7.5078e-06, learning_rate=0.01, gradient_norm=8.2988e-04, pixel_change_norm=2.7297e-01]/mnt/home/wbroderick/plenoptic/src/plenoptic/synthesize/metamer.py:195: UserWarning: Loss has converged, stopping synthesis
warnings.warn("Loss has converged, stopping synthesis")
14%|█▍ | 14/100 [00:00<00:01, 49.28it/s, loss=7.5078e-06, learning_rate=0.01, gradient_norm=8.2988e-04, pixel_change_norm=2.7297e-01]
Let’s examine the status again. But instead of looking at the most recent status, let’s look at 10 from the end:
[10]:
po.synth.metamer.plot_synthesis_status(met, iteration=-10, width_ratios={'plot_representation_error': 2});

Since we have the ability to select which iteration to plot (as long as we've been storing the information), we can create an animation showing the synthesis over time. The matplotlib.animation object that gets returned can't be viewed directly: it either has to be converted to HTML for display in the notebook (using the convert_anim_to_html function we provide) or saved in some video format (e.g., anim.save('test.mp4'), which requires ffmpeg to be installed and on your path).
[11]:
anim = po.synth.metamer.animate(met, width_ratios={'plot_representation_error': 2})
po.tools.convert_anim_to_html(anim)
/mnt/home/wbroderick/plenoptic/src/plenoptic/synthesize/metamer.py:1645: UserWarning: Looks like representation is image-like, haven't fully thought out how to best handle rescaling color ranges yet!
warnings.warn("Looks like representation is image-like, haven't fully thought out how"
[11]:
Generally speaking, synthesis will run until you hit max_iter
iterations. However, synthesis can also stop if it looks like the loss has stopped changing. This behavior is controlled with the loss_thresh
and loss_change_iter
arguments: if the loss has changed by less than loss_thresh
over the past loss_change_iter
iterations, we stop synthesis.
Moving between devices
Metamer
has a .to()
method for moving the object between devices or dtypes. Call it as you would call any tensor.to
and it will move over the necessary attributes.
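For example (a minimal sketch, assuming a CUDA device is available):

met.to(torch.device('cuda'))                       # move the Metamer object's tensors to the GPU
met.synthesize(store_progress=True, max_iter=100)  # further synthesis now runs on the GPU
met.to('cpu')                                      # move back, e.g. before converting to numpy or plotting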
Saving and loading
Finally, you probably want to save the results of your synthesis. As mentioned above, you can save the synthesis animation, and all of the plots return regular matplotlib
Figures and can be manipulated as expected. The synthesized image itself is a tensor and can be detached, converted to a numpy array, and saved (either as an image or array) as you’d expect. po.to_numpy
is a convenience function we provide for stuff like this, which detaches the tensor, sends it to the CPU, and converts
it to a numpy array with dtype float32. Note that it doesn’t squeeze the tensor, so you may want to do that yourself.
[12]:
met_image = po.to_numpy(met.metamer).squeeze()
# convert from float array to 8-bit integer for saving as an image
print(f'Metamer range: ({met_image.min()}, {met_image.max()})')
met_image = po.tools.convert_float_to_int(np.clip(met_image, 0, 1))
imageio.imwrite('test.png', met_image)
Metamer range: (-0.00023865862749516964, 1.0005061626434326)
The metamer lies slightly outside the range [0, 1]
, so we clip before saving as an image. Metamer’s objective function has a quadratic penalty on the synthesized image’s range, and the weight on this penalty can be adjusted by changing the value of range_penalty_lambda
at initialization.
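For example (the penalty weight below is a hypothetical value, and met_strict is a name made up for this example):

# increase the weight on the quadratic range penalty (value chosen arbitrarily)
met_strict = po.synth.Metamer(img, model, range_penalty_lambda=1.)
met_strict.synthesize(max_iter=50)
met_strict_image = po.to_numpy(met_strict.metamer)
print(f'Metamer range: ({met_strict_image.min()}, {met_strict_image.max()})')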
You can also save the entire Metamer
object. This can be fairly large (depending on how many iterations you ran it for and how frequently you stored progress), but stores all information:
[13]:
met.save('test.pt')
You can then load it back in using the method .load()
. Note that you need to first instantiate the Metamer
object and then call .load()
– it must be instantiated with the same image, model, and loss function in order to load it in!
[14]:
met_copy = po.synth.Metamer(img, model)
# it's modified in place, so this method doesn't return anything
met_copy.load('test.pt')
(met_copy.saved_metamer == met.saved_metamer).all()
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
warnings.warn(
[14]:
tensor(True)
Because the model itself can be quite large, we do not save it along with the Metamer
object. This is why you must initialize it before loading from disk.
Reproducibility
You can set the seed before you call synthesize() for reproducibility by using po.tools.set_seed(). This will set both the pytorch and numpy seeds, but note that we can't guarantee complete reproducibility: see the pytorch docs for some caveats (we currently do not do the stuff described under CuDNN), as well as this issue about resuming state after saving.
Also note that pytorch does not guarantee identical results between CPU and GPU, even with the same seed.
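For example, a minimal sketch of seeding before synthesis (reproducibility is best-effort, per the caveats above; met_a and met_b are names made up for this example):

po.tools.set_seed(0)
met_a = po.synth.Metamer(img, model)
met_a.synthesize(max_iter=20)

po.tools.set_seed(0)
met_b = po.synth.Metamer(img, model)
met_b.synthesize(max_iter=20)

# with the same seed, device, and library versions, the two runs should match
print(torch.allclose(met_a.metamer, met_b.metamer))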
More Advanced Options
The solution found by the end of the Basic usage section is only one possible metamer. For this model, it was relatively easy to find a decent metamer (as seen by the relatively low loss and representation error), but that's not always the case: optimization in a high-dimensional space with non-linear models is inherently challenging, so we can't guarantee you'll find a model metamer. We do, however, provide some tools and extra functionality to help.
Initialization
By default, the initial_image
arg when initializing Metamer
is None
, in which case we initialize with uniformly-distributed random noise between 0 and 1. If you wish to use some other image for initialization, you can initialize it yourself (it must be the same shape as target_signal
) and pass it as the initial_image
arg.
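For example (a minimal sketch; the mid-gray starting image is an arbitrary choice, and met_gray is a name made up for this example):

# start synthesis from a mid-gray image instead of uniform random noise
init = 0.5 * torch.ones_like(img)
met_gray = po.synth.Metamer(img, model, initial_image=init)
met_gray.synthesize(max_iter=50)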
Optimization basics
You can set all the various optimization parameters you’d expect. synthesize()
has an optimizer
arg, which accepts a pytorch optimizer. You can therefore initialize your own optimizer after initializing Metamer
like so
[15]:
met = po.synth.Metamer(img, model)
opt = torch.optim.Adam([met.metamer], lr=.001, amsgrad=True)
met.synthesize(optimizer=opt)
48%|████▊ | 48/100 [00:00<00:00, 55.27it/s, loss=8.6771e-05, learning_rate=0.001, gradient_norm=2.4843e-04, pixel_change_norm=1.3648e-01]/mnt/home/wbroderick/plenoptic/src/plenoptic/synthesize/metamer.py:195: UserWarning: Loss has converged, stopping synthesis
warnings.warn("Loss has converged, stopping synthesis")
50%|█████ | 50/100 [00:00<00:00, 54.20it/s, loss=8.6771e-05, learning_rate=0.001, gradient_norm=2.4843e-04, pixel_change_norm=1.3648e-01]
synthesize()
also accepts a scheduler
argument, so that you can pass a pytorch scheduler, which modifies the learning rate during optimization.
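For example (a minimal sketch; the choice of scheduler and its settings are arbitrary, and met_sched is a name made up for this example):

met_sched = po.synth.Metamer(img, model)
opt = torch.optim.Adam([met_sched.metamer], lr=.01, amsgrad=True)
# halve the learning rate if the loss plateaus for 10 iterations
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=.5, patience=10)
met_sched.synthesize(optimizer=opt, scheduler=sched, max_iter=100)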
Coarse-to-fine optimization
Some models, such as the Portilla-Simoncelli texture statistics, have a multiscale representation of the image, which can complicate the optimization. It’s generally recommended that you normalize the representation (or use a specific loss function) so that the different scales all contribute equally to the representation, but that’s out of the scope of this notebook.
We provide the option to use coarse-to-fine optimization, such that you optimize the different scales separately (starting with the coarsest and then moving progressively finer) and then, at the end, optimizing all of them simultaneously. This was first used in Portilla and Simoncelli, 2000, and can help avoid local optima in image space. Unlike everything else described in this notebook, it will not work for all models. There are two specifications the model must meet:
1. It must have a scales attribute that gives the scales in the order they should be optimized.
2. Its forward() method must accept a scales keyword argument, which accepts a list and causes the model to return only the scale(s) included. See PortillaSimoncelli.forward() for an example, or the sketch below.
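For example, a toy model satisfying both requirements might look like the following (illustrative only; not a real plenoptic model):

class TwoScaleModel(torch.nn.Module):
    """Toy model with a 'coarse' and a 'fine' scale, for illustration only."""
    def __init__(self):
        super().__init__()
        # scales in the order they should be optimized, coarse to fine
        self.scales = ['coarse', 'fine']

    def forward(self, x, scales=None):
        if scales is None:
            scales = self.scales
        responses = []
        if 'coarse' in scales:
            # blurred / downsampled statistics stand in for the coarse scale
            responses.append(torch.nn.functional.avg_pool2d(x, 4).flatten(start_dim=2))
        if 'fine' in scales:
            responses.append(x.flatten(start_dim=2))
        # return a 3d tensor, like the models above
        return torch.cat(responses, dim=-1)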
We can see that po.simul.PortillaSimoncelli
satisfies these constraints, and that the model returns a subset of its output when the scales
argument is passed to forward()
[16]:
# we change images to a texture, which the PS model can do a good job capturing
img = po.data.reptile_skin()
ps = po.simul.PortillaSimoncelli(img.shape[-2:])
print(ps.scales)
print(ps.forward(img).shape)
print(ps.forward(img, scales=[0]).shape)
['pixel_statistics', 'residual_lowpass', 3, 2, 1, 0, 'residual_highpass']
torch.Size([1, 1, 1046])
torch.Size([1, 1, 261])
There are two choices for how to handle coarse-to-fine optimization: 'together' or 'separate'. In 'together' (recommended), we start with the coarsest scale and then gradually add each finer scale (this is like blurring the objective function and then gradually adding details). In 'separate', we compute the gradient with respect to each scale separately (ignoring the others), then with respect to all of them at the end.
If our model meets the above requirements, then we can use the MetamerCTF class, which implements this coarse-to-fine procedure. We specify which of the two options to use at initialization, and it will work through the scales as described above (and will resume correctly if you resume synthesis). Note that this will take a while, as it has to go through each scale. Also note that the progress bar now specifies which scale we're on.
[17]:
met = po.synth.MetamerCTF(img, ps, loss_function=po.tools.optim.l2_norm, coarse_to_fine='together')
met.synthesize(store_progress=True, max_iter=100)
# we don't show our synthesized image here, because it hasn't gone through all the scales, and so hasn't finished synthesizing
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:211: UserWarning: Validating whether model can work with coarse-to-fine synthesis -- this can take a while!
warnings.warn("Validating whether model can work with coarse-to-fine synthesis -- this can take a while!")
100%|██████████| 100/100 [00:13<00:00, 7.42it/s, loss=7.7466e+00, learning_rate=0.01, gradient_norm=4.9135e-01, pixel_change_norm=2.1053e+00, current_scale=residual_lowpass, current_scale_loss=1.3752e+00]
In order to control when synthesis considers a scale to be "done" and moves on to the next one, you can set two arguments: change_scale_criterion and ctf_iters_to_check. If the scale-specific loss (current_scale_loss in the progress bar above) has changed by less than change_scale_criterion over the past ctf_iters_to_check iterations, we consider that scale to have reached a local optimum and move on to the next. You can also set change_scale_criterion=None, in which case we always shift scales after ctf_iters_to_check iterations.
[18]:
# initialize with some noise that is approximately mean-matched and with low variance
im_init = torch.rand_like(img) * .1 + img.mean()
met = po.synth.MetamerCTF(img, ps, loss_function=po.tools.optim.l2_norm, initial_image=im_init, coarse_to_fine='together', )
met.synthesize(store_progress=10, max_iter=500,
change_scale_criterion=None, ctf_iters_to_check=7)
po.imshow([met.image, met.metamer], title=['Target image', 'Synthesized metamer'], vrange='auto1');
100%|██████████| 500/500 [00:47<00:00, 10.60it/s, loss=1.5794e-01, learning_rate=0.01, gradient_norm=1.2401e+00, pixel_change_norm=1.7646e-01, current_scale=all, current_scale_loss=1.5794e-01]

And we can see these shifts happening in the animation of synthesis:
[19]:
po.tools.convert_anim_to_html(po.synth.metamer.animate(met))
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/display.py:950: UserWarning: ax is not None, so we're ignoring figsize...
warnings.warn("ax is not None, so we're ignoring figsize...")
[19]:
MetamerCTF has several attributes which are used in the course of coarse-to-fine synthesis:
- scales_loss: a list containing the scale-specific loss at each iteration (that is, the loss computed on just the scale(s) we're optimizing on that iteration), which we use to determine when to switch scales.
- scales: a list of the scales in optimization order (i.e., from coarse to fine). The last entry will be 'all' (since after we've optimized each individual scale, we move on to optimizing all at once). This attribute is modified by the synthesize() method and is used to track which scale we're currently optimizing (the first entry). When we've gone through all the scales present, it will contain a single value: 'all'.
- scales_timing: a dictionary whose keys are the values of scales. The values are lists with 0 through 2 entries: the first entry is the iteration where we started optimizing this scale, the second is when we stopped (thus if it's an empty list, we haven't started optimizing it yet).
- scales_finished: a list of the scales that we've finished optimizing (in the order we finished them). The union of this and scales will be the same as metamer.model.scales.
A small wrinkle: if coarse_to_fine=='together', then none of these will ever contain the final, finest scale, since that is equivalent to 'all'.
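For example, after the coarse-to-fine synthesis above, you can inspect this bookkeeping directly (a minimal sketch using the attributes just described):

print(met.scales)           # scales left to optimize; just ['all'] once every individual scale is done
print(met.scales_finished)  # scales we've finished optimizing, in order
print(met.scales_timing)    # maps each scale to [start_iteration, stop_iteration]
print(met.scales_loss[-5:]) # scale-specific loss over the last few iterations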
MAD Competition Conceptual Introduction
This notebook shows the simplest possible MAD: a two pixel image, where our models are L2-norm and L1-norm. It will not explain the basics of MAD Competition or how to use it. Instead, since we’re dealing with a simple and low-dimensional example, we can plot the image in pixel space and draw out the model contours, which we can use to explicitly check whether we’ve found the correct results.
[1]:
import plenoptic as po
from plenoptic.tools import to_numpy
import torch
import pyrtools as pt
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
import numpy as np
import itertools
%load_ext autoreload
%autoreload 2
/home/billbrod/miniconda3/envs/plenoptic/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
First we pick our metrics and run our synthesis. We create four different MADCompetition
instances, in order to create the full set of images.
[2]:
img = torch.tensor([.5, .5], dtype=torch.float32).reshape((1, 1, 1, 2))
def l1_norm(x, y):
    return torch.norm(x-y, 1)

metrics = [po.tools.optim.l2_norm, l1_norm]

all_mad = {}
# this gets us all four possibilities
for t, (m1, m2) in itertools.product(['min', 'max'], zip(metrics, metrics[::-1])):
    name = f'{m1.__name__}_{t}'
    # we set the seed like this to ensure that all four MADCompetition instances have the same initial_signal. Try different seed values!
    po.tools.set_seed(10)
    all_mad[name] = po.synth.MADCompetition(img, m1, m2, t, metric_tradeoff_lambda=1e4)
    optim = torch.optim.Adam([all_mad[name].mad_image], lr=.0001)
    print(f"Synthesizing {name}")
    all_mad[name].synthesize(store_progress=True, max_iter=2000, optimizer=optim, stop_criterion=1e-10)
# double-check that these are all equal.
assert all([torch.allclose(all_mad['l2_norm_min'].initial_image, v.initial_image) for v in all_mad.values()])
Synthesizing l2_norm_min
95%|███████████████████████████████████████▉ | 1903/2000 [00:05<00:00, 259.90it/s, loss=1.0817e-01, learning_rate=0.0001, gradient_norm=7.4119e-04, pixel_change_norm=3.6378e-07, reference_metric=1.5296e-01, optimized_metric=1.0816e-01]/home/billbrod/Documents/plenoptic/plenoptic/synthesize/mad_competition.py:445: UserWarning: Loss has converged, stopping synthesis
warnings.warn("Loss has converged, stopping synthesis")
95%|███████████████████████████████████████▉ | 1904/2000 [00:05<00:00, 339.19it/s, loss=1.0817e-01, learning_rate=0.0001, gradient_norm=7.4119e-04, pixel_change_norm=3.6378e-07, reference_metric=1.5296e-01, optimized_metric=1.0816e-01]
Synthesizing l1_norm_min
100%|██████████████████████████████████████████| 2000/2000 [00:06<00:00, 312.77it/s, loss=1.2641e-01, learning_rate=0.0001, gradient_norm=1.0004e+00, pixel_change_norm=1.5457e-05, reference_metric=1.2638e-01, optimized_metric=1.2639e-01]
Synthesizing l2_norm_max
64%|██████████████████████████▎ | 1282/2000 [00:04<00:02, 289.04it/s, loss=-1.5302e-01, learning_rate=0.0001, gradient_norm=9.9836e-01, pixel_change_norm=5.5730e-06, reference_metric=1.5305e-01, optimized_metric=1.5304e-01]
Synthesizing l1_norm_max
79%|████████████████████████████████▌ | 1587/2000 [00:03<00:00, 413.92it/s, loss=-1.7886e-01, learning_rate=0.0001, gradient_norm=1.1160e-03, pixel_change_norm=3.8166e-07, reference_metric=1.2651e-01, optimized_metric=1.7891e-01]
(The red progress bars show that we hit our stop criterion and broke out of the loop early, not that anything went wrong.)
Now let’s visualize our metrics for these four instances:
[3]:
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
pal = {'l1_norm': 'C0', 'l2_norm': 'C1'}
for ax, (k, mad) in zip(axes.flatten(), all_mad.items()):
    ax.plot(mad.optimized_metric_loss, pal[mad.optimized_metric.__name__], label=mad.optimized_metric.__name__)
    ax.plot(mad.reference_metric_loss, pal[mad.reference_metric.__name__], label=mad.reference_metric.__name__)
    ax.set(title=k.capitalize().replace('_', ' '), xlabel='Iteration', ylabel='Loss')
    ax.legend(loc='center left', bbox_to_anchor=(1.1, 1.1))
[3]:
<matplotlib.legend.Legend at 0x7f42cfacbb20>

This looks pretty good – the L1 norm line is flat in the left column, while the L2 norm is flat in the right column, and the other line is either rising (in the bottom row) or falling (in the top).
Since our images only have two pixels, we can get a better sense of what’s going on by plotting them in pixel space: first pixel value on the x-axis, second on the y-axis. We can use this to visualize the points and how far they are from each other. We also know what the level curves look like for the \(L_1\) and \(L_2\) norms (a diamond and a circle centered on our reference image, respectively), so we can add them as well.
[4]:
l1 = to_numpy(torch.norm(all_mad['l2_norm_max'].image - all_mad['l2_norm_max'].initial_image, 1))
l2 = to_numpy(torch.norm(all_mad['l2_norm_max'].image - all_mad['l2_norm_max'].initial_image, 2))
ref = to_numpy(all_mad['l2_norm_max'].image.squeeze())
init = to_numpy(all_mad['l2_norm_max'].initial_image.squeeze())
def circle(origin, r, n=1000):
    theta = 2*np.pi/n*np.arange(0, n+1)
    return np.array([origin[1]+r*np.cos(theta), origin[0]+r*np.sin(theta)])

def diamond(origin, r, n=1000):
    theta = 2*np.pi/n*np.arange(0, n+1)
    rotation = np.pi/4
    square_correction = (np.abs(np.cos(theta-rotation)-np.sin(theta-rotation)) + np.abs(np.cos(theta-rotation)+np.sin(theta-rotation)))
    square_correction /= square_correction[0]
    r = r / square_correction
    return np.array([origin[1]+r*np.cos(theta), origin[0]+r*np.sin(theta)])
l2_level_set = circle(ref, l2,)
l1_level_set = diamond(ref, l1)
We can see in the following plot that it is doing the right thing, but it’s very hard to separate these two metrics. We’ve styled the points below so that their color matches the contour level they’re supposed to lie on (i.e., the fixed metric), and so that a hollow point shows the target was to minimize, and a solid one to maximize.
[5]:
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.scatter(*ref, label='reference', c='r', s=100)
ax.scatter(*init, label='initial', c='k', s=100)
ax.plot(*l1_level_set, pal['l1_norm']+'--', label='L1 norm level set')
ax.plot(*l2_level_set, pal['l2_norm']+'--', label='L2 norm level set')
for k, v in all_mad.items():
    ec = pal[v.reference_metric.__name__]
    fc = 'none' if 'min' in k else ec
    ax.scatter(*v.mad_image.squeeze().detach(), fc=fc, ec=ec, label=k)
plt.legend(bbox_to_anchor=(1.04,1), loc="upper left")
[5]:
<matplotlib.legend.Legend at 0x7f42cebfe640>

In the above plot, the red dot in the middle is our reference signal and the black dot shows the initial signal. The new points we synthesized will either have the same L1 or L2 norm distance with the reference signal as that initial signal (the other distance will be minimized or maximized). Let’s look at the solid blue dot first: this point has had its L2-norm maximized, while holding the L1 norm constant. We can see, therefore, that it lies along the L1-norm level set (the diamond) while moving as far away from the red dot as possible, which puts it in the corner of the diamond. This means that one of the pixels has the same value as the reference, while the other is as different as possible. Note that the other corners of the diamond would work equally well, but our initial point put us closest to this one.
Conversely, the solid orange dot is maximizing L1 norm (and holding L2 norm constant), so it lies along the L2 norm level set while moving as far away from the red dot as possible, which puts it along a diagonal away from the red dot. This means that neither pixel has the same value as the reference, they’re both an intermediate value that has the same absolute difference from the reference point, as we can verify below:
[6]:
all_mad['l1_norm_max'].mad_image - all_mad['l1_norm_max'].image
[6]:
tensor([[[[-0.0894, -0.0895]]]], grad_fn=<SubBackward0>)
Now, if we look at the hollow orange dot, which is minimizing L1 and holding L2 constant, we can see that it has similarly moved along the L2 level set (the circle) while getting as close to the reference as possible, which puts it "along the axis" with the solid blue dot, just closer. Therefore, it has one pixel whose value matches that of the reference, and the other as close to the reference value as possible. Analogous logic holds for the hollow blue dot.
Generally, you’re working with metrics and signals where you can’t make the above plot to double-check the performance of MAD. Unfortunately, you’ll have to spend time playing with the various parameters in order to find what works best. The most important of these parameters is metric_tradeoff_lambda
; you can see above that we set it to the very high value of 1e4
(if you try reducing this yourself, you’ll see the fixed metric doesn’t stay constant and the points in the bottom plot move
towards the reference point in the center). In this case, all four values took the same value of metric_tradeoff_lambda
, but in general that might not be true (for example, if one of your metrics returns much larger values than the other).
Full images!
We can, however, extend this L1 and L2 example to full images. Let’s give that a try on a checkerboard:
[7]:
def create_checkerboard(image_size, period, values=[0, 1]):
    image = pt.synthetic_images.square_wave(image_size, period=period)
    image += pt.synthetic_images.square_wave(image_size, period=period, direction=np.pi/2)
    image += np.abs(image.min())
    image /= image.max()
    return torch.from_numpy(np.where((image < .75) & (image > .25), *values[::-1])).unsqueeze(0).unsqueeze(0).to(torch.float32)
# by setting the image to lie between 0 and 255 and be slightly within the max possible range, we make the optimization a bit easier.
img = 255 * create_checkerboard((64, 64), 16, [.1, .9])
po.imshow(img, vrange=(0, 255), zoom=4);
# you could also do this with another natural image, give it a try!

Now we’ll do the same process of running synthesis and checking our loss as above:
[8]:
def l1_norm(x, y):
    return torch.norm(x-y, 1)

metrics = [po.tools.optim.l2_norm, l1_norm]
tradeoffs = {'l2_norm_max': 1e-4, 'l2_norm_min': 1e-4,
             'l1_norm_max': 1e2, 'l1_norm_min': 1e3}

all_mad = {}
# this gets us all four possibilities
for t, (m1, m2) in itertools.product(['min', 'max'], zip(metrics, metrics[::-1])):
    name = f'{m1.__name__}_{t}'
    # we set the seed like this to ensure that all four MADCompetition instances have the same initial_signal. Try different seed values!
    po.tools.set_seed(0)
    all_mad[name] = po.synth.MADCompetition(img, m1, m2, t, metric_tradeoff_lambda=tradeoffs[name], initial_noise=20, allowed_range=(0, 255), range_penalty_lambda=1)
    optim = torch.optim.Adam([all_mad[name].mad_image], lr=.1)
    print(f"Synthesizing {name}")
    all_mad[name].synthesize(store_progress=True, max_iter=30000, optimizer=optim, stop_criterion=1e-10)
# double-check that these are all equal.
assert all([torch.allclose(all_mad['l2_norm_min'].initial_image, v.initial_image) for v in all_mad.values()])
Synthesizing l2_norm_min
3%|█▌ | 1049/30000 [00:03<01:25, 339.55it/s, loss=9.6436e+02, learning_rate=0.1, gradient_norm=1.6492e-01, pixel_change_norm=3.3186e-01, reference_metric=6.1687e+04, optimized_metric=9.6386e+02]
Synthesizing l1_norm_min
100%|███████████████████████████████████████████| 30000/30000 [03:17<00:00, 152.15it/s, loss=6.5361e+03, learning_rate=0.1, gradient_norm=6.3860e+01, pixel_change_norm=6.7973e-01, reference_metric=1.1773e+03, optimized_metric=6.5345e+03]
Synthesizing l2_norm_max
15%|▏| 4526/30000 [00:14<01:23, 306.11it/s, loss=-3.7644e+03, learning_rate=0.1, gradient_norm=4.7183e-01, pixel_ch
Synthesizing l1_norm_max
8%| | 2536/30000 [00:08<01:28, 308.63it/s, loss=-7.5356e+04, learning_rate=0.1, gradient_norm=5.3381e-01, pixel_ch
We're going to visualize these slightly differently from the above, since they have such different scales. The left axis shows the L1 norm loss, while the right one shows the L2 norm loss. Each of the four lines is a separate synthesis target, with the colors the same as above (note that l1_norm_min looks like it hasn't quite converged yet – you can decrease the stop_criterion value and increase max_iter above to let it run longer, but the above is sufficient for demonstration purposes).
[9]:
po.synth.mad_competition.plot_loss_all(*all_mad.values());

Now we’ll show all the synthesized MAD images. In the following, the top row shows the reference and initial images, then the MAD images:
[10]:
po.synth.mad_competition.display_mad_image_all(*all_mad.values(), zoom=4, vrange=(0, 255));

If we go through them following the same logic as in the two-pixel case, we can see that our conclusions still hold. The following plots the difference between each of the above images and the reference image, to make the following points explicit:
Max L2 and min L1 mainly have pixels that have the same value as the reference image, and the rest are all extremal values, as different from the reference as possible. Max L2 has more of these pixels, and they have more extremal values.
Max L1 and min L2 pixels are all intermediate values, all the same absolute difference from the reference image. Max L1’s absolute difference is larger than min L2’s.
[11]:
keys = ['l2_norm_min', 'l2_norm_max', 'l1_norm_min', 'l1_norm_max']
po.imshow([all_mad[k].mad_image - all_mad[k].image for k in keys], title=keys,
zoom=4, vrange='indep0', col_wrap=2);

Finally, to connect this to perception: these results imply that L2 is a better perceptual metric than L1. L2's two images are more perceptually distinct than L1's, and the salt-and-pepper noise found in the max L2 image looks perceptually worse than the mid-level gray values found in the min L2 image (L1 makes the opposite prediction, that the mid-level gray values look worse than the salt-and-pepper noise). To validate this, you'd want to run a psychophysics experiment, but hopefully this simple example has helped show how MAD Competition can be used!
MAD Competition Usage
Maximum differentiation (MAD) competition comes from a paper published in 2008 by Zhou Wang and Eero Simoncelli (reprint from LCV website). In MAD Competition, the goal is to efficiently compare two competing perceptual metrics. Like the inputs for all synthesis methods in plenoptic, metrics operate on images and produce predictions related to perception. As originally conceived, the metrics in MAD competition are either similarity (e.g., SSIM) or distance (e.g., MSE) metrics: they take two images and return a scalar value that gives a perceptual similarity or distance. For distance metrics, the smaller this number is, the more perceptually similar the metric predicts the two images to be; for similarity metrics, the larger the number, the more perceptually similar.
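As a quick concrete check of this convention, the following sketch (assuming plenoptic has been imported as po, as in the cells below) compares an image with itself under both kinds of metric:
img = po.data.curie()
# a distance metric returns 0 for identical inputs...
print(po.metric.mse(img, img))   # should print a tensor of zeros
# ...while a similarity metric like SSIM returns 1 for identical inputs.
print(po.metric.ssim(img, img))  # should print a tensor of ones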
In plenoptic, a single instantiation of MADCompetition synthesizes a single image, holding reference_metric constant while either maximizing or minimizing optimized_metric, depending on the value of minmax. A full set of MAD competition images consists of four images, maximizing and minimizing each of the two metrics. For each pair of images, one metric predicts they are perceptually identical, while the other metric predicts they are as dissimilar as possible. This set therefore allows us to efficiently compare the two models.
In the paper, these images are generated by manually computing the gradients, projecting one gradient out of the other, and repeating until convergence is reached. This doesn't work as well in the general case, so we instead optimize the following objective function:
\[t\, L_1(x, \hat{x}) + \lambda_1 \left[ L_2(x, \hat{x}) - L_2(x, x+\epsilon) \right]^2 + \lambda_2 \mathcal{B}(\hat{x})\]
where \(t\) is 1 if mad.minmax is 'min' and -1 if it's 'max', \(L_1\) is mad.optimized_metric, \(L_2\) is mad.reference_metric, \(x\) is mad.image, \(\hat{x}\) is mad.mad_image, \(\epsilon\) is the initial noise, \(\mathcal{B}\) is the quadratic bound penalty, \(\lambda_1\) is mad.metric_tradeoff_lambda, and \(\lambda_2\) is mad.range_penalty_lambda.
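To make the roles of these terms concrete, here is a minimal sketch of that objective written as a plain function. This is illustrative only, not plenoptic's internal implementation; in particular, the exact form of the bound penalty \(\mathcal{B}\) is assumed here to be a simple quadratic penalty on out-of-range pixel values.
def mad_objective(mad_image, image, initial_image, optimized_metric, reference_metric,
                  minmax, metric_tradeoff_lambda, range_penalty_lambda, allowed_range=(0, 1)):
    t = 1 if minmax == 'min' else -1
    # push the optimized metric down (minmax='min') or up (minmax='max') ...
    optimized_term = t * optimized_metric(image, mad_image)
    # ... while penalizing drift of the reference metric away from its value at the initial noisy image ...
    reference_term = (reference_metric(image, mad_image)
                      - reference_metric(image, initial_image)) ** 2
    # ... and softly keeping pixel values inside the allowed range (assumed form of the bound penalty).
    below = (allowed_range[0] - mad_image).clamp(min=0)
    above = (mad_image - allowed_range[1]).clamp(min=0)
    bound_penalty = (below ** 2 + above ** 2).mean()
    return (optimized_term
            + metric_tradeoff_lambda * reference_term
            + range_penalty_lambda * bound_penalty)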
That's the general idea; now let's explore how to use the MADCompetition class to generate these images.
[1]:
import plenoptic as po
import imageio
import torch
import pyrtools as pt
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
import numpy as np
import warnings
%load_ext autoreload
%autoreload 2
/home/billbrod/miniconda3/envs/plenoptic/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Basic usage
As with all our synthesis methods, we start by grabbing a target image and initializing our models.
[2]:
img = po.data.curie()
po.imshow(img)
/home/billbrod/Documents/plenoptic/plenoptic/tools/data.py:126: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
images = torch.tensor(images, dtype=torch.float32)

To start, we'll demonstrate MAD competition as described in the paper, using two metrics: SSIM (structural similarity index, described here) and MSE (mean-squared error), implementations of both of which are found in plenoptic.metric. We use the weighted version of SSIM described in the MAD Competition paper, hence the keyword argument passed to ssim below. Note also that we use 1-SSIM: SSIM measures similarity (so that 0 means completely different and 1 means identical), but MADCompetition expects metrics, which return 0 if and only if the two inputs are identical.
[3]:
model1 = lambda *args: 1-po.metric.ssim(*args, weighted=True, pad='reflect')
model2 = po.metric.mse
To initialize the method, we only need to specify the target image, the two metrics, and the target. To start, we will hold MSE constant while minimizing SSIM.
Note that, as described in the first block, we synthesize these images by optimizing a tradeoff between the losses of these two metrics, weighted by metric_tradeoff_lambda. If that argument is unset, we default to a value we think is reasonable, but in practice you often need to try several values until the fixed metric stays constant while the optimized metric decreases or increases as desired.
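One practical (if brute-force) way to do this is to sweep over a grid of candidate values and check how far the reference metric drifts during a short synthesis run. This is a rough sketch; the grid of values and the drift check are illustrative, not a prescribed plenoptic recipe:
candidate_lambdas = [1e-1, 1, 10, 100, 1000]
for lam in candidate_lambdas:
    trial = po.synth.MADCompetition(img, optimized_metric=model1, reference_metric=model2,
                                    minmax='min', initial_noise=.04, metric_tradeoff_lambda=lam)
    trial.synthesize(max_iter=100)
    # the reference metric should stay roughly at its initial value throughout synthesis;
    # large drift suggests metric_tradeoff_lambda is too small.
    drift = (model2(img, trial.mad_image) - model2(img, trial.initial_image)).abs()
    print(f"lambda={lam:.0e}, reference metric drift={float(drift):.2e}")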
[4]:
mad = po.synth.MADCompetition(img, optimized_metric=model1, reference_metric=model2, minmax='min', initial_noise=.04,
metric_tradeoff_lambda=10000)
/home/billbrod/miniconda3/envs/plenoptic/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
At its most basic, all we need to do is call mad.synthesize(). Let's do that and then view the outcome. There are several additional arguments to synthesize(), but none are required.
[5]:
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    mad.synthesize(max_iter=200)
fig = po.synth.mad_competition.plot_synthesis_status(mad)
95%|████████████████████████████████████████████▋ | 190/200 [00:19<00:01, 9.61it/s, loss=1.5136e-03, learning_rate=0.01, gradient_norm=7.4030e-05, pixel_change_norm=2.9363e-02, reference_metric=1.4757e-03, optimized_metric=1.4892e-03]

We can see from the loss plot that SSIM’s loss has decreased, while MSE’s, other than a brief dip in the beginning, is staying roughly constant.
As described in the opening paragraph, a full set of MAD competition synthesized images consists of four images. In order to create the other images, we must create a new instance of MADCompetition
. Let’s do that for the other images now:
[6]:
mad_ssim_max = po.synth.MADCompetition(img, optimized_metric=model1, reference_metric=model2, minmax='max', initial_noise=.04,
metric_tradeoff_lambda=1e6)
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    mad_ssim_max.synthesize(max_iter=200)
fig = po.synth.mad_competition.plot_synthesis_status(mad_ssim_max)
100%|██████████████████████████████████████████████| 200/200 [00:20<00:00, 9.83it/s, loss=-3.4307e-01, learning_rate=0.01, gradient_norm=1.9942e-03, pixel_change_norm=6.0971e-02, reference_metric=1.6174e-03, optimized_metric=3.5030e-01]

We're making progress, but it doesn't look like SSIM has quite saturated. Let's see if we can make more progress!
To continue synthesis, we can simply call synthesize() again (optimizer and scheduler both need to be None, the default, so that we reuse the ones from the initial call), and we pick up right where we left off.
[7]:
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    mad_ssim_max.synthesize(max_iter=300)
fig = po.synth.mad_competition.plot_synthesis_status(mad_ssim_max)
100%|██████████████████████████████████████████████| 300/300 [00:37<00:00, 8.07it/s, loss=-3.4890e-01, learning_rate=0.01, gradient_norm=3.4509e-04, pixel_change_norm=1.5451e-02, reference_metric=1.6184e-03, optimized_metric=3.5621e-01]

Next, let’s hold SSIM constant while changing MSE. This will require changing the metric_tradeoff_lambda
. We also set stop_criterion
explicitly, to a smaller value, to allow the synthesis to continue longer.
[8]:
mad_mse_min = po.synth.MADCompetition(img, optimized_metric=model2, reference_metric=model1, minmax='min', initial_noise=.04,
metric_tradeoff_lambda=1)
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    mad_mse_min.synthesize(max_iter=400, stop_criterion=1e-6)
fig = po.synth.mad_competition.plot_synthesis_status(mad_mse_min)
100%|███████████████████████████████████████████████| 400/400 [01:03<00:00, 6.31it/s, loss=9.9894e-04, learning_rate=0.01, gradient_norm=7.8280e-05, pixel_change_norm=4.3131e-02, reference_metric=2.3662e-01, optimized_metric=9.9130e-04]

Maximizing MSE has the same issue; after playing around with it, we use a slightly larger metric_tradeoff_lambda
than above.
In general, finding an appropriate hyperparameter here will require some consideration on the part of the user and some testing of different values.
[9]:
mad_mse_max = po.synth.MADCompetition(img, optimized_metric=model2, reference_metric=model1, minmax='max', initial_noise=.04,
metric_tradeoff_lambda=10)
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    mad_mse_max.synthesize(max_iter=200, stop_criterion=1e-6)
fig = po.synth.mad_competition.plot_synthesis_status(mad_mse_max)
100%|██████████████████████████████████████████████| 200/200 [00:41<00:00, 4.86it/s, loss=-5.7400e-03, learning_rate=0.01, gradient_norm=4.9788e-03, pixel_change_norm=2.5293e-01, reference_metric=2.4108e-01, optimized_metric=5.8745e-03]

The image above has increased the local contrast in different parts of the image, which SSIM generally doesn’t care about but MSE does. For example, the collar, which in the original image is two different shades of gray, here is black and white. Similarly with the eyes, hair, and lips.
While above we displayed the synthesized image and the loss together, these are actually handled by two helper functions and can be called separately, as axes-level figures. They have additional arguments that may be worth playing around with:
[18]:
fig, axes = plt.subplots(1, 2, figsize=(15, 5), gridspec_kw={'width_ratios': [1, 2]})
po.synth.mad_competition.display_mad_image(mad, ax=axes[0], zoom=.5)
po.synth.mad_competition.plot_loss(mad, axes=axes[1], iteration=-100)
[18]:
<AxesSubplot: xlabel='Synthesis iteration', ylabel='Optimized metric loss'>

We also provide helper functions to plot a full set of MAD images together, displaying either all of their synthesized images or their losses (note that we're calling our metric SDSIM because it's now a structural dissimilarity):
[11]:
po.synth.mad_competition.display_mad_image_all(mad, mad_mse_min, mad_ssim_max, mad_mse_max, 'SDSIM');

The top row shows the reference and initial images, our picture of Marie Curie and that same image plus some normally-distributed noise. The next row of images has the same MSE as the right image in the top row (when compared against the reference image), but different SDSIM values. The left image has the lowest SDSIM and is thus considered the best image, while the right image has the highest SDSIM and is thus considered the worst. The next row of images has the same SDSIM as the right image in the top, but different MSE values. The left has the lowest MSE and is thus considered the best, while the right has highest MSE and is thus considered the worst.
So MSE considers the first three images to be approximately equivalent in quality, while SDSIM considers the first image and the last two to be equivalent.
From the following plot, we can see that we generally manage to hold the fixed metric constant (dashed line for SDSIM in the right plot, solid line for MSE in the left) while the optimized metric increases or decreases as desired.
[12]:
po.synth.mad_competition.plot_loss_all(mad, mad_mse_min, mad_ssim_max, mad_mse_max, 'SDSIM');

Steerable Pyramid
This tutorial walks through the basic features of the torch implementation of the Steerable Pyramid included in plenoptic, and as such describes some basic signal processing that may be useful when building models that process images. We use the steerable pyramid construction in the frequency domain, which provides perfect reconstruction (as long as the input has an even height and width, e.g., 256x256 rather than 255x255) and allows for any number of orientation bands. For more details on steerable pyramids and how they are built, see the pyrtools tutorial at https://pyrtools.readthedocs.io/en/latest/. Here we focus on the specifics of the torch version and how it can be used in concert with other differentiable torch models.
[1]:
import numpy as np
import torch
# this notebook uses torchvision, which is an optional dependency.
# if this fails, install torchvision in your plenoptic environment
# and restart the notebook kernel.
try:
    import torchvision
except ModuleNotFoundError:
    raise ModuleNotFoundError("optional dependency torchvision not found!"
                              " please install it in your plenoptic environment "
                              "and restart the notebook kernel")
import torchvision.transforms as transforms
import torch.nn.functional as F
from torch import nn
import matplotlib.pyplot as plt
import pyrtools as pt
import plenoptic as po
%matplotlib inline
from plenoptic.simulate import SteerablePyramidFreq
from plenoptic.synthesize import Eigendistortion
from plenoptic.tools.data import to_numpy
dtype = torch.float32
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
import os
from tqdm.auto import tqdm
%load_ext autoreload
%autoreload 2
Introduction: Steerable Pyramid Wavelets
In this section we will:
Visualize the wavelets that produce steerable pyramid coefficients
Visualize the steerable pyramid decomposition for two example images
Provide some technical details about this steerable pyramid implementation and how it may differ from others
Visualizing Wavelets
Many modern computer vision algorithms employ convolution: a kernel slides across an image, taking inner products along the way. The output of a convolution at a given location is thus the similarity between the kernel and the image content at that location; for example, a kernel containing only low spatial frequencies acts as a low-pass filter. Low-pass filtering has a very simple interpretation in the Fourier domain (attenuate the high spatial frequencies and leave the low spatial frequencies unchanged), and this reflects a more general fact: convolution in the spatial domain is mathematically equivalent to multiplication in the Fourier domain (the convolution theorem). Though the two are equivalent, there may be advantages to carrying out the operation in one domain or the other. This implementation of the steerable pyramid operates in the Fourier domain, which has two benefits. First, the representation is perfectly invertible: it can be inverted to reconstruct the input exactly, so one can analyze how perturbations of the representation appear in the input space. Second, working in the Fourier domain allows for a complex-valued representation, which makes it easy to construct quadrature-pair filters and related machinery.
Because of this we don't have direct access to a set of spatial filters for visualization. However, we can generate the equivalent spatial filters by inverting a set of coefficients (i.e., the output of the pyramid) constructed to be zero everywhere except at the center of a single band. Below we do this to visualize the filters of a 3-scale, 4-orientation steerable pyramid.
[2]:
order = 3
imsize = 64
pyr = SteerablePyramidFreq(height=3, image_shape=[imsize, imsize], order=order).to(device)
empty_image = torch.zeros((1, 1, imsize, imsize), dtype=dtype).to(device)
pyr_coeffs = pyr.forward(empty_image)
# insert a 1 in the center of each coefficient...
for k, v in pyr.pyr_size.items():
    mid = (v[0]//2, v[1]//2)
    pyr_coeffs[k][0, 0, mid[0], mid[1]] = 1
# ... and then reconstruct this dummy image to visualize the filter.
reconList = []
for k in pyr_coeffs.keys():
    # we ignore the residual_highpass and residual_lowpass, since we're focusing on the filters here
    if isinstance(k, tuple):
        reconList.append(pyr.recon_pyr(pyr_coeffs, [k[0]], [k[1]]))
po.imshow(reconList, col_wrap=order+1, vrange='indep1', zoom=2);

We can see that this pyramid represents a 3-scale, 4-orientation decomposition: each row shows a single scale (all filters in a row are the same size). As the filters get larger, we describe the scale as getting "coarser", and its spatial frequency selectivity moves to lower and lower frequencies (conversely, smaller filters operate at finer scales, selective for higher spatial frequencies). In a given column, all filters have the same orientation: the first column is vertical, the third horizontal, and the other two are diagonal.
Visualizing Filter Responses (Wavelet Coefficients)
Now let's see what the steerable pyramid representation of images looks like.
Like all models included in and compatible with plenoptic, the included steerable pyramid operates on 4-dimensional tensors of shape (batch, channel, height, width). We can perform batch computations with this steerable pyramid implementation, analyzing each batch element separately. Similarly, the pyramid is meant to operate on grayscale images, so channel > 1 will cause the pyramid to run independently on each channel (meaning the first two dimensions are both effectively treated as batch dimensions).
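As a quick illustration of this convention, the following sketch (the specific sizes are arbitrary) shows that every coefficient tensor keeps the batch and channel dimensions of the input:
x = torch.rand(3, 2, 64, 64)  # 3 "images", each with 2 channels, treated as 6 independent grayscale inputs
pyr_demo = SteerablePyramidFreq(height=3, image_shape=[64, 64], order=3).to(device)
coeffs_demo = pyr_demo(x.to(device))
# every band keeps the (batch, channel) dimensions of the input
print(coeffs_demo[(0, 0)].shape[:2])  # should print torch.Size([3, 2])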
[3]:
im_batch = torch.cat([po.data.curie(), po.data.reptile_skin()], axis=0)
print(im_batch.shape)
po.imshow(im_batch)
order = 3
dim_im = 256
pyr = SteerablePyramidFreq(height=4, image_shape=[dim_im, dim_im], order=order).to(device)
pyr_coeffs = pyr(im_batch)
torch.Size([2, 1, 256, 256])

By default, the output of the pyramid is stored as a dictionary whose keys are either a string for the 'residual_lowpass'
and 'residual_highpass'
bands or a tuple of (scale_index, orientation_index)
. plenoptic
provides a convenience function, pyrshow
, to visualize the pyramid’s coefficients for each image and channel.
[4]:
print(pyr_coeffs.keys())
po.pyrshow(pyr_coeffs, zoom=0.5, batch_idx=0);
po.pyrshow(pyr_coeffs, zoom=0.5, batch_idx=1);
odict_keys(['residual_highpass', (0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3), (3, 0), (3, 1), (3, 2), (3, 3), 'residual_lowpass'])


For some applications, such as coarse-to-fine optimization procedures, it may be convenient to output a subset of the representation, including coefficients from only some scales. We do this by passing a scales
argument to the forward method (a list containing a subset of the values found in pyr.scales
):
[5]:
# get the third scale (index 2)
print(pyr.scales)
pyr_coeffs_scale2 = pyr(im_batch, scales=[2])
po.pyrshow(pyr_coeffs_scale2, zoom=2, batch_idx=0);
po.pyrshow(pyr_coeffs_scale2, zoom=2, batch_idx=1);
['residual_lowpass', 3, 2, 1, 0, 'residual_highpass']


The pyramid above was real-valued, but in many applications we might want the full complex pyramid output. This is set with the is_complex argument. When it is True, the pyramid uses complex-valued filters, resulting in complex-valued output. The real and imaginary components can be understood as the outputs of filters with identical scale and orientation but different phases: the imaginary component is phase-shifted 90 degrees from the real (we refer to this matched pair of filters as a "quadrature pair"). This is useful if you wish to construct a representation that is phase-insensitive (as complex cells in primary visual cortex are believed to be), which can be done by computing the amplitude / complex modulus (e.g., by calling torch.abs(x)). po.simul.rectangular_to_polar and po.simul.rectangular_to_polar_dict provide convenience wrappers for this functionality.
See A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients for more technical details.
[6]:
order = 3
height = 3
pyr_complex = SteerablePyramidFreq(height=height, image_shape=[256,256], order=order, is_complex=True)
pyr_complex.to(device)
pyr_coeffs_complex = pyr_complex(im_batch)
[7]:
# the same visualization machinery works for complex pyramids; what is shown is the magnitude of the coefficients
po.pyrshow(pyr_coeffs_complex, zoom=0.5, batch_idx=0);
po.pyrshow(pyr_coeffs_complex, zoom=0.5, batch_idx=1);


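If you want this phase-insensitive representation itself (rather than just visualizing it), you can take the modulus of each oriented band of the complex pyramid computed above. A minimal sketch using torch.abs (po.simul.rectangular_to_polar, mentioned earlier, provides the same amplitude along with the phase):
amplitude = {k: v.abs() if torch.is_complex(v) else v
             for k, v in pyr_coeffs_complex.items()}
# the oriented bands are now real-valued, phase-insensitive responses
print(amplitude[(0, 0)].dtype)  # should print torch.float32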
Now that we have seen the basics of using the pyramid, it's worth noting one more detail: an important property of the steerable pyramid is that it should respect the generalized Parseval theorem (i.e., the energy of the pyramid coefficients should equal the energy of the original image). The matlabPyrTools and pyrtools versions of the steerable pyramid DO NOT respect this, so in our version we provide a fix that normalizes the FFTs such that energy is preserved. This is enabled with tight_frame=True when instantiating the pyramid; however, if you need to match the outputs of matlabPyrTools or pyrtools, you will need to set this argument to False.
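A quick way to convince yourself of this is to compare the total energy of the coefficients against that of the input; a sketch (the tolerances are arbitrary):
pyr_tight = SteerablePyramidFreq(height=3, image_shape=[256, 256], order=3, tight_frame=True)
coeffs_tight = pyr_tight(im_batch)
image_energy = im_batch.pow(2).sum(dim=(-2, -1))
coeff_energy = sum(v.abs().pow(2).sum(dim=(-2, -1)) for v in coeffs_tight.values())
print(torch.allclose(image_energy, coeff_energy, rtol=1e-4, atol=1e-4))  # should print True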
Putting the “Steer” in Steerable Pyramid
As we have seen, steerable pyramids decompose images into a fixed set of orientation bands (at several spatial scales). However, given the responses at this fixed set of orientation bands, the pyramid coefficients for any intermediate orientation can be calculated by linear interpolation of the original bands. This property is known as "steerability." Below we steer a set of coefficients through a series of angles and visualize how the represented features rotate.
[8]:
# note that steering is currently only implemented for real pyramids, so the `is_complex` argument must be False (as it is by default)
pyr = SteerablePyramidFreq(height=3, image_shape=[256,256], order=3, twidth=1).to(device)
coeffs = pyr(im_batch)
# play around with different scales! Coarser scales tend to make the steering a bit more obvious.
target_scale = 2
N_steer = 64
M = torch.zeros(1, 1, N_steer, 256//2**target_scale, 256//2**target_scale)
for i, steering_offset in enumerate(np.linspace(0, 1, N_steer)):
    steer_angle = steering_offset * 2 * np.pi
    # the steering weights are also returned by pyr.steer_coeffs: steered_coeffs_ij = orig_coeffs_ij @ steering_weights
    steered_coeffs, steering_weights = pyr.steer_coeffs(coeffs, [steer_angle])
    # we are always looking at the same band, but the steering angle changes
    M[0, 0, i] = steered_coeffs[(target_scale, 4)][0, 0]
po.tools.convert_anim_to_html(po.animshow(M, framerate=6, repeat=True, zoom=2**target_scale))
[8]:
Example Application: Frontend for Convolutional Neural Network
Until now we have seen how to use the steerable pyramid as a stand-alone fixed feature extractor, but what if we wanted to use it in a larger model, as a front-end for a deep CNN or other model? The steerable pyramid decomposition is qualitatively similar to the computations in primary visual cortex, so it stands to reason that a steerable pyramid frontend might serve as an inductive bias that encourages subsequent layers to have more biological structure. Indeed, it has been demonstrated that attaching a V1-like front end to a CNN trained on classification can lead to improvements in adversarial robustness (Dapello et al., 2020).
In this section we will demonstrate how the plenoptic steerable pyramid can be made compatible with standard deep learning architectures and use it as a frontend for a standard CNN.
Preliminaries
Most standard model architectures only accept channels with a fixed shape, but each scale of the pyramid coefficients has a different shape (because each scale is downsampled by a factor of 2). In order to obtain an output amenable to downstream processing by standard torch nn modules, we provide an argument to the pyramid (downsample=False) that does not downsample the frequency masks at each scale and thus maintains output feature maps that all have a fixed size. Once you have done this, you can convert the dictionary into a tensor of shape (batch, channel, height, width) so that it can easily be passed to a downstream nn.Module. The details of how to do this are handled by the convert_pyr_to_tensor function within the SteerablePyramidFreq class. Let's try this and look at the first image in both the downsampled and not-downsampled versions:
[9]:
height = 3
order = 3
pyr_fixed = SteerablePyramidFreq(height=height, image_shape=[256,256], order=order, is_complex=True,
downsample=False, tight_frame=True).to(device)
pyr_coeffs_fixed, pyr_info = pyr_fixed.convert_pyr_to_tensor(pyr_fixed(im_batch), split_complex=False)
# we can also split the complex coefficients into real and imaginary parts as separate channels.
pyr_coeffs_split, _ = pyr_fixed.convert_pyr_to_tensor(pyr_fixed(im_batch), split_complex=True)
print(pyr_coeffs_split.shape, pyr_coeffs_split.dtype)
print(pyr_coeffs_fixed.shape, pyr_coeffs_fixed.dtype)
torch.Size([2, 26, 256, 256]) torch.float32
torch.Size([2, 14, 256, 256]) torch.complex64
We can see that in this complex pyramid with 3 scales and 4 orientations there are 26 channels: 3 scales x 4 orientations x 2 (for real and imaginary feature maps) + 2 (for the residual bands). NOTE: you can change which scales/residuals get included in this output tensor using the scales argument to the forward method.
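As a quick sanity check on that channel arithmetic, using the tensors computed above (3 scales, order=3, i.e., 4 orientations):
n_scales, n_orientations = 3, 4
assert pyr_coeffs_split.shape[1] == n_scales * n_orientations * 2 + 2  # 26 real-valued channels
assert pyr_coeffs_fixed.shape[1] == n_scales * n_orientations + 2      # 14 complex-valued channels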
In order to display the coefficients, we need to convert the tensor back to a dictionary. We can do this either by calling the pyramid on the image again to obtain the dictionary directly, or by using the convert_tensor_to_pyr function to invert the conversion. We can check that these two give equal results.
[10]:
pyr_coeffs_fixed_1 = pyr_fixed(im_batch)
pyr_coeffs_fixed_2 = pyr_fixed.convert_tensor_to_pyr(pyr_coeffs_fixed, *pyr_info)
for k in pyr_coeffs_fixed_1.keys():
    print(torch.allclose(pyr_coeffs_fixed_2[k], pyr_coeffs_fixed_1[k]))
True
True
True
True
True
True
True
True
True
True
True
True
True
True
/home/billbrod/Documents/plenoptic/src/plenoptic/simulate/canonical_computations/steerable_pyramid_freq.py:476: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at ../aten/src/ATen/native/Copy.cpp:299.)
band = pyr_tensor[:,i,...].unsqueeze(1).type(torch.float)
We can now plot the coefficients of the downsampled version (pyr_coeffs_complex from the last section) and the not-downsampled version (pyr_coeffs_fixed_1 from above) and see how they compare visually.
[11]:
po.pyrshow(pyr_coeffs_complex, zoom=0.5);
po.pyrshow(pyr_coeffs_fixed_1, zoom=0.5);


We can see that the not-downsampled version contains the same features as the original pyramid, but with feature maps whose spatial dimensions all equal those of the original image (256x256). However, the pixel magnitudes in the bands differ, because we are no longer downsampling in the frequency domain; equivalently, each not-downsampled scale is the inverse of blurring and downsampling the corresponding downsampled scale, rather than simply a zero-interpolated version of it, so the pixel values change non-trivially. The energy in each band should nonetheless be preserved between the two pyramids, which we can check by computing the energy in each band of both pyramids and comparing.
[12]:
# the following passes with tight_frame=True or tight_frame=False, either way.
pyr_not_downsample = SteerablePyramidFreq(height=height, image_shape=[256,256], order=order, is_complex=False, twidth=1, downsample=False, tight_frame=False)
pyr_not_downsample.to(device)
pyr_downsample = SteerablePyramidFreq(height=height, image_shape=[256,256], order=order, is_complex=False, twidth=1, downsample=True, tight_frame=False)
pyr_downsample.to(device)
pyr_coeffs_downsample = pyr_downsample(im_batch.to(device))
pyr_coeffs_not_downsample = pyr_not_downsample(im_batch.to(device))
for k in pyr_coeffs_downsample.keys():
    v1 = to_numpy(pyr_coeffs_downsample[k]).squeeze()
    v2 = to_numpy(pyr_coeffs_not_downsample[k]).squeeze()
    # check if energies match in each band between downsampled and fixed-size pyramid responses
    print(np.allclose(np.sum(np.abs(v1)**2), np.sum(np.abs(v2)**2), rtol=1e-4, atol=1e-4))

def check_parseval(im, coeff, rtol=1e-4, atol=0):
    '''
    Check that the pyramid is Parseval, i.e., that the energy of the coefficients
    is the same as the energy of the original image.
    Args:
        im: input image stimulus, as a torch.Tensor
        coeff: dictionary of torch tensors corresponding to each band
    '''
    total_band_energy = 0
    im_energy = im.abs().square().sum().numpy()
    for band in coeff.values():
        print(band.abs().square().sum().numpy())
        total_band_energy += band.abs().square().sum().numpy()
    np.testing.assert_allclose(total_band_energy, im_energy, rtol=rtol, atol=atol)
True
True
True
True
True
True
True
True
True
True
True
True
True
True
Model Training
We are now ready to demonstrate how the steerable pyramid can be used as a fixed frontend for further stages of (learnable) processing!
[13]:
# First we define/download the dataset
train_set = torchvision.datasets.FashionMNIST(
    # change this line to wherever you'd like to download the FashionMNIST dataset
    root='../data',
    train=True,
    download=True,
    transform=transforms.Compose([transforms.ToTensor()]),
)
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/train-images-idx3-ubyte.gz
100%|█████████████████████████████████████████████████████████████████| 26421880/26421880 [00:25<00:00, 1021172.29it/s]
Extracting /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/train-images-idx3-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/train-labels-idx1-ubyte.gz
100%|████████████████████████████████████████████████████████████████████████| 29515/29515 [00:00<00:00, 316479.83it/s]
Extracting /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/train-labels-idx1-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
100%|███████████████████████████████████████████████████████████████████| 4422102/4422102 [00:01<00:00, 3325433.77it/s]
Extracting /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
100%|█████████████████████████████████████████████████████████████████████████| 5148/5148 [00:00<00:00, 6730759.66it/s]
Extracting /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to /home/billbrod/Documents/plenoptic/data/images/FashionMNIST/raw
[14]:
# Define a simple model: SteerPyr --> ConvLayer --> Fully Connected
class PyrConvFull(nn.Module):
    def __init__(self, imshape, order, scales, exclude=[], is_complex=True):
        super().__init__()
        self.imshape = imshape
        self.order = order
        self.scales = scales
        self.output_dim = 20  # number of channels in the convolutional block
        self.kernel_size = 6
        self.is_complex = is_complex
        self.rect = nn.ReLU()
        self.pyr = SteerablePyramidFreq(height=self.scales, image_shape=self.imshape,
                                        order=self.order, is_complex=self.is_complex, twidth=1, downsample=False)
        # num_channels = (num_scales * num_orientations + 2 residual bands), times 2 if complex
        channels_per = 2 if self.is_complex else 1
        self.pyr_channels = ((self.order + 1) * self.scales + 2) * channels_per
        self.conv = nn.Conv2d(in_channels=self.pyr_channels, kernel_size=self.kernel_size,
                              out_channels=self.output_dim, stride=2)
        # the input ndim here has to do with the dimensionality of self.conv's output, so will have to change
        # if kernel_size or output_dim do
        self.fc = nn.Linear(self.output_dim * 12**2, 10)

    def forward(self, x):
        out = self.pyr(x)
        out, _ = self.pyr.convert_pyr_to_tensor(out)
        # case handling for real v. complex forward passes
        if self.is_complex:
            # split into real and imaginary parts so the nonlinearities make sense
            out_re = self.rect(out.real)
            out_im = self.rect(out.imag)
            # concatenate
            out = torch.cat([out_re, out_im], dim=1)
        else:
            out = self.rect(out)
        out = self.conv(out)
        out = self.rect(out)
        out = out.view(out.shape[0], -1)  # reshape for linear layer
        out = self.fc(out)
        return out
[15]:
# Training Pyramid Model
model_pyr = PyrConvFull([28, 28], order=4, scales=2, is_complex=False)
loader = torch.utils.data.DataLoader(train_set, batch_size = 50)
optimizer = torch.optim.Adam(model_pyr.parameters(), lr=1e-3)
epoch = 2
losses = []
fracts_correct = []
for e in range(epoch):
    for batch in tqdm(loader):
        images = batch[0]
        labels = batch[1]
        preds = model_pyr(images)
        loss = F.cross_entropy(preds, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
        n_correct = preds.argmax(dim=1).eq(labels).sum().item()
        fracts_correct.append(n_correct / 50)
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].plot(losses)
axs[0].set_xlabel('Iteration')
axs[0].set_ylabel('Cross Entropy Loss')
axs[1].plot(fracts_correct)
axs[1].set_xlabel('Iteration')
axs[1].set_ylabel('Classification Performance')
[15]:
Text(0, 0.5, 'Classification Performance')

The steerable pyramid can be smoothly integrated with standard torch modules and autograd, so the impact of including such a frontend can be probed using the synthesis techniques provided by plenoptic.
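For example, one could examine which image changes this trained network is most and least sensitive to by computing eigendistortions for one of the FashionMNIST images. A rough sketch (not executed in this notebook; it assumes the Eigendistortion class imported at the top, po.tools.remove_grad to freeze the trained weights, and the default power-method settings):
example_img = train_set[0][0].unsqueeze(0)  # a single (1, 1, 28, 28) FashionMNIST image
model_pyr.eval()
po.tools.remove_grad(model_pyr)  # synthesis methods expect the model's own parameters to be frozen
eig = Eigendistortion(example_img, model_pyr)
eig.synthesize(max_iter=500)
po.imshow(eig.eigendistortions, zoom=4);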
Perceptual distance
The easiest way to measure the difference between two images is to compute the mean squared error (MSE), but MSE does not match the perceptual distance judged by humans. Several perceptual distance functions have been developed to better match human perception. This tutorial introduces the three perceptual distance functions available in the plenoptic package: SSIM (structural similarity), MS-SSIM (multiscale structural similarity) and NLPD (normalized Laplacian pyramid distance).
References
SSIM: Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4), 600-612.
MS-SSIM: Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003, November). Multiscale structural similarity for image quality assessment. In The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003 (Vol. 2, pp. 1398-1402). IEEE.
NLPD: Laparra, V., Ballé, J., Berardino, A., & Simoncelli, E. P. (2016). Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16), 1-6.
[1]:
import os
import io
import imageio
import plenoptic as po
import numpy as np
from scipy.stats import pearsonr, spearmanr
import matplotlib.pyplot as plt
import torch
from PIL import Image
SSIM (structural similarity)
The idea of the SSIM index is to decompose the difference between two images into three components: luminance, contrast and structure. For two small image patches \(\mathbf{x}\) and \(\mathbf{y}\), these three components of difference are defined as:
\[l(\mathbf{x}, \mathbf{y}) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad c(\mathbf{x}, \mathbf{y}) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad s(\mathbf{x}, \mathbf{y}) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}\]
where \(\mu_x\) and \(\mu_y\) are the means of \(\mathbf{x}\) and \(\mathbf{y}\), \(\sigma_x\) and \(\sigma_y\) are their standard deviations, \(\sigma_{xy}\) is the covariance between \(\mathbf{x}\) and \(\mathbf{y}\), and \(C_1, C_2, C_3\) are small constants. If we ignore the small constants, we can see that the luminance term \(l(\mathbf{x}, \mathbf{y})\) is a scale-invariant similarity measurement between \(\mu_x\) and \(\mu_y\), and the contrast term \(c(\mathbf{x}, \mathbf{y})\) is such a measurement between \(\sigma_x\) and \(\sigma_y\). The structure term \(s(\mathbf{x}, \mathbf{y})\) is the correlation coefficient between \(\mathbf{x}\) and \(\mathbf{y}\), which is invariant to adding or multiplying \(\mathbf{x}\) or \(\mathbf{y}\) by constants.
Local SSIM between two small image patches \(\mathbf{x}\) and \(\mathbf{y}\) is defined as (setting \(C_3 = C_2 / 2\)):
\[d(\mathbf{x}, \mathbf{y}) = l(\mathbf{x}, \mathbf{y})\, c(\mathbf{x}, \mathbf{y})\, s(\mathbf{x}, \mathbf{y}) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}\]
A local SSIM value of \(d(\mathbf{x}, \mathbf{y}) = 1\) means the two patches are identical, and \(d(\mathbf{x}, \mathbf{y}) = 0\) means they're very different. When the two patches are negatively correlated, \(d(\mathbf{x}, \mathbf{y})\) can be negative. The local SSIM value is bounded between -1 and 1.
For two full images \(\mathbf{X}, \mathbf{Y}\), an SSIM map is obtained by computing the local SSIM value \(d\) across the whole image. For each position in the images, instead of using a square patch centered on it, a circularly symmetric Gaussian kernel is used to compute the local mean, standard deviation and covariance terms \(\mu_{X,i}, \mu_{Y,i}, \sigma_{X,i}, \sigma_{Y,i}, \sigma_{XY,i}\), where \(i\) is the pixel index. In this way we obtain an SSIM map \(d_i(\mathbf{X}, \mathbf{Y})\). The values in the SSIM map are averaged to produce a single number, the SSIM index:
\[\mathrm{SSIM}(\mathbf{X}, \mathbf{Y}) = \frac{1}{N}\sum_{i=1}^{N} d_i(\mathbf{X}, \mathbf{Y})\]
where \(N\) is the number of pixels of the image. The SSIM index is also bounded between -1 and 1. In plenoptic
, the SSIM map is computed by the function po.metric.ssim_map
, and the SSIM index itself is computed by the function po.metric.ssim
. For more information, see the original paper:
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4), 600-612.
Understanding SSIM
We demonstrate the effectiveness of SSIM by generating five different types of distortions (contrast stretching, mean shifting, JPEG compression, blurring, and salt-pepper noise) with the same MSE, and computing their SSIM values.
[2]:
import tempfile
def add_jpeg_artifact(img, quality):
    # need to convert this back to 2d 8-bit int for writing out as jpg
    img = po.to_numpy(img.squeeze() * 255).astype(np.uint8)
    # write to a temporary file
    with tempfile.NamedTemporaryFile(suffix='.jpg') as tmp:
        imageio.imwrite(tmp.name, img, quality=quality)
        img = po.load_images(tmp.name)
    return img

def add_saltpepper_noise(img, threshold):
    po.tools.set_seed(0)
    img_saltpepper = img.clone()
    for i in range(img.shape[-2]):
        for j in range(img.shape[-1]):
            x = np.random.rand()
            if x < threshold:
                img_saltpepper[..., i, j] = 0
            elif x > 1 - threshold:
                img_saltpepper[..., i, j] = 1
    np.random.seed(None)
    return img_saltpepper

def get_distorted_images():
    img = po.data.einstein()
    img_contrast = torch.clip(img + 0.20515 * (2 * img - 1), min=0, max=1)
    img_mean = torch.clip(img + 0.05983, min=0, max=1)
    img_jpeg = add_jpeg_artifact(img, quality=4)
    img_blur = po.simul.Gaussian(5, std=2.68)(img)
    img_saltpepper = add_saltpepper_noise(img, threshold=0.00651)
    img_distorted = torch.cat([img, img_contrast, img_mean, img_jpeg, img_blur, img_saltpepper], axis=0)
    return img_distorted
[3]:
img_distorted = get_distorted_images()
mse_values = torch.square(img_distorted - img_distorted[0]).mean(dim=(1, 2, 3))
ssim_values = po.metric.ssim(img_distorted, img_distorted[[0]])[:, 0]
names = ["Original image", "Contrast change", "Mean shift", "JPEG artifact", "Gaussian blur", "Salt-and-pepper noise"]
titles = [f"{names[i]}\nMSE={mse_values[i]:.3e}, SSIM={ssim_values[i]:.4f}" for i in range(6)]
po.imshow(img_distorted, vrange="auto", title=titles, col_wrap=3);
/home/billbrod/micromamba/envs/plenoptic/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]

We can see that the SSIM index matches human perception better than MSE.
While the scalar SSIM index is a concise summary, the SSIM map offers richer information about where perceptual discrepancy is located in the image. Here, we visualize the SSIM map of a JPEG compressed image, and also show the absolute error (absolute value of the difference) for comparison. In both maps, darker means more different.
[4]:
def get_demo_images():
    img = po.data.parrot(as_gray=True)
    img_jpeg = add_jpeg_artifact(img, quality=6)
    ssim_map_small = po.metric.ssim_map(img, img_jpeg)
    ssim_map = torch.ones_like(img)
    ssim_map[:, :, 5:-5, 5:-5] = ssim_map_small
    abs_map = 1 - torch.abs(img - img_jpeg)
    img_demo = torch.cat([img, img_jpeg, ssim_map, abs_map], dim=0).cpu()
    return img_demo
[5]:
img_demo = get_demo_images()
titles = ["Original", "JPEG artifact", "SSIM map", "Absolute error"]
po.imshow(img_demo, title=titles);

You can judge whether the SSIM map captures the location of perceptual discrepancy better than absolute error.
MS-SSIM (multiscale structural similarity)
MS-SSIM computes SSIM on multiple scales of the images. To do this, the two images \(\mathbf{X}\) and \(\mathbf{Y}\) are recursively blurred and downsampled by a factor of 2 to produce two sequences of images: \(\mathbf{X}_1, \cdots, \mathbf{X}_M\) and \(\mathbf{Y}_1, \cdots, \mathbf{Y}_M\), where \(\mathbf{X}_1 = \mathbf{X}\) and \(\mathbf{X}_{k+1}\) is obtained by blurring and downsampling \(\mathbf{X}_{k}\) (and likewise for \(\mathbf{Y}\)). Such a sequence is called a Gaussian pyramid. We define a contrast-structure index that does not include the luminance component:
\[cs(\mathbf{x}, \mathbf{y}) = c(\mathbf{x}, \mathbf{y})\, s(\mathbf{x}, \mathbf{y})\]
The MS-SSIM index is defined as:
\[\text{MS-SSIM}(\mathbf{X}, \mathbf{Y}) = l(\mathbf{X}_M, \mathbf{Y}_M)^{\gamma_M} \prod_{k=1}^{M} cs(\mathbf{X}_k, \mathbf{Y}_k)^{\gamma_k}\]
where \(\gamma_1, \cdots, \gamma_M\) are exponents that determine the relative importance of different scales. They were determined by a human psychophysics experiment and are constrained to sum to 1. When \(M=1\), the MS-SSIM index is the same as the SSIM index. In the standard implementation of MS-SSIM, \(M = 5\). In plenoptic, the MS-SSIM index is computed by the function po.metric.ms_ssim. For more information, see the original paper:
Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003, November). Multiscale structural similarity for image quality assessment. In The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003 (Vol. 2, pp. 1398-1402). IEEE.
Here we use the same distortions on the Einstein image to demonstrate MS-SSIM:
[6]:
msssim_values = po.metric.ms_ssim(img_distorted, img_distorted[[0]])[:, 0]
names = ["Original image", "Contrast change", "Mean shift", "JPEG artifact", "Gaussian blur", "Salt-and-pepper noise"]
titles = [f"{names[i]}\nMSE={mse_values[i]:.3e}, MS-SSIM={msssim_values[i]:.3f}" for i in range(6)]
po.imshow(img_distorted, vrange="auto", title=titles, col_wrap=3);

NLPD (normalized Laplacian pyramid distance)
Like MS-SSIM, NLPD is based on a multiscale representation of the images, and like MS-SSIM, its goal is to separate out the effects of luminance and contrast differences. Unlike MS-SSIM, NLPD directly performs luminance subtraction and contrast normalization at each scale, and then computes a simple squared difference. The NLPD uses the Laplacian pyramid for luminance subtraction. Given a Gaussian pyramid \(\mathbf{X}_1, \cdots, \mathbf{X}_M\), for \(k=1, \cdots, M - 1\) we upsample and blur \(\mathbf{X}_{k+1}\) to produce \(\mathbf{\hat{X}}_k\), a blurry version of \(\mathbf{X}_k\), and let \(\mathbf{X}'_k = \mathbf{X}_k - \mathbf{\hat{X}}_k\). Defining \(\mathbf{X}'_M = \mathbf{X}_M\), we get the Laplacian pyramid \(\mathbf{X}'_1, \cdots, \mathbf{X}'_M\). In plenoptic, the Laplacian pyramid is implemented by po.simul.LaplacianPyramid.
The contrast normalization is achieved by dividing by a local estimate of amplitude:
\[\mathbf{X}''_k(i) = \frac{\mathbf{X}'_k(i)}{\sigma_k + \sum_{j \in N(i)} \mathbf{p}_k(j)\left|\mathbf{X}'_k(j)\right|}\]
where \(N(i)\) is the neighborhood of pixel \(i\) (which does not include \(i\) itself), and the parameters \(\sigma_k\) and \(\mathbf{p}_k\) are learned from an image dataset. Note that this learning is performed on clean images only, without access to the corruption types or human psychophysics data. The sequence \(\mathbf{X}''_1, \cdots, \mathbf{X}''_M\) is the normalized Laplacian pyramid. The same procedure is applied to \(\mathbf{Y}\). The NLPD is defined as:
\[\mathrm{NLPD}(\mathbf{X}, \mathbf{Y}) = \frac{1}{M}\sum_{k=1}^{M}\sqrt{\frac{1}{N_k}\sum_{i=1}^{N_k}\left(\mathbf{X}''_k(i) - \mathbf{Y}''_k(i)\right)^2}\]
where \(N_k\) is the number of pixels of \(\mathbf{X}''_k\). In plenoptic
, the NLPD is computed by the function po.metric.nlpd
. For more information, see the original paper:
Laparra, V., Ballé, J., Berardino, A., & Simoncelli, E. P. (2016). Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16), 1-6.
Here we use the same distortions on the Einstein image to demonstrate NLPD:
[7]:
nlpd_values = po.metric.nlpd(img_distorted, img_distorted[[0]])[:, 0]
names = ["Original image", "Contrast change", "Mean shift", "JPEG artifact", "Gaussian blur", "Salt-and-pepper noise"]
titles = [f"{names[i]}\nMSE={mse_values[i]:.3e}, NLPD={nlpd_values[i]:.4f}" for i in range(6)]
po.imshow(img_distorted, vrange="auto", title=titles, col_wrap=3);

Usage
The basic usage of ssim
, ms_ssim
and nlpd
in the po.metric
module is the same: they take two arguments that are images to be compared, whose shapes should be in the format (batch, channel, height, width)
. All these functions are designed for grayscale images, so the channel dimension is treated as another batch dimension. The height and width of the two arguments should be the same, and the batch and channel sizes of the two arguments should be broadcastable. The broadcasting
is already demonstrated in the examples of SSIM, MS-SSIM and NLPD that use the Einstein image.
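For example, with the distorted-image tensor from the demonstrations above, comparing a batch of 6 images against a single reference image broadcasts over the batch dimension and returns one value per (batch, channel) pair:
print(img_distorted.shape)                                      # torch.Size([6, 1, 256, 256])
print(po.metric.ssim(img_distorted, img_distorted[[0]]).shape)  # should be torch.Size([6, 1])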
SSIM, MS-SSIM and NLPD are not scale-invariant. The input images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).
[8]:
# Take SSIM as an example here. The images in img_demo have a range of [0, 1].
val1 = po.metric.ssim(img_demo[[0]], img_demo[[1]])
val2 = po.metric.ssim(img_demo[[0]] * 255, img_demo[[1]] * 255) # This produces a wrong result and triggers a warning: Image range falls outside [0, 1].
print(f"True SSIM: {float(val1):.4f}, rescaled image SSIM: {float(val2):.4f}")
True SSIM: 0.7703, rescaled image SSIM: 0.4048
/home/billbrod/Documents/plenoptic/src/plenoptic/metric/perceptual_distance.py:42: UserWarning: Image range falls outside [0, 1]. img1: tensor([ 14., 255.]), img2: tensor([ 0., 255.]). Continuing anyway...
warnings.warn("Image range falls outside [0, 1]."
Comparison of performance
The performance of these perceptual distance metrics can be measured by the correlation with human psychophysics data: the TID2013 dataset consists of 3000 different distorted images (25 clean images x 24 types of distortions x 5 levels of distortions), each with its own mean opinion score (MOS; the perceived quality of the distorted image). Higher MOS means a smaller distance from its corresponding clean image. The TID2013 dataset is described in the following paper:
Ponomarenko, N., Jin, L., Ieremeiev, O., Lukin, V., Egiazarian, K., Astola, J., … & Kuo, C. C. J. (2015). Image database TID2013: Peculiarities, results and perspectives. Signal processing: Image communication, 30, 57-77.
Since both SSIM and MS-SSIM have higher values for less different image pairs, and are maximized at 1 for identical images, we need to convert them to distances as 1-SSIM and 1-(MS-SSIM). Then we will plot MOS against the three metrics: 1-SSIM, 1-(MS-SSIM) and NLPD, as well as the baseline RMSE (square root of mean square error). We will also measure the correlation.
To execute this part of the notebook, the TID2013 dataset needs to be downloaded. In order to do so, we use an optional dependency, pooch. If the following raises an ImportError or ModuleNotFoundError then install pooch in your plenoptic environment and restart your kernel. Note that the dataset is fairly large, about 1GB.
[11]:
def get_tid2013_data():
    folder = po.data.fetch_data('tid2013.tar.gz')
    reference_images = torch.zeros([25, 1, 384, 512])
    distorted_images = torch.zeros([25, 24, 5, 1, 384, 512])
    reference_filemap = {s.lower(): s for s in os.listdir(folder / "reference_images")}
    distorted_filemap = {s.lower(): s for s in os.listdir(folder / "distorted_images")}
    for i in range(25):
        reference_filename = reference_filemap[f"i{i+1:02d}.bmp"]
        reference_images[i] = torch.tensor(np.asarray(Image.open(
            folder / "reference_images" / reference_filename).convert("L"))) / 255
        for j in range(24):
            for k in range(5):
                distorted_filename = distorted_filemap[f"i{i+1:02d}_{j+1:02d}_{k+1}.bmp"]
                distorted_images[i, j, k] = torch.tensor(np.asarray(Image.open(
                    folder / "distorted_images" / distorted_filename).convert("L"))) / 255
    distorted_images = distorted_images[:, [0] + list(range(2, 17)) + list(range(18, 24))]  # Remove color distortions
    with open(folder / "mos.txt", "r", encoding="utf-8") as g:
        mos_values = list(map(float, g.readlines()))
    mos_values = np.array(mos_values).reshape([25, 24, 5])
    mos_values = mos_values[:, [0] + list(range(2, 17)) + list(range(18, 24))]  # Remove color distortions
    return reference_images, distorted_images, mos_values

def correlate_with_tid(func_list, name_list):
    reference_images, distorted_images, mos_values = get_tid2013_data()
    distance = torch.zeros([len(func_list), 25, 22, 5])
    for i, func in enumerate(func_list):
        for j in range(25):
            distance[i, j] = func(reference_images[[j]], distorted_images[j].flatten(0, 1)).reshape(22, 5)
    plot_size = int(np.ceil(np.sqrt(len(func_list))))
    fig, axs = plt.subplots(plot_size, plot_size, squeeze=False, figsize=(plot_size * 6, plot_size * 6))
    axs = axs.flatten()
    edgecolor_list = ["m", "c", "k", "g", "r"]
    facecolor_list = [None, "none", "none", None, "none"]
    shape_list = ["x", "s", "o", "*", "^"]
    distortion_names = ["Additive Gaussian noise",
                        "Spatially correlated noise",
                        "Masked noise",
                        "High frequency noise",
                        "Impulse noise",
                        "Quantization noise",
                        "Gaussian blur",
                        "Image denoising",
                        "JPEG compression",
                        "JPEG2000 compression",
                        "JPEG transmission errors",
                        "JPEG2000 transmission errors",
                        "Non eccentricity pattern noise",
                        "Local block-wise distortions of different intensity",
                        "Mean shift (intensity shift)",
                        "Contrast change",
                        "Multiplicative Gaussian noise",
                        "Comfort noise",
                        "Lossy compression of noisy images",
                        "Image color quantization with dither",
                        "Chromatic aberrations",
                        "Sparse sampling and reconstruction"]
    for i, name in enumerate(name_list):
        for j in range(22):
            edgecolor = edgecolor_list[j % 5]
            facecolor = facecolor_list[j // 5]
            if facecolor is None:
                facecolor = edgecolor
                edgecolor = None
            axs[i].scatter(distance[i, :, j].flatten(), mos_values[:, j].flatten(), s=20,
                           edgecolors=edgecolor, facecolors=facecolor,
                           marker=shape_list[j // 5], label=distortion_names[j])
        pearsonr_value = pearsonr(-mos_values.flatten(), distance[i].flatten())[0]
        spearmanr_value = spearmanr(-mos_values.flatten(), distance[i].flatten())[0]
        axs[i].set_title(
            f"pearson {pearsonr_value:.4f}, spearman {spearmanr_value:.4f}")
        axs[i].set_xlabel(name)
        axs[i].set_ylabel("MOS")
    lines, labels = axs[0].get_legend_handles_labels()
    fig.legend(lines, labels, loc="lower center", bbox_to_anchor=(0.5, 1.0))
    plt.tight_layout()
    plt.show()
[12]:
def rmse(img1, img2):
    return torch.sqrt(torch.square(img1 - img2).mean(dim=(-2, -1)))

def one_minus_ssim(img1, img2):
    return 1 - po.metric.ssim(img1, img2)

def one_minus_msssim(img1, img2):
    return 1 - po.metric.ms_ssim(img1, img2)
# This takes some minutes to run
correlate_with_tid(func_list=[rmse, one_minus_ssim, one_minus_msssim, po.metric.nlpd], name_list=["RMSE", "1 - SSIM", "1 - (MS-SSIM)", "NLPD"])

Each point in the figures is a distorted image, and the color/shape indicates the distortion type. The quality of the perceptual distance metrics can be qualitatively assessed by looking at how well the points follow a monotonic function, and how straight that function is. We can see that the points for RMSE and 1-SSIM are more scattered than those for 1-(MS-SSIM) and NLPD, and that the points for NLPD follow a much straighter line than the other methods. The points for RMSE have outliers belonging to certain distortion types, notably mean shift and contrast change, which all three perceptual distance metrics handle better.
For a quantitative comparison, we calculate the Pearson’s and Spearman’s correlation coefficient between MOS and each perceptual distance metric (shown above the figures). Pearson’s correlation measures linear relationship, while Spearman’s correlation allows a nonlinear relationship since it only depends on ranking. We can see that the performance of the metrics, as measured by the correlation coefficients, is: NLPD > MS-SSIM > SSIM > RMSE.
[1]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import torch
import plenoptic as po
import scipy.io as sio
import os
import os.path as op
import einops
import glob
import math
import pyrtools as pt
from tqdm import tqdm
from PIL import Image
%load_ext autoreload
%autoreload
# We need to download some additional images for this notebook. In order to do so,
# we use an optional dependency, pooch. If the following raises an ImportError or ModuleNotFoundError
# then install pooch in your plenoptic environment and restart your kernel.
DATA_PATH = po.data.fetch_data('portilla_simoncelli_images.tar.gz')
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
# set seed for reproducibility
po.tools.set_seed(1)
/mnt/home/wbroderick/miniconda3/envs/plenoptic/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
[2]:
# These variables control how long metamer synthesis runs for. The values present here will result in completed synthesis,
# but you may want to decrease these numbers if you're on a machine with limited resources.
short_synth_max_iter = 1000
long_synth_max_iter = 3000
longest_synth_max_iter = 4000
Portilla-Simoncelli Texture Metamer
In this tutorial we will aim to replicate Portilla & Simoncelli (1999). The tutorial is broken into the following parts:
Introduce the concept of a Visual Texture.
How to synthesize metamers for the Portilla & Simoncelli texture model.
Demonstrate the importance of different classes of statistics.
Example syntheses from different classes of textures (e.g., artificial, Julesz, pseudoperiodic, etc.)
Extrapolation and Mixtures: Applying texture synthesis to more complex texture problems.
Some model limitations.
List of notable differences between the MATLAB and python implementations of the Portilla Simoncelli texture model and texture synthesis.
Note that this notebook takes a long time to run (roughly an hour with a GPU, several hours without), because of all the metamers that are synthesized.
1. What is a visual texture?
The simplest definition is a repeating visual pattern. Textures encompass a wide variety of images, including natural patterns such as bark or fur, artificial ones such as brick, and computer-generated ones such as the Julesz patterns (Julesz 1978, Yellott 1993). Below we load some examples.
The Portilla-Simoncelli model was developed to measure the statistical properties of visual textures. Metamer synthesis was used (and can be used) in conjunction with the Portilla-Simoncelli texture model to demonstrate the necessity of different properties of the visual texture. We will use some of these example textures to demonstrate aspects of the Portilla Simoncelli model.
[3]:
# Load and display a set of visual textures
def display_images(im_files, title=None):
images = po.tools.load_images(im_files)
fig = po.imshow(images, col_wrap=4, title=None)
if title is not None:
fig.suptitle(title, y=1.05)
natural = ['3a','6a','8a','14b','15c','15d','15e','15f','16c','16b','16a']
artificial = ['4a','4b','14a','16e','14e','14c','5a']
hand_drawn = ['5b','13a','13b','13c','13d']
im_files = [DATA_PATH / f'fig{num}.jpg' for num in natural]
display_images(im_files, "Natural textures")

[4]:
im_files = [DATA_PATH / f'fig{num}.jpg' for num in artificial]
display_images(im_files, 'Artificial textures')

[5]:
im_files = [DATA_PATH / f'fig{num}.jpg' for num in hand_drawn]
display_images(im_files, 'Hand-drawn / computer-generated textures')

2. How to generate Portilla-Simoncelli Metamers
2.1 A quick reminder of what metamers are and why we are calculating them.
The primary reason that the original Portilla-Simoncelli paper developed the metamer procedure was to assess whether the model’s understanding of textures matches that of humans. While developing the model, the authors originally evaluated it by performing texture classification on a then-standard dataset (i.e., “is this a piece of fur or a patch of grass?”). The model aced the test, with 100% accuracy. After an initial moment of elation, the authors decided to double-check and performed the same evaluation with a far simpler model, which used the steerable pyramid to compute oriented energy (the first stage of the model described here). That model also classified the textures with 100% accuracy. The authors interpreted this as their evaluation being too easy, and sought a method that would allow them to determine whether their model better matched human texture perception.
In the metamer paradigm they eventually arrived at, the authors generated model metamers: images with different pixel values but (near-)identical texture model outputs. They then evaluated whether these images belonged to the same texture class: does this model metamer of a basket also look like a basket, or does it look like something else? Importantly, they were not evaluating whether the images were indistinguishable, but whether they belonged to the same texture family. This paradigm thus tests whether the model is capturing important information about how humans understand and group textures.
2.2 How do we use the plenoptic package to generate Portilla-Simoncelli Texture Metamers?
Generating a metamer starts with a target image:
[6]:
img = po.tools.load_images(DATA_PATH / 'fig4a.jpg')
po.imshow(img);

Below we create an instance of the PortillaSimoncelli model with its default parameters:
n_scales=4: the number of scales in the steerable pyramid underlying the model.
n_orientations=4: the number of orientations in the steerable pyramid.
spatial_corr_width=9: the size of the window used to calculate the correlations across steerable pyramid bands.
Running the model on an image returns a tensor of numbers summarizing the “texturiness” of that image, which we refer to as the model’s representation. These statistics measure different properties that the authors considered relevant to a texture’s appearance (where “texture” is defined as above), and capture some of the repeating properties of these types of images. Section 3 of this notebook explores those statistics and how they relate to texture properties.
When the model representations of two images match, the model considers the two images identical, and we say that those two images are model metamers. Synthesizing a novel image that matches the representation of some arbitrary input is the goal of the Metamer class.
[7]:
n=img.shape[-1]
model = po.simul.PortillaSimoncelli([n,n])
stats = model(img)
print(stats)
tensor([[[ 0.4350, 0.0407, 0.1622, ..., -0.0078, -0.2282, 0.0023]]])
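To make the metamer criterion concrete, here is a small sketch (not part of the original notebook's workflow): circularly shifting the texture by a multiple of 2**n_scales, so that every pyramid scale shifts by a whole number of samples, produces an image with different pixel values whose statistics should nevertheless agree with the original up to numerical error, since all the statistics are spatial averages computed with circular boundary handling.
# different pixel values...
shifted = torch.roll(img, shifts=(32, 64), dims=(-2, -1))
print(torch.equal(img, shifted))
# ...but the texture statistics are expected to match up to numerical error,
# so the two images are (approximately) model metamers of each other
print(torch.allclose(stats, model(shifted), atol=1e-4))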
To use Metamer, simply initialize it with the target image and the model, then call .synthesize(). By setting store_progress=True, we update a variety of attributes (all of which start with saved_) on each iteration so that we can later examine, for example, the synthesized image over time. Let’s quickly run it for just 10 iterations to see how it works.
[8]:
met = po.synth.Metamer(img, model)
met.synthesize(store_progress=True, max_iter=10)
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
warnings.warn(
100%|██████████| 10/10 [00:01<00:00, 9.83it/s, loss=4.5063e-02, learning_rate=0.01, gradient_norm=1.6559e-02, pixel_change_norm=1.2805e+00]
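Because we passed store_progress=True, the Metamer object now holds per-iteration information in attributes whose names start with saved_. The sketch below lists them and, assuming the stored images live in an attribute named saved_metamer (an assumption; check the printed list on your version), displays a few of the stored iterates.
# list the attributes populated because store_progress=True
print([attr for attr in dir(met) if attr.startswith("saved_")])
# the attribute name below is an assumption -- check the printed list above
if hasattr(met, "saved_metamer"):
    po.imshow(met.saved_metamer[::2], col_wrap=5, vrange='auto1');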
We can then call the plot_synthesis_status method to see how things are going. The image on the left shows the metamer at this point in synthesis, the center plot shows the loss over time (with the red dot marking the current loss), and the rightmost plot shows the representation error. For the texture model, we plot the difference in representations split up across the different categories of statistics (which we’ll describe in more detail later).
[9]:
# representation_error plot has three subplots, so we increase its relative width
po.synth.metamer.plot_synthesis_status(met, width_ratios={'plot_representation_error': 3.1});
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/display.py:950: UserWarning: ax is not None, so we're ignoring figsize...
warnings.warn("ax is not None, so we're ignoring figsize...")

2.3 Portilla-Simoncelli Texture Model Metamers
This section will show a successful texture synthesis for this wicker basket texture:
[10]:
po.imshow(img);

In the next block we will actually generate a metamer using the PortillaSimoncelli model, setting the following parameters for synthesis: max_iter, store_progress, coarse_to_fine, change_scale_criterion, and ctf_iters_to_check.
max_iter=1000 puts an upper bound (of 1000) on the number of iterations that the optimization will run.
store_progress=True tells the Metamer class to store the progress of the synthesis process.
coarse_to_fine='together' activates the coarse-to-fine functionality. In this mode, synthesis first optimizes the image for the statistics associated with the lowest spatial frequency bands, adding subsequent bands after every ctf_iters_to_check iterations.
It takes about 50 seconds to run 100 iterations on my laptop, and hundreds of iterations are needed for convergence, so you’ll have to wait a few minutes to generate the texture metamer.
Note: we initialize synthesis with im_init, a uniform noise image with values in mean(target_signal) + [-.05, .05]. Initial images of uniform random noise covering the full pixel range [0, 1] (the default choice for Metamer) don’t produce the very best metamers: with a full-range initial image, the optimization seems to get stuck.
[11]:
# send image and PS model to GPU, if available. then im_init and Metamer will also use GPU
img = img.to(DEVICE)
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
im_init = (torch.rand_like(img)-.5) * .1 + img.mean();
met = po.synth.MetamerCTF(img, model, loss_function=po.tools.optim.l2_norm, initial_image=im_init,
coarse_to_fine='together')
o=met.synthesize(
max_iter=short_synth_max_iter,
store_progress=True,
# setting change_scale_criterion=None means that we change scales every ctf_iters_to_check,
# see the metamer notebook for details.
change_scale_criterion=None,
ctf_iters_to_check=7
)
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
warnings.warn(
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:211: UserWarning: Validating whether model can work with coarse-to-fine synthesis -- this can take a while!
warnings.warn("Validating whether model can work with coarse-to-fine synthesis -- this can take a while!")
73%|███████▎ | 734/1000 [00:31<00:11, 23.05it/s, loss=8.7390e-02, learning_rate=0.01, gradient_norm=7.7326e-01, pixel_change_norm=1.6338e-01, current_scale=all, current_scale_loss=8.7390e-02] /mnt/home/wbroderick/plenoptic/src/plenoptic/synthesize/metamer.py:661: UserWarning: Loss has converged, stopping synthesis
warnings.warn("Loss has converged, stopping synthesis")
73%|███████▎ | 734/1000 [00:31<00:11, 23.14it/s, loss=8.7390e-02, learning_rate=0.01, gradient_norm=7.7326e-01, pixel_change_norm=1.6338e-01, current_scale=all, current_scale_loss=8.7390e-02]
Now we can visualize the output of the synthesis optimization. First we compare the Target image and the Synthesized image side-by-side. We can see that they appear perceptually similar — that is, for this texture image, matching the Portilla-Simoncelli texture stats gives you an image that the human visual system also considers similar.
[12]:
po.imshow([met.image, met.metamer], title=['Target image', 'Synthesized metamer'], vrange='auto1');

To further visualize the result, we can plot the synthesized image, the synthesis loss over time, and the final model output error: model(target image) - model(synthesized image).
We can see the synthesized texture in the leftmost plot. The overall synthesis error decreases over the synthesis iterations (second subplot). The remaining plots show the error broken out by the different texture statistics, which we will go over in the next section.
[13]:
po.synth.metamer.plot_synthesis_status(met, width_ratios={'plot_representation_error': 3.1});

[14]:
# For the remainder of the notebook we will use this helper function to
# run synthesis so that the cells are a bit less busy.
# Be sure to run this cell.
def run_synthesis(img, model, im_init=None):
r""" Performs synthesis with the full Portilla-Simoncelli model.
Parameters
----------
    img : Tensor
        A tensor containing the target image.
    model : PortillaSimoncelli
        The model used to constrain synthesis.
    im_init : Tensor, optional
        A tensor used to initialize synthesis. If None, a low-contrast noise
        image centered on the target's mean is used.
Returns
-------
met: Metamer
Metamer from the full Portilla-Simoncelli Model
"""
if im_init is None:
im_init = torch.rand_like(img) * .01 + img.mean()
met = po.synth.MetamerCTF(img, model, loss_function=po.tools.optim.l2_norm, initial_image=im_init,
coarse_to_fine='together')
met.synthesize(
max_iter=long_synth_max_iter,
store_progress=True,
change_scale_criterion=None,
ctf_iters_to_check=3,
)
return met
3. The importance of different classes of texture statistics
The Portilla-Simoncelli model consists of a few different classes of statistics:
Marginal Statistics. These include pixel statistics (mean, variance, skew, kurtosis, and range of the pixel values), as well as the skewness and kurtosis of the lowpass images computed at each level of the recursive pyramid decomposition.
Auto-Correlation Statistics. These include the auto-correlation of the real-valued pyramid bands, as well as the auto-correlation of the magnitude of the pyramid bands, and the mean of the magnitude of the pyramid bands.
Cross-Correlation Statistics. These include correlations across scales and across orientation bands of the pyramid (both for the real values of the pyramid bands and for their magnitudes).
The original paper uses synthesis to demonstrate the role of these different types of statistics. They show that the statistics can be used to constrain a synthesis optimization to generate new examples of textures. They also show that the absence of subsets of statistics results in synthesis failures. Here we replicate those results.
The first step is to create a version of the Portilla Simoncelli model where certain statistics can be turned off.
There are two important implementation details here, both related to coarse-to-fine synthesis, which you might be interested in if you’d like to write a similar extension of this model. First, when removing statistics from the model, the most natural implementation would be to remove them from the model’s representation, changing the shape of the returned tensor. However, for coarse-to-fine synthesis to work, we need to know which scale each statistic belongs to, and changing the shape destroys that mapping. The proper way to remove statistics (while remaining compatible with coarse-to-fine optimization) is therefore to zero them out: multiplying them by zero zeroes their gradients, so they have no impact on the synthesis procedure. Second, during coarse-to-fine optimization, we must remove the statistics associated with scales that are not currently being optimized, which we do by calling the remove_scales method at the end of the forward call. See the forward method below for an example of both.
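To see why multiplying a statistic by zero removes its influence on synthesis, here is a tiny, standalone autograd check (a sketch, not plenoptic code): the zeroed-out term contributes nothing to the gradient.
x = torch.ones(3, requires_grad=True)
loss = (0 * x).sum() + (2 * x).sum()  # first term plays the role of a "removed" statistic
loss.backward()
print(x.grad)  # tensor([2., 2., 2.]): only the non-zeroed term affects the gradient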
[15]:
# The following class extends the PortillaSimoncelli model so that you can specify which
# statistics you would like to remove. We have created this model so that we can examine
# the consequences of the absence of specific statistics.
#
# Be sure to run this cell.
from collections import OrderedDict
class PortillaSimoncelliRemove(po.simul.PortillaSimoncelli):
r"""Model for measuring a subset of texture statistics reported by PortillaSimoncelli
Parameters
----------
im_shape: int
the size of the images being processed by the model
remove_keys: list
The dictionary keys for the statistics we will "remove". In practice we set them to zero.
        Possible keys: ["pixel_statistics", "auto_correlation_magnitude",
        "skew_reconstructed", "kurtosis_reconstructed", "auto_correlation_reconstructed",
        "std_reconstructed", "magnitude_std", "cross_orientation_correlation_magnitude",
        "cross_scale_correlation_magnitude", "cross_scale_correlation_real", "var_highpass_residual"]
"""
def __init__(
self,
im_shape,
remove_keys,
):
super().__init__(im_shape, n_scales=4, n_orientations=4, spatial_corr_width=9)
self.remove_keys = remove_keys
def forward(self, image, scales=None):
r"""Generate Texture Statistics representation of an image with `remove_keys` removed.
Parameters
----------
image : torch.Tensor
A tensor containing the image to analyze.
        scales : list, optional
            Which scales to include in the returned representation. If None
            (the default), we include all scales. Otherwise, can contain a
            subset of values present in this model's ``scales`` attribute.
Returns
-------
representation: torch.Tensor
3d tensor of shape (batch, channel, stats) containing the measured texture stats.
"""
        # create the representation tensor (with all scales)
stats_vec = super().forward(image)
# convert to dict so it's easy to zero out the keys we don't care about
stats_dict = self.convert_to_dict(stats_vec)
for kk in self.remove_keys:
# we zero out the stats (instead of removing them) because removing them
# makes it difficult to keep track of which stats belong to which scale
# (which is necessary for coarse-to-fine synthesis) -- see discussion above.
if isinstance(stats_dict[kk],OrderedDict):
for (key,val) in stats_dict[kk].items():
stats_dict[kk][key] *= 0
else:
stats_dict[kk] *= 0
# then convert back to tensor and remove any scales we don't want (for coarse-to-fine)
# -- see discussion above.
stats_vec = self.convert_to_tensor(stats_dict)
if scales is not None:
stats_vec = self.remove_scales(stats_vec, scales)
return stats_vec
Pixel Statistics + Marginal Statistics
Beginning with some of the pixel and marginal statistics, we’ll demonstrate synthesis both with and without these statistics.
The cell below replicates examples of synthesis failures with the following statistics removed:
the pixel statistics: mean, variance, skew, kurtosis, minimum, and maximum
the marginal statistics of the lowpass images computed at each level of the recursive pyramid: skew and kurtosis
These statistics play an important role in constraining the histogram of pixel intensities to match across the original and synthesized images.
(see figure 3 of Portilla & Simoncelli 2000)
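To make concrete what these marginal statistics measure, the short sketch below computes the same kinds of summary numbers for the current target image using scipy (for illustration only; conventions such as Fisher vs. Pearson kurtosis may differ from the model's internal definitions).
from scipy import stats as sps

pixels = img.flatten().cpu().numpy()
print(f"mean {pixels.mean():.3f}, var {pixels.var():.3f}, "
      f"skew {sps.skew(pixels):.3f}, kurtosis {sps.kurtosis(pixels):.3f}, "
      f"min {pixels.min():.3f}, max {pixels.max():.3f}")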
[16]:
# which statistics to remove
remove_statistics = ['pixel_statistics','skew_reconstructed','kurtosis_reconstructed']
# run on fig3a or fig3b to replicate paper
img = po.tools.load_images(DATA_PATH / 'fig3b.jpg').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)
# synthesis with pixel and marginal statistics absent
model_remove = PortillaSimoncelliRemove(img.shape[-2:] ,remove_keys=remove_statistics).to(DEVICE)
metamer_remove = run_synthesis(img, model_remove)
12%|█▏ | 374/3000 [00:16<01:52, 23.32it/s, loss=2.0057e-01, learning_rate=0.01, gradient_norm=6.1037e-01, pixel_change_norm=2.7060e-01, current_scale=all, current_scale_loss=2.0057e-01]
53%|█████▎ | 1577/3000 [01:11<01:04, 22.00it/s, loss=6.3014e-02, learning_rate=0.01, gradient_norm=8.9518e-01, pixel_change_norm=1.4009e-01, current_scale=all, current_scale_loss=6.3014e-02]
In the following figure, we can see that not only does the metamer created with all statistics look more like the target image than the one created without the marginal statistics, but its pixel intensity histogram is also much more similar to that of the target image.
[17]:
# visualize results
fig = po.imshow([metamer.image, metamer.metamer, metamer_remove.metamer],
title=['Target image', 'Full Statistics', 'Without Marginal Statistics'], vrange='auto1');
# add plots showing the different pixel intensity histograms
fig.add_axes([.33, -1, .33, .9])
fig.add_axes([.67, -1, .33, .9])
# this helper function expects a metamer object. see the metamer notebook for details.
po.synth.metamer.plot_pixel_values(metamer, ax=fig.axes[3])
fig.axes[3].set_title('Full statistics')
po.synth.metamer.plot_pixel_values(metamer_remove, ax=fig.axes[4])
fig.axes[4].set_title('Without marginal statistics')
[17]:
Text(0.5, 1.0, 'Without marginal statistics')

Coefficient Correlations
The cell below replicates examples of synthesis failures with the following statistics removed:
local auto-correlations of the lowpass images computed at each level of the recursive pyramid
These statistics play a role in representing periodic structures and long-range correlations. For example, in the image named fig4b.jpg (the tile pattern), the absence of these statistics results in more difficulty synthesizing the long, continuous lines that stretch from one end of the image to the other.
(see figure 4 of Portilla & Simoncelli 2000)
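As a toy illustration of why auto-correlations capture periodicity (a sketch with a synthetic 1d signal, unrelated to the model's own computation): the auto-correlation of a periodic signal has strong peaks at multiples of its period, which is exactly the kind of long-range, repeating structure these statistics constrain.
period = 32
x = np.sin(2 * np.pi * np.arange(256) / period)
x += 0.1 * np.random.default_rng(0).standard_normal(256)
ac = np.correlate(x - x.mean(), x - x.mean(), mode='full')
ac = ac[ac.size // 2:] / ac[ac.size // 2]  # non-negative lags, normalized to 1 at lag 0
plt.plot(ac[:3 * period])
plt.xlabel('lag (pixels)')
plt.ylabel('normalized auto-correlation');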
[18]:
# which statistics to remove. note that, in the original paper, std_reconstructed is implicitly contained within
# auto_correlation_reconstructed, view the section on differences between plenoptic and matlab implementation
# for details
remove_statistics = ['auto_correlation_reconstructed', 'std_reconstructed']
# run on fig4a or fig4b to replicate paper
img = po.tools.load_images(DATA_PATH / 'fig4b.jpg').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)
# synthesis with coefficient correlations absent
model_remove = PortillaSimoncelliRemove(img.shape[-2:], remove_keys=remove_statistics).to(DEVICE)
metamer_remove = run_synthesis(img, model_remove)
100%|██████████| 3000/3000 [02:09<00:00, 23.22it/s, loss=1.0762e-01, learning_rate=0.01, gradient_norm=6.9003e-01, pixel_change_norm=1.5595e-01, current_scale=all, current_scale_loss=1.0762e-01]
100%|██████████| 3000/3000 [02:18<00:00, 21.69it/s, loss=9.2050e-01, learning_rate=0.01, gradient_norm=9.2850e-03, pixel_change_norm=1.7451e-02, current_scale=all, current_scale_loss=9.2050e-01]
[19]:
# visualize results
po.imshow([metamer.image, metamer.metamer, metamer_remove.metamer],
title=['Target image', 'Full Statistics', 'Without Correlation Statistics'], vrange='auto1');

We can also double-check the error plots to see the difference in their representations. The first figure shows the error for the metamer created without the correlation statistics (at right above), while the second shows the error for the metamer created with all statistics (center). We can see the larger error in the middle row of the first figure, especially in the auto_correlation_reconstructed plot, since these statistics were unconstrained during the synthesis of metamer_remove. (Note that we have to use model, not model_remove, to create these plots, since model_remove always zeroes out those statistics.)
[20]:
fig, _ = model.plot_representation(model(metamer_remove.metamer) - model(metamer.image),
figsize=(15, 5), ylim=(-4, 4))
fig.suptitle('Without Correlation Statistics')
fig, _ = model.plot_representation(model(metamer.metamer) - model(metamer.image),
figsize=(15, 5), ylim=(-4, 4))
fig.suptitle('Full statistics');


Magnitude Correlation
The cell below replicates examples of synthesis failures with the following statistics removed:
correlation of the complex magnitude of pairs of coefficients at adjacent positions, orientations and scales.
These statistics play a role in constraining high-contrast locations to be organized along lines and edges across all scales. For example, in the image named fig6a.jpg, the absence of these statistics results in a completely different organization of the orientation content in the edges.
(see figure 6 of Portilla & Simoncelli 2000)
[21]:
# which statistics to remove. note that, in the original paper, magnitude_std is implicitly contained within
# auto_correlation_magnitude, view the section on differences between plenoptic and matlab implementation
# for details
remove_statistics = ['magnitude_std', 'cross_orientation_correlation_magnitude',
'cross_scale_correlation_magnitude', 'auto_correlation_magnitude']
# run on fig6a or fig6b to replicate paper
img = po.tools.load_images(DATA_PATH / 'fig6a.jpg').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)
# synthesis with magnitude statistics absent
model_remove = PortillaSimoncelliRemove(img.shape[-2:],remove_keys=remove_statistics).to(DEVICE)
metamer_remove = run_synthesis(img, model_remove)
17%|█▋ | 522/3000 [00:22<01:47, 22.97it/s, loss=9.1164e-02, learning_rate=0.01, gradient_norm=8.2437e-01, pixel_change_norm=1.5844e-01, current_scale=all, current_scale_loss=9.1164e-02]
16%|█▌ | 479/3000 [00:22<01:56, 21.60it/s, loss=7.1354e-02, learning_rate=0.01, gradient_norm=9.4536e-01, pixel_change_norm=1.4267e-01, current_scale=all, current_scale_loss=7.1354e-02]
[22]:
# visualize results
po.imshow([metamer.image, metamer.metamer, metamer_remove.metamer],
title=['Target image', 'Full Statistics','Without Magnitude Statistics'], vrange='auto1');

Again, let’s look at the error plots. The first figure shows the error for the metamer created without the magnitude statistics (at right above), while the second shows the error for the metamer created with all statistics (center). We can see the larger error in the plots corresponding to auto_correlation_magnitude, cross_orientation_correlation_magnitude, and cross_scale_correlation_magnitude, since these statistics were unconstrained during the synthesis of metamer_remove. (Note that we have to use model, not model_remove, to create these plots, since model_remove always zeroes out those statistics.)
[23]:
fig, _ = model.plot_representation(model(metamer_remove.metamer) - model(metamer.image),
figsize=(15, 5), ylim=(-2, 2))
fig.suptitle('Without Magnitude Statistics')
fig, _ = model.plot_representation(model(metamer.metamer) - model(metamer.image),
figsize=(15, 5), ylim=(-2, 2))
fig.suptitle('Full statistics');


Cross-scale Phase Statistics
The cell below replicates examples of synthesis failures with the following statistics removed:
relative phase of coefficients of bands at adjacent scales
These statistics play a role in constraining high-contrast locations to be organized along lines and edges across all scales. These phase statistics are important for representing textures with strong illumination effects: when they are removed, the synthesized images appear much less three-dimensional and lose the detailed structure of shadows.
(see figure 8 of Portilla & Simoncelli 2000)
[24]:
# which statistics to remove
remove_statistics = ['cross_scale_correlation_real']
# run on fig8a and fig8b to replicate paper
img = po.tools.load_images(DATA_PATH / 'fig8b.jpg').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)
# synthesis with cross-scale phase statistics absent
model_remove = PortillaSimoncelliRemove(img.shape[-2:], remove_keys=remove_statistics).to(DEVICE)
metamer_remove = run_synthesis(img, model_remove)
16%|█▌ | 482/3000 [00:20<01:48, 23.24it/s, loss=7.3351e-02, learning_rate=0.01, gradient_norm=8.6994e-01, pixel_change_norm=1.5538e-01, current_scale=all, current_scale_loss=7.3351e-02]
17%|█▋ | 512/3000 [00:23<01:53, 21.87it/s, loss=7.2080e-02, learning_rate=0.01, gradient_norm=8.8912e-01, pixel_change_norm=1.5535e-01, current_scale=all, current_scale_loss=7.2080e-02]
[25]:
# visualize results
po.imshow([metamer.image, metamer.metamer, metamer_remove.metamer],
title=['Target image', 'Full Statistics','Without Cross-Scale Phase Statistics'], vrange='auto1');

Again, let’s look at the error plots. The first figure shows the error for the metamer created without the cross-scale phase statistics (at right above), while the second shows the error for the metamer created with all statistics (center). We can see the larger error in the final plot of the first figure, cross_scale_correlation_real, since these statistics were unconstrained during the synthesis of metamer_remove. (Note that we have to use model, not model_remove, to create these plots, since model_remove always zeroes out those statistics.)
[26]:
fig, _ = model.plot_representation(model(metamer_remove.metamer) - model(metamer.image),
figsize=(15, 5), ylim=(-1.2, 1.2))
fig.suptitle('Without Cross-Scale Phase Statistics')
fig, _ = model.plot_representation(model(metamer.metamer) - model(metamer.image),
figsize=(15, 5), ylim=(-1.2, 1.2))
fig.suptitle('Full statistics');


4. Examples from different texture classes
Hand-drawn / computer-generated textures
(see figure 12 of Portilla & Simoncelli 2000)
The following cell can be used to reproduce texture synthesis on the hand-drawn / computer-generated texture examples in the original paper, showing that the model can handle these simpler images as well.
Examples
(12a) solid black squares
(12b) tilted gray columns
(12c) curvy lines
(12d) dashes
(12e) solid black circles
(12f) pluses
[27]:
img = po.tools.load_images(DATA_PATH / 'fig12a.jpg').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img,model)
100%|██████████| 3000/3000 [02:09<00:00, 23.15it/s, loss=2.9268e+00, learning_rate=0.01, gradient_norm=4.8896e-01, pixel_change_norm=1.2103e-01, current_scale=all, current_scale_loss=2.9268e+00]
[28]:
po.imshow([metamer.image, metamer.metamer],
title=['Target image', 'Synthesized Metamer'], vrange='auto1');

Counterexample to the Julesz Conjecture
The Julesz conjecture, originally from Julesz 1962, states that “humans cannot distinguish between textures with identical second-order statistics” (second-order statistics include cross- and auto-correlations; see the paper for details). Following up on this initial paper, Julesz et al. (1978) and then Yellott (1993) created images that serve as counterexamples to this conjecture: pairs of images that had identical second-order statistics (they differed in their third- and higher-order statistics) but were readily distinguishable by humans. In figure 13 of Portilla & Simoncelli (2000), the authors show that the model is able to synthesize novel images based on these counterexamples that are also distinguishable by humans, so the model does not confuse them either.
(see figure 13 of Portilla & Simoncelli 2000)
Excerpt from paper: “Figure 13 shows two pairs of counterexamples that have been used to refute the Julesz conjecture. [13a and 13b were] originally created by Julesz et al. (1978): they have identical third-order pixel statistics, but are easily discriminated by human observers. Our model succeeds, in that it can reproduce the visual appearance of either of these textures. In particular, we have seen that the strongest statistical difference arises in the magnitude correlation statistics. The rightmost pair were constructed by Yellott (1993), to have identical sample autocorrelation. Again, our model does not confuse these, and can reproduce the visual appearance of either one.”
[29]:
# Run on fig13a, fig13b, fig13c, fig13d to replicate examples in paper
img = po.tools.load_images(DATA_PATH / 'fig13a.jpg').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer_left = run_synthesis(img,model)
100%|██████████| 3000/3000 [02:10<00:00, 23.02it/s, loss=3.9404e-01, learning_rate=0.01, gradient_norm=2.6782e-02, pixel_change_norm=4.4524e-02, current_scale=all, current_scale_loss=3.9404e-01]
[30]:
# Run on fig13a, fig13b, fig13c, fig13d to replicate examples in paper
img = po.tools.load_images(DATA_PATH / 'fig13b.jpg').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer_right = run_synthesis(img,model)
62%|██████▏ | 1860/3000 [01:20<00:49, 23.07it/s, loss=3.2113e-01, learning_rate=0.01, gradient_norm=1.8246e-01, pixel_change_norm=1.2679e-01, current_scale=all, current_scale_loss=3.2113e-01]
Note that the two synthesized images (right column) are as distinguishable from each other as the two hand-crafted counterexamples (left column):
[31]:
po.imshow([metamer_left.image, metamer_left.metamer,
metamer_right.image, metamer_right.metamer],
title=['Target image 1', 'Synthesized Metamer 1', 'Target Image 2', 'Synthesized Metamer 2'],
vrange='auto1', col_wrap=2);

Pseudo-periodic Textures
(see figure 14 of Portilla & Simoncelli 2000)
Excerpt from paper: “Figure 14 shows synthesis results for photographic textures that are pseudo-periodic, such as a brick wall and various types of woven fabric.”
[32]:
# Run on fig14a, fig14b, fig14c, fig14d, fig14e, fig14f to replicate examples in paper
img = po.tools.load_images(DATA_PATH / 'fig14a.jpg').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img,model)
18%|█▊ | 550/3000 [00:23<01:45, 23.13it/s, loss=2.3135e-01, learning_rate=0.01, gradient_norm=5.0994e-01, pixel_change_norm=2.7653e-01, current_scale=all, current_scale_loss=2.3135e-01]
[33]:
po.imshow([metamer.image, metamer.metamer],
title=['Target image', 'Synthesized Metamer'], vrange='auto1');

Aperiodic Textures
(see figure 15 of Portilla & Simoncelli 2000)
Excerpt from paper: “Figure 15 shows synthesis results for a set of photographic textures that are aperiodic, such as the animal fur or wood grain”
[34]:
# Run on fig15a, fig15b, fig15c, fig15d to replicate examples in paper
img = po.tools.load_images(DATA_PATH / 'fig15a.jpg').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img,model)
14%|█▍ | 425/3000 [00:18<01:51, 23.09it/s, loss=9.6799e-02, learning_rate=0.01, gradient_norm=8.5685e-01, pixel_change_norm=1.7662e-01, current_scale=all, current_scale_loss=9.6799e-02]
[35]:
po.imshow([metamer.image, metamer.metamer],
title=['Target image', 'Synthesized Metamer'], vrange='auto1');

Complex Structured Photographic Textures
(see figure 16 of Portilla & Simoncelli 2000)
Excerpt from paper: “Figure 16 shows several examples of textures with complex structures. Although the synthesis quality is not as good as in previous examples, we find the ability of our model to capture salient visual features of these textures quite remarkable. Especially notable are those examples in all three figures for which shading produces a strong impression of three-dimensionality.”
[36]:
# Run on fig16a, fig16b, fig16c, fig16d to replicate examples in paper
img = po.tools.load_images(DATA_PATH / 'fig16e.jpg').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)
14%|█▎ | 412/3000 [00:17<01:52, 22.97it/s, loss=7.4121e-02, learning_rate=0.01, gradient_norm=1.2208e+00, pixel_change_norm=1.4139e-01, current_scale=all, current_scale_loss=7.4121e-02]
[37]:
po.imshow([metamer.image, metamer.metamer],
title=['Target image', 'Synthesized metamer'], vrange='auto1');

5. Extrapolation
(see figure 19 of Portilla & Simoncelli 2000)
Here we explore using the texture model to extrapolate an image beyond its spatial boundaries.
Excerpt from paper: “…[C]onsider the problem of extending a texture image beyond its spatial boundaries (spatial extrapolation). We want to synthesize an image in which the central pixels contain a copy of the original image, and the surrounding pixels are synthesized based on the statistical measurements of the original image. The set of all images with the same central subset of pixels is convex, and the projection onto such a convex set is easily inserted into the iterative loop of the synthesis algorithm. Specifically, we need only re-set the central pixels to the desired values on each iteration of the synthesis loop. In practice, this substitution is done by multiplying the desired pixels by a smooth mask (a raised cosine) and adding this to the current synthesized image multiplied by the complement of this mask. The smooth mask prevents artifacts at the boundary between original and synthesized pixels, whereas convergence to the desired pixels within the mask support region is achieved almost perfectly. This technique is applicable to the restoration of pictures which have been destroyed in some subregion (“filling holes”) (e.g., Hirani and Totsuka, 1996), although the estimation of parameters from the defective image is not straightforward. Figure 19 shows a set of examples that have been spatially extrapolated using this method. Observe that the border between real and synthetic data is barely noticeable. An additional potential benefit is that the synthetic images are seamlessly periodic (due to circular boundary-handling within our algorithm), and thus may be used to tile a larger image.”
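Note that the code below substitutes the original pixels using a hard boolean mask rather than the raised-cosine blend described in the excerpt. A raised-cosine mask like the paper’s could be built along the following lines (a sketch; the helper name and parameter values are made up for illustration).
def raised_cosine_mask(size, inner, width):
    # 1 inside a central square of half-width `inner`, 0 outside `inner + width`,
    # with a half-cosine ramp in between
    coords = (torch.arange(size) - size // 2).abs().float()
    dist = torch.maximum(coords[None, :], coords[:, None])  # Chebyshev distance from center
    ramp = ((dist - inner) / width).clamp(0, 1)
    return 0.5 * (1 + torch.cos(math.pi * ramp))

soft_mask = raised_cosine_mask(256, inner=56, width=16)
# the paper's substitution step would then be:
# blended = soft_mask * target + (1 - soft_mask) * synthesized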
In the following, we mask out the boundaries of an image and use the texture model to extend it.
[38]:
# The following class inherits from the PortillaSimoncelli model for
# the purpose of extrapolating (filling in) a chunk of an image defined
# by a mask.
class PortillaSimoncelliMask(po.simul.PortillaSimoncelli):
r"""Extend the PortillaSimoncelli model to operate on masked images.
Additional Parameters
----------
mask: Tensor
boolean mask with True in the part of the image that will be filled in during synthesis
target: Tensor
image target for synthesis
"""
def __init__(
self,
im_shape,
n_scales=4,
n_orientations=4,
spatial_corr_width=9,
mask=None,
target=None
):
        super().__init__(im_shape, n_scales=n_scales, n_orientations=n_orientations,
                         spatial_corr_width=spatial_corr_width)
        self.mask = mask
        self.target = target
def forward(self, image, scales=None):
r"""Generate Texture Statistics representation of an image using the target for the masked portion
Parameters
----------
        image : torch.Tensor
            A 4d tensor containing the image to analyze, with shape (1,
            channel, height, width).
        scales : list, optional
            Which scales to include in the returned representation. If None
            (the default), we include all scales. Otherwise, can contain a
            subset of values present in this model's ``scales`` attribute.
Returns
-------
representation_tensor: torch.Tensor
3d tensor of shape (batch, channel, stats) containing the measured
texture statistics.
"""
if self.mask is not None and self.target is not None:
image = self.texture_masked_image(image)
return super().forward(image,scales=scales)
def texture_masked_image(self,image):
r""" Fill in part of the image (designated by the mask) with the saved target image
Parameters
------------
image : torch.Tensor
A tensor containing a single image
Returns
-------
texture_masked_image: torch.Tensor
An image that is a combination of the input image and the saved target.
Combination is specified by self.mask
"""
return self.target*self.mask + image*(~self.mask)
[39]:
img_file = DATA_PATH / 'fig14b.jpg'
img = po.tools.load_images(img_file).to(DEVICE)
im_init = (torch.rand_like(img) - .5) * .1 + img.mean()
mask = torch.zeros_like(img).bool()
ctr_dim = (img.shape[-2]//4, img.shape[-1]//4)
mask[...,ctr_dim[0]:3*ctr_dim[0],ctr_dim[1]:3*ctr_dim[1]] = True
model = PortillaSimoncelliMask(img.shape[-2:], target=img, mask=mask).to(DEVICE)
met = po.synth.MetamerCTF(img, model, loss_function=po.tools.optim.l2_norm, initial_image=im_init,
coarse_to_fine='together')
optimizer = torch.optim.Adam([met.metamer],lr=.02, amsgrad=True)
met.synthesize(
optimizer=optimizer,
max_iter=short_synth_max_iter,
store_progress=True,
change_scale_criterion=None,
ctf_iters_to_check=3
)
83%|████████▎ | 830/1000 [00:35<00:07, 23.10it/s, loss=1.5536e-01, learning_rate=0.02, gradient_norm=1.0073e+00, pixel_change_norm=3.0407e-01, current_scale=all, current_scale_loss=1.5536e-01]
[40]:
po.imshow([met.image, mask*met.image, model.texture_masked_image(met.metamer)], vrange='auto1',
          title=['Full target image', 'Masked target', 'Synthesized image']);

5.2 Mixtures
Here we explore creating a texture that is “in between” two textures by averaging their texture statistics and synthesizing an image that matches those average statistics.
Note that we do this differently than what is described in the paper. In the original paper, mixed statistics were computed by calculating the statistics on a single input image that consisted of half of each of two texture images pasted together. This led to an “oil and water” appearance in the resulting texture metamer, which appeared to have patches from each image.
In the following, we compute the texture statistics on two texture images separately and then average the resulting statistics, which appears to perform better. Note that, in all the other examples in this notebook, we knew there exists at least one image whose output matches our optimization target: the image we started with. For these mixtures, that is no longer the case.
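The key step is simply an average in statistic space; schematically (the classes defined below wrap this averaging inside a forward call so that Metamer can optimize against it):
imgs = po.tools.load_images([DATA_PATH / 'fig15e.jpg', DATA_PATH / 'fig14e.jpg'])
model = po.simul.PortillaSimoncelli(imgs.shape[-2:])
# average the two texture representations; this vector is the optimization target,
# even though no single natural image necessarily produces it
mixed_target = (model(imgs[:1]) + model(imgs[1:2])) / 2
print(mixed_target.shape)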
[41]:
# The following classes are designed to extend the PortillaSimoncelli model
# and the Metamer synthesis method for the purpose of mixing two target textures.
class PortillaSimoncelliMixture(po.simul.PortillaSimoncelli):
r"""Extend the PortillaSimoncelli model to mix two different images
Parameters
----------
im_shape: int
the size of the images being processed by the model
"""
def __init__(
self,
im_shape,
):
super().__init__(im_shape, n_scales=4, n_orientations=4, spatial_corr_width=9)
def forward(self, images, scales=None):
r"""Average Texture Statistics representations of two image
Parameters
----------
images : torch.Tensor
A 4d tensor containing one or two images to analyze, with shape (i,
channel, height, width), i in {1,2}.
        scales : list, optional
            Which scales to include in the returned representation. If None
            (the default), we include all scales. Otherwise, can contain a
            subset of values present in this model's ``scales`` attribute.
Returns
-------
representation_tensor: torch.Tensor
3d tensor of shape (batch, channel, stats) containing the measured
texture statistics.
"""
if images.shape[0] == 2:
# need the images to be 4d, so we use the "1 element slice"
stats0 = super().forward(images[:1], scales=scales)
stats1 = super().forward(images[1:2], scales=scales)
return (stats0+stats1)/2
else:
return super().forward(images, scales=scales)
class MetamerMixture(po.synth.MetamerCTF):
r""" Extending metamer synthesis based on image-computable
differentiable models, for mixing two images.
"""
def _initialize(self, initial_image):
"""Initialize the metamer.
Set the ``self.metamer`` attribute to be a parameter with
the user-supplied data, making sure it's the right shape.
Parameters
----------
initial_image :
The tensor we use to initialize the metamer. If None (the
default), we initialize with uniformly-distributed random
noise lying between 0 and 1.
"""
if initial_image.ndimension() < 4:
raise Exception("initial_image must be torch.Size([n_batch"
", n_channels, im_height, im_width]) but got "
f"{initial_image.size()}")
# the difference between this and the regular version of Metamer is that
# the regular version requires synthesized_signal and target_signal to have
# the same shape, and here target_signal is (2, 1, 256, 256), not (1, 1, 256, 256)
metamer = initial_image.clone().detach()
metamer = metamer.to(dtype=self.image.dtype,
device=self.image.device)
metamer.requires_grad_()
self._metamer = metamer
[42]:
# Figure 20. Examples of “mixture” textures.
# To replicate paper use the following combinations:
# (Fig. 15a, Fig. 15b); (Fig. 14b, Fig. 4a); (Fig. 15e, Fig. 14e).
img_files = [DATA_PATH / 'fig15e.jpg', DATA_PATH / 'fig14e.jpg']
imgs = po.tools.load_images(img_files).to(DEVICE)
im_init = torch.rand_like(imgs[0,:,:,:].unsqueeze(0)) * .01 + imgs.mean()
n=imgs.shape[-1]
model = PortillaSimoncelliMixture([n,n]).to(DEVICE)
met = MetamerMixture(imgs, model, loss_function=po.tools.optim.l2_norm, initial_image=im_init,
coarse_to_fine='together')
optimizer = torch.optim.Adam([met.metamer],lr=.02, amsgrad=True)
met.synthesize(
optimizer=optimizer,
max_iter=longest_synth_max_iter,
store_progress=True,
change_scale_criterion=None,
ctf_iters_to_check=3
)
21%|██ | 829/4000 [00:35<02:17, 23.05it/s, loss=3.0252e-01, learning_rate=0.02, gradient_norm=4.1979e-01, pixel_change_norm=2.6349e-01, current_scale=all, current_scale_loss=3.0252e-01]
[43]:
po.imshow([met.image, met.metamer], vrange='auto1',title=['Target image 1', 'Target image 2', 'Synthesized Mixture Metamer']);

6. Model Limitations
Not all texture model metamers look perceptually similar to humans. The paper’s figures 17 and 18 present two classes of failures: “inhomogeneous texture images not usually considered to be ‘texture’” (such as human faces, fig. 17) and some simple hand-drawn textures (fig. 18), many of which are simple geometric line drawings.
Note that for these examples, we were unable to locate the original images, so we present examples that serve the same purpose.
[44]:
img = po.data.einstein().to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)
8%|▊ | 249/3000 [00:10<02:01, 22.67it/s, loss=1.0463e-01, learning_rate=0.01, gradient_norm=7.2169e-01, pixel_change_norm=1.5591e-01, current_scale=all, current_scale_loss=1.0463e-01]
Here we can see that the texture model fails to capture anything that makes this image look “portrait-like”: there is no recognizable face or clothes in the synthesized metamer. As a portrait is generally not considered a texture, this is not a model failure per se, but does demonstrate the limits of this model.
[45]:
po.imshow([metamer.image, metamer.metamer],
title=['Target image', 'Synthesized Metamer'], vrange='auto1');

In this example, we see that the model metamer fails to reproduce the randomly distributed oriented black lines on a white background: in particular, several lines are curved and several appear discontinuous. From the paper: “Although a texture of single-orientation bars is reproduced fairly well (see Fig. 12), the mixture of bar orientations in this example leads to the synthesis of curved line segments. In general, the model is unable to distinguish straight from curved contours, except when the contours are all of the same orientation.”
[46]:
img = po.tools.load_images(DATA_PATH / 'fig18a.png').to(DEVICE)
# synthesis with full PortillaSimoncelli model
model = po.simul.PortillaSimoncelli(img.shape[-2:]).to(DEVICE)
metamer = run_synthesis(img, model)
46%|████▌ | 1366/3000 [00:59<01:10, 23.05it/s, loss=2.0882e-01, learning_rate=0.01, gradient_norm=2.2590e-01, pixel_change_norm=8.9952e-02, current_scale=all, current_scale_loss=2.0882e-01]
[47]:
po.imshow([metamer.image, metamer.metamer],
title=['Target image', 'Synthesized Metamer'], vrange='auto1');

7. Notable differences between the Matlab and plenoptic implementations
Optimization. The Matlab implementation of texture synthesis is designed specifically for the texture model: gradient descent is performed on subsets of the texture statistics in a particular sequence (coarse-to-fine, etc.). The plenoptic implementation relies on the auto-differentiation and optimization tools available in pytorch; we only define the forward model and let pytorch handle the optimization.
Why does this matter? We have qualitatively reproduced the results but cannot guarantee exact reproducibility. This is true of the plenoptic package in general (see https://plenoptic.readthedocs.io/en/latest/reproducibility.html), which means that metamers synthesized by the two implementations will, in general, differ.
Lack of redundant statistics. As described in the next section, we output a different number of statistics than the Matlab implementation. The number of statistics returned in plenoptic matches the number reported in the paper, unlike the Matlab implementation. That is because the Matlab implementation included many redundant statistics, which were either exactly redundant (e.g., symmetric values in an auto-correlation matrix), placeholders (e.g., some 0s to make the shapes of the output work out), or not mentioned in the paper. The implementation included in plenoptic returns only the necessary statistics. See the next section for more details.
True correlations. In the Matlab implementation of the Portilla-Simoncelli statistics, the auto-correlation, cross-scale, and cross-orientation statistics are based on co-variance matrices. When using torch to perform optimization, this makes convergence more difficult. We thus normalize each of these matrices, dividing the auto-correlation matrices by their center values (the variance) and the cross-correlation matrices by the square root of the product of the appropriate variances (so that we match numpy.corrcoef). This means that the center of the auto-correlations and the diagonals of cross_orientation_correlation_magnitude are always 1 and are thus excluded from the representation, as discussed above. We have thus added two new statistics, std_reconstructed and magnitude_std (the standard deviation of the reconstructed lowpass images and the standard deviation of the magnitudes of each steerable pyramid band), to compensate (see Note at end of cell). Note that the cross-scale correlations have no such redundancies and do not have 1 along the diagonal: for cross_orientation_correlation_magnitude, the value at \(A_{i,j}\) is the correlation between the magnitudes at orientation \(i\) and orientation \(j\) at the same scale, so that \(A_{i,i}\) is the correlation of a magnitude band with itself, i.e., \(1\). However, for cross_scale_correlation_magnitude, the value at \(A_{i,j}\) is the correlation between the magnitudes at orientation \(i\) and orientation \(j\) at two adjacent scales, so \(A_{i,i}\) is not the correlation of a band with itself and is thus informative.
Note: We use standard deviations, instead of variances, because the standard deviations lie within approximately the same range as the other values in the model’s representation, which makes optimization work better.
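As a concrete illustration of this normalization (a sketch using numpy on random data, not plenoptic’s internal code): dividing a covariance matrix elementwise by the square root of the product of the corresponding variances yields the correlation matrix, with ones on the diagonal.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1000))  # 4 hypothetical "bands", 1000 samples each
cov = np.cov(x)
corr = cov / np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
print(np.allclose(corr, np.corrcoef(x)), np.allclose(np.diag(corr), 1))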
7.1 Redundant statistics
The original Portilla-Simoncelli paper presents formulas to obtain the number of statistics in each class from the model parameters n_scales, n_orientations, and spatial_corr_width (labeled \(N\), \(K\), and \(M\) in the original paper, respectively). The formulas indicate the following statistics for each class:
Marginal statistics: \(2(N+1)\) skewness and kurtosis of lowpass images, \(1\) high-pass variance, \(6\) pixel statistics.
Raw coefficient correlation: \((N+1)\frac{M^2+1}{2}\) statistics (\(\frac{M^2+1}{2}\) auto-correlations for each scale including lowpass)
Coefficient magnitude statistics: \(NK\frac{M^2+1}{2}\) autocorrelation statistics, \(N\frac{K(K-1)}{2}\) cross-orientation correlations at same scale, \(K^2(N-1)\) cross-scale correlations.
Cross-scale phase statistics: \(2K^2(N-1)\) statistics
In particular, the paper reads “For our texture examples, we have made choices of N = 4, K = 4 and M = 7, resulting in a total of 710 parameters”. However, the output of the Portilla-Simoncelli code in Matlab contains 1784 elements for these values of \(N\), \(K\), and \(M\). The discrepancy arises because the Matlab output includes redundant statistics, placeholder values, and statistics not used during synthesis. The plenoptic output, on the other hand, returns only the essential statistics, and its size agrees with the paper’s formulas.
The redundant statistics that are removed by the plenoptic package but present in the Matlab code are as follows:
Auto-correlation reconstructed: An auto-covariance matrix \(A\) encodes the covariance of the elements in a signal with their neighbors. Indexing the central auto-covariance element as \(A_{0,0}\), element \(A_{i,j}\) contains the covariance of the signal with its neighbor at a displacement \((i,j)\). Because auto-correlation matrices are even functions, they have a symmetry \(A_{i,j}=A_{-i,-j}\), which means that every element except the central one (\(A_{0,0}\), the variance) is duplicated (see Note at end of cell). Thus, in an auto-correlation matrix of size \(M \times M\), there are \(\frac{M^2+1}{2}\) non-redundant elements (this ratio appears in the auto-correlation statistics formulas above). The Matlab code returns the full auto-covariance matrices, that is, \(M^2\) instead of \(\frac{M^2+1}{2}\) elements for each covariance matrix.
Auto-correlation magnitude: Same symmetry and redundancies as 1).
Cross-orientation magnitude correlation: Covariance matrices \(C\) (size \(K \times K\)) have symmetry \(C_{i,j} = C_{j,i}\) (each off-diagonal element is duplicated, i.e., they’re symmetric). Thus, a \(K \times K\) covariance matrix has \(\frac{K(K+1)}{2}\) non-redundant elements. However, the diagonal elements of the cross-orientation correlations are variances, which are already contained in the central elements of the auto-correlation magnitude matrices. Thus, these covariances only hold \(\frac{K(K-1)}{2}\) non-redundant elements (see this term in the formulas above). The Matlab code returns the full covariances (with \(K^2\) elements) instead of the non-redundant ones. Also, the Matlab code returns an extra covariance matrix full of 0’s not mentioned in the paper (\((N+1)\) matrices instead of \((N)\)).
Cross-scale real correlation (phase statistics): Phase statistics contain the correlations between the \(K\) real orientations at a scale and the \(2K\) real and imaginary phase-doubled orientations at the following scale, making a total of \(K \times 2K=2K^2\) statistics (see this term in the formulas above). However, the Matlab output has matrices of size \(2K \times 2K\), where half of the matrices are filled with 0's. Also, the paper counts the \((N-1)\) pairs of adjacent scales, but the Matlab output includes \(N\) matrices. The plenoptic output removes the 0's and the extra matrix.
Statistics not in paper: The Matlab code outputs the mean magnitude of each band and the cross-orientation real correlations, but these are not enumerated in the paper. These statistics are removed in plenoptic. See the next section for some more detail about the magnitude means.
Note: This can be understood by thinking of \(A_{i,0}\), the correlation of every pixel with the pixel \(i\) to its right. Computing this auto-covariance involves adding together all the products \(I_{x,y}*I_{x+i,y}\) for every \(x\) and \(y\) in the image. But this is equivalent to computing \(A_{-i,0}\), because every pair of neighbors \(i\) apart to the right, \(I_{x,y}*I_{x+i,y}\), is also a pair of neighbors \(i\) apart to the left: \(I_{x+i,y}*I_{(x+i)-i,y}=I_{x+i,y}*I_{x,y}\). So any two opposite displacements around the central element of the auto-covariance matrix will have the same value.
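The symmetry described in this note is easy to verify numerically (a sketch using numpy’s FFT, independent of plenoptic): the circular auto-correlation of a random image takes the same value at opposite displacements.
rng = np.random.default_rng(0)
im = rng.standard_normal((64, 64))
# circular auto-correlation via the Fourier transform (Wiener-Khinchin theorem)
ac = np.fft.fftshift(np.fft.ifft2(np.abs(np.fft.fft2(im)) ** 2).real)
ctr = 32  # zero-displacement element after fftshift
print(np.isclose(ac[ctr + 2, ctr + 3], ac[ctr - 2, ctr - 3]))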
As shown below, the output of plenoptic matches the number of statistics indicated in the paper:
[48]:
img = po.tools.load_images(DATA_PATH / 'fig4a.jpg')
image_shape = img.shape[2:4]
# Initialize the minimal model. Use same params as paper
model = po.simul.PortillaSimoncelli(image_shape, n_scales=4,
n_orientations=4,
spatial_corr_width=7)
stats = model(img)
print(f'Stats for N=4, K=4, M=7: {stats[0].shape[1]} statistics')
Stats for N=4, K=4, M=7: 710 statistics
plenoptic can also convert the tensor of statistics into a dictionary of matrices, similar to the Matlab output. In this dictionary, the redundant statistics are indicated with NaNs. Below we print one of the auto-correlation matrices, showing the redundant elements it contains:
[49]:
stats_dict = model.convert_to_dict(stats)
s = 1
o = 2
print(stats_dict['auto_correlation_magnitude'][0,0,:,:,s,o])
tensor([[0.1396, nan, nan, nan, nan, nan, nan],
[0.2411, 0.3492, nan, nan, nan, nan, nan],
[0.3750, 0.5434, 0.7396, nan, nan, nan, nan],
[0.4501, 0.6598, 0.8886, nan, nan, nan, nan],
[0.3909, 0.5783, 0.7708, 0.8490, nan, nan, nan],
[0.2488, 0.3786, 0.5111, 0.5619, 0.4833, nan, nan],
[0.1404, 0.2305, 0.3287, 0.3715, 0.3175, 0.2189, nan]])
We see in the output above that both the upper-triangular part of the matrix and the diagonal elements from the center onwards are redundant, as indicated in the text above. Note that although the central element is not redundant in an auto-covariance matrix, when the covariances are converted to correlations the central element is 1, and so uninformative (see the previous section for more information).
We can count how many statistics are in this particular class:
[50]:
acm_not_redundant = torch.sum(~torch.isnan(stats_dict['auto_correlation_magnitude']))
print(f'Non-redundant elements in acm: {acm_not_redundant}')
Non-redundant elements in acm: 384
The number of non-redundant elements is 16 short of the \(NK\frac{M^2+1}{2} = 4\cdot 4 \cdot \frac{7^2+1}{2}=400\) statistics indicated by the formula. This is because plenoptic removes the central elements of these matrices and holds them in stats_dict['magnitude_std']:
[51]:
print(f"Number magnitude band variances: {stats_dict['magnitude_std'].numel()}")
Number magnitude band variances: 16
Next, let’s check that the number of statistics in each class matches the original paper:
Marginal statistics: total of 17 statistics
  kurtosis + skewness: 2*(N+1) = 2*(4+1) = 10
  variance of the high-pass band: 1
  pixel statistics: 6
Raw coefficient correlation: total of 125 statistics
  central samples of auto-correlation reconstructed: (N+1)*(M^2+1)/2 = (4+1)*(7^2+1)/2 = 125
Coefficient magnitude statistics: total of 472 statistics
  central samples of the auto-correlation of the magnitude of each subband: N*K*(M^2+1)/2 = 4*4*(7^2+1)/2 = 400
  cross-correlation of orientations at the same scale: N*K*(K-1)/2 = 4*4*(4-1)/2 = 24
  cross-correlation of magnitudes across scales: K^2*(N-1) = 4^2*(4-1) = 48
Cross-scale phase statistics: total of 96 statistics
  cross-correlation of real coefficients with both coefficients at the broader scale: 2*K^2*(N-1) = 2*4^2*(4-1) = 96
[52]:
# Sum marginal statistics
marginal_stats_num = (torch.sum(~torch.isnan(stats_dict['kurtosis_reconstructed'])) +
torch.sum(~torch.isnan(stats_dict['skew_reconstructed'])) +
torch.sum(~torch.isnan(stats_dict['var_highpass_residual'])) +
torch.sum(~torch.isnan(stats_dict['pixel_statistics'])))
print(f'Marginal statistics: {marginal_stats_num} parameters, compared to 17 in paper')
# Sum raw coefficient correlations
real_coefficient_corr_num = torch.sum(~torch.isnan(stats_dict['auto_correlation_reconstructed']))
real_variances = torch.sum(~torch.isnan(stats_dict['std_reconstructed']))
print(f'Raw coefficient correlation: {real_coefficient_corr_num + real_variances} parameters, '
'compared to 125 in paper')
# Sum coefficient magnitude statistics
coeff_magnitude_stats_num = (torch.sum(~torch.isnan(stats_dict['auto_correlation_magnitude'])) +
torch.sum(~torch.isnan(stats_dict['cross_scale_correlation_magnitude'])) +
torch.sum(~torch.isnan(stats_dict['cross_orientation_correlation_magnitude'])))
coeff_magnitude_variances = torch.sum(~torch.isnan(stats_dict['magnitude_std']))
print(f'Coefficient magnitude statistics: {coeff_magnitude_stats_num + coeff_magnitude_variances} '
'parameters, compared to 472 in paper')
# Sum cross-scale phase statistics
phase_statistics_num = torch.sum(~torch.isnan(stats_dict['cross_scale_correlation_real']))
print(f'Phase statistics: {phase_statistics_num} parameters, compared to 96 in paper')
Marginal statistics: 17 parameters, compared to 17 in paper
Raw coefficient correlation: 125 parameters, compared to 125 in paper
Coefficient magnitude statistics: 472 parameters, compared to 472 in paper
Phase statistics: 96 parameters, compared to 96 in paper
7.2 Magnitude means
The means of the magnitude bands are slightly different from the redundant statistics discussed in the previous section. Each of those statistics is exactly redundant, e.g., the center value of an auto-correlation matrix is always 1, so they cannot carry any additional information. The magnitude means, however, are only approximately redundant and thus could, in principle, improve the texture representation. The authors excluded these values because they did not seem to be necessary: the magnitude means are constrained by the other statistics (though not perfectly), and so including them does not improve the visual quality of the synthesized textures.
To demonstrate this, we will create a modified version of the PortillaSimoncelli class that includes the magnitude means, in order to show that:
Even without explicitly including them in the texture representation, they are still approximately matched between the original and synthesized texture images.
Including them in the representation does not significantly change the quality of the synthesized texture.
First, let’s create the modified model:
[53]:
from collections import OrderedDict
class PortillaSimoncelliMagMeans(po.simul.PortillaSimoncelli):
r"""Include the magnitude means in the PS texture representation.
Parameters
----------
im_shape: int
the size of the images being processed by the model
"""
def __init__(
self,
im_shape,
):
super().__init__(im_shape, n_scales=4, n_orientations=4, spatial_corr_width=7)
def forward(self, image, scales=None):
r"""Average Texture Statistics representations of two image
Parameters
----------
image : torch.Tensor
A 4d tensor (batch, channel, height, width) containing the image(s) to
analyze.
scales : list, optional
Which scales to include in the returned representation. If None
(the default), we include all scales. Otherwise, can contain a
subset of values present in this model's ``scales`` attribute.
Returns
-------
representation_tensor: torch.Tensor
3d tensor of shape (batch, channel, stats) containing the measured
texture statistics.
"""
stats = super().forward(image, scales=scales)
# this helper function returns a list of tensors containing the steerable
# pyramid coefficients at each scale
pyr_coeffs = self._compute_pyr_coeffs(image)[1]
# only compute the magnitudes for the desired scales
magnitude_pyr_coeffs = [coeff.abs() for i, coeff in enumerate(pyr_coeffs)
if scales is None or i in scales]
magnitude_means = [mag.mean((-2, -1)) for mag in magnitude_pyr_coeffs]
return einops.pack([stats, *magnitude_means], 'b c *')[0]
# overwriting the following two methods allows us to use the plot_representation
# method with the modified model, making it easier to examine.
def convert_to_dict(self, representation_tensor: torch.Tensor) -> OrderedDict:
"""Convert tensor of stats to dictionary."""
n_mag_means = self.n_scales * self.n_orientations
rep = super().convert_to_dict(representation_tensor[..., :-n_mag_means])
mag_means = representation_tensor[..., -n_mag_means:]
rep['magnitude_means'] = einops.rearrange(mag_means, 'b c (s o) -> b c s o', s=self.n_scales, o=self.n_orientations)
return rep
def _representation_for_plotting(self, rep: OrderedDict) -> OrderedDict:
r"""Convert the data into a dictionary representation that is more convenient for plotting.
Intended as a helper function for plot_representation.
"""
mag_means = rep.pop('magnitude_means')
data = super()._representation_for_plotting(rep)
data['magnitude_means'] = mag_means.flatten()
return data
Now, let’s initialize our models and images for synthesis:
[55]:
img = po.tools.load_images(DATA_PATH / 'fig4a.jpg').to(DEVICE)
model = po.simul.PortillaSimoncelli(img.shape[-2:], spatial_corr_width=7).to(DEVICE)
model_mag_means = PortillaSimoncelliMagMeans(img.shape[-2:]).to(DEVICE)
im_init = (torch.rand_like(img)-.5) * .1 + img.mean()
And run the synthesis with the regular model, which does not include the mean of the steerable pyramid magnitudes, and then the augmented model, which does.
[56]:
# Set the RNG seed to make the two synthesis procedures as similar as possible.
po.tools.set_seed(100)
met = po.synth.MetamerCTF(img, model, loss_function=po.tools.optim.l2_norm, initial_image=im_init)
met.synthesize(store_progress=10, max_iter=short_synth_max_iter, change_scale_criterion=None, ctf_iters_to_check=7)
po.tools.set_seed(100)
met_mag_means = po.synth.MetamerCTF(img, model_mag_means, loss_function=po.tools.optim.l2_norm, initial_image=im_init)
met_mag_means.synthesize(store_progress=10, max_iter=short_synth_max_iter, change_scale_criterion=None, ctf_iters_to_check=7)
93%|█████████▎| 927/1000 [00:40<00:03, 22.91it/s, loss=7.3206e-02, learning_rate=0.01, gradient_norm=8.8649e-01, pixel_change_norm=1.4852e-01, current_scale=all, current_scale_loss=7.3206e-02]
93%|█████████▎| 934/1000 [00:45<00:03, 20.61it/s, loss=7.5847e-02, learning_rate=0.01, gradient_norm=8.6494e-01, pixel_change_norm=1.5210e-01, current_scale=all, current_scale_loss=7.5847e-02]
Now let’s examine the outputs. In the following plot, we display the synthesized metamer and the representation error for the metamer synthesized with and without explicitly constraining the magnitude means.
The two synthesized metamers appear almost identical, so including the magnitude means does not substantially change the resulting metamer, let alone improve its visual quality.
The representation errors are (as we'd expect) also very similar. Let's focus on the plot in the bottom right, labeled "magnitude_means". Each stem shows the mean of one of the magnitude bands, with the scales increasing from left to right. Looking at the representation error for the first image, we can see that, even without explicitly including the means, the error in this statistic is of the same order of magnitude as the other statistics, showing that it is being implicitly constrained. Comparing with the error for the second image, we can see that explicitly including the magnitude means does decrease their error, most notably at the coarsest scales.
[57]:
fig, axes = plt.subplots(2, 2, figsize=(21, 11), gridspec_kw={'width_ratios': [1, 3.1]})
for ax, im, info in zip(axes[:, 0], [met.metamer, met_mag_means.metamer], ['without', 'with']):
po.imshow(im, ax=ax, title=f"Metamer {info} magnitude means")
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
model_mag_means.plot_representation(model_mag_means(met.metamer)-model_mag_means(img), ylim=(-.06, .06), ax=axes[0,1]);
model_mag_means.plot_representation(model_mag_means(met_mag_means.metamer)-model_mag_means(img), ylim=(-.06, .06), ax=axes[1,1]);

Thus, we can feel fairly confident in excluding these magnitude means from the model. Note this follows the same logic as earlier in the notebook, when we tried removing different statistics to see their effect; here, we tried adding a statistic to determine its effect. Feel free to try using other target images or adding other statistics!
Under development – this currently contains examples of the earlier MAD synthesis, but we have yet to reproduce it using plenoptic
.
Reproducing Wang and Simoncelli, 2008 (MAD Competition)
Goal here is to reproduce the original MAD Competition results, as generated using the MATLAB code originally provided by Zhou Wang and then modified by the authors. MAD Competition is a synthesis method for efficiently comparing two models, by generating sets of images that minimize/maximize one model's loss while holding the other's constant. For more details, see the 07_MAD_Competition
and 08_Simple_MAD
notebooks.
[1]:
import imageio
import torch
import scipy.io as sio
import pyrtools as pt
from scipy.io import loadmat
import numpy as np
import matplotlib.pyplot as plt
import plenoptic as po
import os.path as op
%matplotlib inline
%load_ext autoreload
%autoreload 2
SSIM
Before we discuss MAD Competition, let's look briefly at SSIM, since that's the metric used in the original paper and the one we'll be using here. It's important to remember that SSIM is a similarity metric, so higher is better: a value of 1 means the two images are identical (SSIM is bounded above by 1, and can be negative for very dissimilar images).
We have tests showing that this implementation matches the output of the original MATLAB code, but we won't demonstrate that here.
[2]:
img1 = po.data.einstein()
img2 = po.data.curie()
noisy = po.tools.add_noise(img1, [2,4,8])
We can see that increasing the noise level decreases the SSIM value, though not linearly.
[3]:
po.metric.ssim(img1, noisy)
/home/billbrod/Documents/plenoptic/src/plenoptic/metric/perceptual_distance.py:42: UserWarning: Image range falls outside [0, 1]. img1: tensor([0.0039, 1.0000]), img2: tensor([-12.3002, 11.9818]). Continuing anyway...
warnings.warn("Image range falls outside [0, 1]."
/home/billbrod/micromamba/envs/plenoptic/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[3]:
tensor([[0.0026],
[0.0016],
[0.0004]])
And we can confirm that our noise level matches the MSE.
[4]:
po.metric.mse(img1, noisy)
[4]:
tensor([[2.0000],
[4.0000],
[8.0000]])
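One more quick sanity check (a sketch, not part of the original notebook): since SSIM is a similarity metric, comparing an image with itself should give exactly 1.
import plenoptic as po

img = po.data.einstein()
print(po.metric.ssim(img, img))  # expect tensor([[1.]])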
MAD Competition
The following figure shows the results of MAD Competition synthesis using the original MATLAB code. It shows the original image in the top left. We then added some Gaussian noise (with a specified noise level) to get the image right below it. The four images to the right of that are the MAD-synthesized images. The first two have the same mean-squared error (MSE), relative to the original, as the noisy image (and as each other), but the best and worst SSIM values (SSIM is a similarity metric, so higher is better), while the second two have the same SSIM as the noisy image, but the best and worst MSE. By comparing these images, we can get a sense for what MSE and SSIM consider important for image quality.
[5]:
# We need to download some additional data for this portion of the notebook. In order to do so,
# we use an optional dependency, pooch. If the following raises an ImportError or ModuleNotFoundError
# then install pooch in your plenoptic environment and restart your kernel.
fig, results = po.tools.external.plot_MAD_results('samp6', [128], vrange='row1', zoom=3)

There's a lot of information here about the outputs of the MATLAB synthesis. We will later add code to investigate these results using plenoptic
.
[6]:
results
[6]:
{'L128': {'FIX_MSE': 127.99999999999999,
'FIX_SSIM': 0.8183184633106257,
'mse_fixmse_maxssim': array([128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128.]),
'ssim_fixmse_maxssim': array([0.82669306, 0.83641599, 0.84768936, 0.86021352, 0.87332037,
0.8861153 , 0.89794336, 0.90864194, 0.91828312, 0.9270046 ,
0.93495769, 0.94226293, 0.94896492, 0.95506452, 0.9605342 ,
0.96533558, 0.96945409, 0.97290935, 0.97575609, 0.97807185,
0.97994388, 0.98145704, 0.98268627, 0.98369369, 0.98452855,
0.98522884, 0.98582351, 0.98633451, 0.98677852, 0.98716831,
0.98751374, 0.98782249, 0.98810062, 0.98835294, 0.98858334,
0.98879493, 0.98899028, 0.98917148, 0.98934029, 0.98949816,
0.9896463 , 0.98978576, 0.98991742, 0.99004203, 0.99016023,
0.99027259, 0.99037961, 0.9904817 , 0.99057924, 0.99067257,
0.99076198, 0.99084774, 0.99093007, 0.9910092 , 0.99108531,
0.99115858, 0.99122917, 0.99129722, 0.99136287, 0.99142623,
0.99148744, 0.99154658, 0.99160377, 0.99165909, 0.99171263,
0.99176448, 0.9918147 , 0.99186338, 0.99191058, 0.99195637,
0.99200081, 0.99204396, 0.99208587, 0.99212661, 0.99216622,
0.99220476, 0.99224226, 0.99227877, 0.99231435, 0.99234902,
0.99238284, 0.99241583, 0.99244804, 0.99247949, 0.99251023,
0.99254029, 0.99256969, 0.99259847, 0.99262665, 0.99265426,
0.99268134, 0.99270789, 0.99273395, 0.99275955, 0.99278469,
0.99280941, 0.99283372, 0.99285765, 0.9928812 , 0.99290441]),
'mse_fixmse_minssim': array([128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128., 128., 128., 128., 128., 128., 128., 128., 128., 128., 128.,
128.]),
'ssim_fixmse_minssim': array([0.81069721, 0.80415136, 0.79827382, 0.7927989 , 0.78758476,
0.7825861 , 0.77782948, 0.77338629, 0.76933555, 0.7657286 ,
0.76258054, 0.75987991, 0.75759382, 0.75567185, 0.75405628,
0.7526936 , 0.75154034, 0.75056265, 0.74973342, 0.74902986,
0.74843227, 0.74792359, 0.7474892 , 0.74711674, 0.74679588,
0.74651809, 0.74627635, 0.74606493, 0.74587912, 0.74571508,
0.74556965, 0.74544023, 0.74532467, 0.74522116, 0.74512817,
0.74504441, 0.74496878, 0.74490032, 0.74483822, 0.74478177,
0.74473035, 0.74468342, 0.74464051, 0.74460123, 0.7445652 ,
0.74453212, 0.74450172, 0.74447375, 0.74444799, 0.74442426,
0.74440239, 0.74438221, 0.7443636 , 0.74434643, 0.74433057,
0.74431594, 0.74430244, 0.74428997, 0.74427846, 0.74426784,
0.74425803, 0.74424898, 0.74424062, 0.74423291, 0.74422579,
0.74421923, 0.74421317, 0.74420758, 0.74420242, 0.74419767,
0.74419328, 0.74418924, 0.74418551, 0.74418208, 0.74417891,
0.74417599, 0.74417331, 0.74417083, 0.74416855, 0.74416646,
0.74416453, 0.74416275, 0.74416112, 0.74415963, 0.74415825,
0.74415699, 0.74415583, 0.74415477, 0.7441538 , 0.74415291,
0.7441521 , 0.74415135, 0.74415068, 0.74415006, 0.7441495 ,
0.74414899, 0.74414853, 0.74414811, 0.74414773, 0.74414739]),
'maxssim': 0.99290440833461,
'minssim': 0.7441473913547447,
'mse_fixssim_minmse': array([127.62569966, 127.25907758, 126.89830629, 126.5432421 ,
126.19375253, 125.84970824, 125.51098323, 125.17745477,
124.84900331, 124.5255124 , 124.2068686 , 123.89296145,
123.58368333, 123.27892946, 122.97859775, 122.68258882,
122.39080584, 122.10315454, 121.81954311, 121.53988213,
121.26408454, 120.99206555, 120.72374258, 120.45903523,
120.19786522, 119.94015629, 119.68583422, 119.43482671,
119.18706338, 118.94247568, 118.70099686, 118.46256195,
118.22710767, 117.99457239, 117.76489611, 117.53802042,
117.31388843, 117.09244474, 116.87363542, 116.65740794,
116.44371116, 116.23249528, 116.02371179, 115.81731348,
115.61325436, 115.41148964, 115.21197571, 115.01467011,
114.81953145, 114.62651948, 114.43559495, 114.24671966,
114.0598564 , 113.87496892, 113.69202191, 113.510981 ,
113.33181268, 113.15448433, 112.97896417, 112.80522123,
112.63322535, 112.46294714, 112.29435797, 112.12742994,
111.96213588, 111.79844929, 111.63634436, 111.47579594,
111.31677951, 111.15927119, 111.00324769, 110.8486863 ,
110.6955649 , 110.54386192, 110.39355633, 110.24462762,
110.09705579, 109.95082136, 109.8059053 , 109.66228908,
109.5199546 , 109.37888422, 109.23906072, 109.10046731,
108.96308761, 108.82690563, 108.69190575, 108.55807275,
108.42539176, 108.29384826, 108.16342808, 108.03411739,
107.90590266, 107.77877069, 107.6527086 , 107.52770378,
107.40374392, 107.280817 , 107.15891125, 107.03801519]),
'ssim_fixssim_minmse': array([0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846,
0.81831846, 0.81831846, 0.81831846, 0.81831846, 0.81831846]),
'mse_fixssim_maxmse': array([ 131.81989257, 136.34348117, 141.63201819, 147.93891278,
155.38338115, 164.2072263 , 174.58969259, 180.77038681,
187.35687873, 194.48181227, 198.57370304, 202.66885863,
206.95343129, 211.44231618, 216.13691245, 221.02634466,
226.11581241, 231.45406809, 237.02262995, 242.83028251,
248.81360619, 255.03345526, 261.46183743, 268.20589445,
271.92855589, 275.50581231, 279.16591822, 282.90724991,
286.73714745, 290.64830841, 294.64536986, 298.72977854,
302.9070491 , 307.17661949, 311.52367919, 315.9615864 ,
320.49015071, 325.12141654, 329.86378322, 334.71438487,
339.66399361, 344.73644098, 349.93210914, 355.25219363,
360.69572122, 366.26837608, 371.96846258, 377.80329199,
383.77938626, 389.88638347, 396.14353339, 402.54835547,
409.12344679, 415.86473831, 422.7769448 , 429.86538679,
437.11955142, 444.55628708, 452.18579893, 460.02165701,
468.0649593 , 476.29650757, 484.7347087 , 493.40456576,
502.30552879, 511.44021594, 520.83114704, 530.48547572,
540.40541267, 550.59925317, 561.07367344, 571.86215993,
582.96345493, 594.39243813, 606.16232589, 618.28830266,
630.7534013 , 643.57218989, 656.77008653, 670.38608448,
684.42229927, 698.90648368, 713.75502045, 729.09525608,
744.89191175, 761.03686543, 777.65684378, 794.83179371,
812.49479186, 830.66898008, 849.46972268, 868.84766347,
888.75751215, 909.25712128, 930.36668338, 952.1656941 ,
974.65873922, 997.87604088, 1021.77859313, 1046.42121422]),
'ssim_fixssim_maxmse': array([0.81838315, 0.81837394, 0.81840433, 0.81841667, 0.8184811 ,
0.81854265, 0.81865555, 0.8185667 , 0.81860168, 0.8186602 ,
0.81849321, 0.81849264, 0.81849454, 0.81849667, 0.81850134,
0.81850698, 0.81852163, 0.8185241 , 0.81853312, 0.81854646,
0.81856169, 0.81858209, 0.81861676, 0.8186252 , 0.8184758 ,
0.81848028, 0.81848157, 0.81848229, 0.8184804 , 0.81848067,
0.81848019, 0.81847912, 0.81847739, 0.81847648, 0.81848131,
0.81848611, 0.81848742, 0.81848633, 0.81848348, 0.81848151,
0.81848324, 0.81848096, 0.81847737, 0.81847369, 0.81847083,
0.81846829, 0.81846601, 0.81846364, 0.81846065, 0.81845897,
0.81845727, 0.8184584 , 0.81845481, 0.81845096, 0.81844714,
0.81844336, 0.81844286, 0.81844313, 0.8184418 , 0.81843787,
0.81843385, 0.81843244, 0.81843006, 0.81842643, 0.81842301,
0.81842012, 0.8184164 , 0.81841261, 0.81841113, 0.81841123,
0.81841167, 0.81840845, 0.81840571, 0.81840289, 0.81840006,
0.81839648, 0.81839794, 0.81840175, 0.81840656, 0.81840632,
0.8184053 , 0.8184015 , 0.81841222, 0.8184118 , 0.81841594,
0.81844041, 0.81845594, 0.81846008, 0.81847317, 0.81848535,
0.81848382, 0.81848619, 0.81849768, 0.81851117, 0.81852665,
0.81853106, 0.81853637, 0.81854015, 0.81854653, 0.81855083]),
'minmse': 107.03801518643529,
'maxmse': 1046.4212142210524,
'noise_level': 128,
'original_image': 'samp6'}}
Reproducing Berardino et al., 2017 (Eigendistortions)
Author: Lyndon Duong, Jan 2021
In this demo, we will be reproducing the eigendistortions first presented in Berardino et al., 2017. We'll be using a Front End model of the human visual system (called "On-Off" in the paper), as well as an early layer of VGG16. The Front End model is a simple convolutional neural network with a normalization nonlinearity, loosely based on biological retinal/geniculate circuitry.
Its signal-flow diagram shows an input being decomposed into two channels, each of which is luminance- and contrast-normalized and ends with a ReLU.
What do eigendistortions tell us?
Our perception is influenced by our internal representation (neural responses) of the external world. Eigendistortions are directions in image space, rank-ordered by how strongly perturbations along them change a model's responses: the first is the direction to which the model is most sensitive, and the last is the direction to which it is least sensitive. Plenoptic's Eigendistortion object provides an easy way to synthesize eigendistortions for any PyTorch model.
[1]:
from plenoptic.synthesize import Eigendistortion
from plenoptic.simulate.models import OnOff
# this notebook uses torchvision, which is an optional dependency.
# if this fails, install torchvision in your plenoptic environment
# and restart the notebook kernel.
try:
from torchvision.models import vgg16
except ModuleNotFoundError:
raise ModuleNotFoundError("optional dependency torchvision not found!"
" please install it in your plenoptic environment "
"and restart the notebook kernel")
import torch
from torch import nn
import plenoptic as po
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: ", device)
device: cuda
[2]:
max_iter_frontend = 2000
max_iter_vgg = 5000
Input preprocessing
Let’s load the parrot image used in the paper, display it, and cast it as a float32
tensor.
[3]:
image = po.data.parrot(as_gray=True)
zoom = 1
def crop(img):
"""Returns 2D numpy as image as 4D tensor Shape((b, c, h, w))"""
img_tensor = img.clone()
return img_tensor[...,:254,:254] # crop to same size
image_tensor = crop(image).to(device)
print("Torch image shape:", image_tensor.shape)
# reduce size of image if we're on CPU, otherwise this will take too long
if device.type == 'cpu':
image_tensor = image_tensor[...,100:164,100:164]
# want to zoom so this is displayed at same size
zoom = 256 / 64
po.imshow(image_tensor, zoom=zoom);
/mnt/home/wbroderick/plenoptic/plenoptic/tools/data.py:126: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525553989/work/torch/csrc/utils/tensor_new.cpp:230.)
images = torch.tensor(images, dtype=torch.float32)
Torch image shape: torch.Size([1, 1, 254, 254])

Since the Front-end OnOff model only has two channel outputs, we can easily visualize the feature maps. We’ll apply a circular mask to this model’s inputs to avoid edge artifacts in the synthesis method.
[4]:
mdl_f = OnOff(kernel_size=(31, 31), pretrained=True, apply_mask=True)
po.tools.remove_grad(mdl_f)
mdl_f = mdl_f.to(device)
response_f = mdl_f(image_tensor)
po.imshow(response_f, title=['on channel response', 'off channel response'], zoom=zoom);
/mnt/home/wbroderick/plenoptic/plenoptic/simulate/models/frontend.py:388: UserWarning: pretrained is True but cache_filt is False. Set cache_filt to True for efficiency unless you are fine-tuning.
warn("pretrained is True but cache_filt is False. Set cache_filt to "
/mnt/home/wbroderick/miniconda3/envs/plenoptic/lib/python3.7/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525553989/work/aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]

Synthesizing eigendistortions
Front-end model: eigendistortion synthesis
Now that we have our Front End model set up, we can synthesize eigendistortions! This is done easily just by calling .synthesize()
after instantiating the Eigendistortion
object. We’ll synthesize the top and bottom k
, representing the most- and least-noticeable eigendistortions for this model.
The paper synthesizes the top and bottom k=1
eigendistortions, but we’ll set k>1
so the algorithm converges/stabilizes faster. We highly recommend running the following block on a GPU; otherwise, we suggest cropping the image to a smaller size.
[5]:
# synthesize the top and bottom k distortions
eigendist_f = Eigendistortion(image=image_tensor, model=mdl_f)
eigendist_f.synthesize(k=3, method='power', max_iter=max_iter_frontend)
Initializing Eigendistortion -- Input dim: 64516 | Output dim: 129032
/mnt/home/wbroderick/plenoptic/plenoptic/tools/validate.py:179: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
"model is in training mode, you probably want to call eval()"
Top k=3 eigendists computed | Tolerance 1.00E-07 reached.
Bottom k=3 eigendists computed | Tolerance 1.00E-07 reached.
Front-end model: eigendistortion display
Once synthesized, we can plot the distortion on the image using Eigendistortion
’s built-in display method. Feel free to adjust the constants alpha_max
and alpha_min
that scale the amount of each distortion on the image.
[6]:
po.imshow(eigendist_f.eigendistortions[[0,-1]].mean(1, keepdim=True), vrange='auto1',
title=["most-noticeable distortion", "least-noticeable"], zoom=zoom)
alpha_max, alpha_min = 3., 4.
f_max = po.synth.eigendistortion.display_eigendistortion(eigendist_f, eigenindex=0, alpha=alpha_max,
title=f'img + {alpha_max} * max_dist', zoom=zoom)
f_min = po.synth.eigendistortion.display_eigendistortion(eigendist_f, eigenindex=-1, alpha=alpha_min,
title=f'img + {alpha_min} * min_dist', zoom=zoom)



VGG16: eigendistortion synthesis
Following the lead of Berardino et al. (2017), let's compare the Front End model's eigendistortions to those of an early layer of VGG16! VGG16 takes color images as input, so we'll need to repeat the grayscale parrot along the RGB color dimension.
[7]:
# Create a class that takes the nth layer output of a given model
class NthLayerVGG16(nn.Module):
"""Wrapper to get the response of an intermediate layer of VGG16"""
def __init__(self, layer: int = None, device=torch.device('cpu')):
"""
Parameters
----------
layer: int
Which model response layer to output
"""
super().__init__()
model = vgg16(pretrained=True, progress=True).to(device)
features = list(model.features)
self.features = nn.ModuleList(features).eval()
if layer is None:
layer = len(self.features) - 1  # index of the last layer in `features`
self.layer = layer
def forward(self, x):
for ii, mdl in enumerate(self.features):
x = mdl(x)
if ii == self.layer:
return x
VGG16 was trained on pre-processed ImageNet images with approximately zero mean and unit standard deviation, so we preprocess our parrot image the same way.
[8]:
# VGG16
def normalize(img_tensor):
"""standardize the image for vgg16"""
return (img_tensor-img_tensor.mean())/ img_tensor.std()
image_tensor = normalize(crop(image)).to(device)
# reduce size of image if we're on CPU, otherwise this will take too long
if device.type == 'cpu':
image_tensor = image_tensor[...,100:164,100:164]
# want to zoom so this is displayed at same size
zoom = 256 / 64
image_tensor3 = torch.cat([image_tensor]*3, dim=1).to(device)
# "layer 3" according to Berardino et al (2017)
mdl_v = NthLayerVGG16(layer=11, device=device)
po.tools.remove_grad(mdl_v)
eigendist_v = Eigendistortion(image=image_tensor3, model=mdl_v)
eigendist_v.synthesize(k=2, method='power', max_iter=max_iter_vgg)
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /mnt/home/wbroderick/.cache/torch/hub/checkpoints/vgg16-397923af.pth
Initializing Eigendistortion -- Input dim: 193548 | Output dim: 1016064
VGG16: eigendistortion display
We can now display the most- and least-noticeable eigendistortions as before, then compare their quality to those of the Front-end model.
Since the distortions here were synthesized using a pre-processed (normalized) image, we can pass a function to undo that preprocessing for display. Since the previous eigendistortions were grayscale, we'll just take the mean across RGB channels of the VGG16-synthesized eigendistortions and display them as grayscale too.
[9]:
po.imshow(eigendist_v.eigendistortions[[0,-1]].mean(1, keepdim=True), vrange='auto1',
title=["most-noticeable distortion", "least-noticeable"], zoom=zoom)
# create an image processing function to unnormalize the image and avg the channels to grayscale
unnormalize = lambda x: (x*image.std() + image.mean()).mean(1, keepdims=True)
alpha_max, alpha_min = 15., 100.
v_max = po.synth.eigendistortion.display_eigendistortion(eigendist_v, eigenindex=0, alpha=alpha_max,
process_image=unnormalize,
title=f'img + {alpha_max} * most_noticeable_dist',
zoom=zoom)
v_min = po.synth.eigendistortion.display_eigendistortion(eigendist_v, eigenindex=-1, alpha=alpha_min,
process_image=unnormalize,
title=f'img + {alpha_min} * least_noticeable_dist',
zoom=zoom)



Final thoughts
To rigorously test which of these models' representations is more human-like, we would have to conduct a perceptual experiment. For now, we'll just leave it to you to eyeball the distortions and decide which are more or less noticeable!
Synthesis object design
The following describes how synthesis objects are structured. This is probably
most useful if you are creating a new synthesis method that you would like to
include in or be compliant with plenoptic
, rather than using existing ones.
The synthesis methods included in plenoptic
generate one or more novel
images based on the output of a model. These images can be used to better
understand the model or as stimuli for an experiment comparing the model against
another system. Beyond this rather vague description, however, there is a good
deal of variability. We use inheritance in order to try and keep synthesis
methods as similar as possible, to facilitate user interaction with them (and
testing), but we want to avoid forcing too much similarity.
In the following description:
- must connotes a requirement; any synthesis object not meeting this property will not be merged and is not considered “plenoptic-compliant”.
- should connotes a suggestion; a compelling reason is required if the property is not met.
- may connotes an option; these properties may make things easier (for developers or users), but are completely optional.
All synthesis methods
To that end, all synthesis methods must inherit the
plenoptic.synthesize.synthesis.Synthesis
class. This requires the synthesis method
to have a synthesize()
method, and provides helper functions for save()
,
load()
, and to()
, which must be used when implementing them.
Furthermore:
- the initialization method (__init__()) must accept any images as its first input(s). If only a single image is accepted, it must be named image. If more than one, their names must be of the form image_X, replacing X with a more descriptive string. These must all have type torch.Tensor and they must be validated with plenoptic.tools.validate.validate_input(). This should be stored in an attribute with the same name as the argument.
- the initialization method’s next argument(s) must be any models or metrics that the synthesis will be based on. Similarly, if a single model / metric is accepted, it must be named model / metric. If more than one, their names should be of the form X_model / X_metric, replacing X with a more descriptive string. These must be validated with plenoptic.tools.validate.validate_model() / plenoptic.tools.validate.validate_metric(). This should be stored in an attribute with the same name as the argument.
- any other arguments to the initialization method may follow.
- the object must be able to work on GPU and CPU. Users must be able to use the GPU either by initializing the synthesis object with tensors or models already on the GPU or by calling .to(). The easiest way to do this is to use torch.rand_like() and analogous methods, and to explicitly call .to() on any other newly-created tensors.
- ideally, the same holds for different float and complex data types (e.g., support both torch.float32 and torch.float64), though this is not a strict requirement if there’s a good reason.
- if synthesize() operates in an iterative fashion, it must accept a max_iter: int argument to specify how long to run synthesis for and a stop_criterion: float argument to allow for early termination if some convergence is reached. What exactly is being checked for convergence (e.g., change in loss, change in pixel values) may vary, but it must be clarified in the docstring. A stop_iters_to_check: int argument may also be included, which specifies how many iterations ago to check. If it is not included, the number of iterations must be clarified in the docstring.
- additionally, if synthesis is iterative, tqdm.auto.tqdm must be used as a progress bar, initialized with pbar = tqdm(range(max_iter)), which should present information using pbar.set_postfix() (such as the loss or whatever else is checked for convergence, as discussed above). A standalone sketch of this loop structure appears after this list.
- synthesize() must not return anything. The outputs of synthesis must be stored as attributes of the object. The number of large attributes should be minimized in order to reduce overall size in memory.
- the synthesis output must be stored as an attribute with the same name as the class (e.g., Metamer.metamer).
- any attribute or method that the user does not need should be hidden (i.e., start with _).
- consider using the @property decorator to make important attributes write-only or to differentiate between the public and private views. For example, the optimized attribute of the plenoptic.synthesize.geodesic.Geodesic class is named _geodesic, but the geodesic attribute returns this tensor concatenated with two (unchanging) endpoints, as this is what the user will most often want to interact with.
The above are the only requirements that all synthesis methods must meet.
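To make the iterative-synthesis conventions above concrete (max_iter, stop_criterion, stop_iters_to_check, and the tqdm progress bar), here is a minimal, standalone sketch of the expected loop structure. It is not a real plenoptic synthesis class, and the "loss" is just a stand-in value.
from tqdm.auto import tqdm

def toy_iterative_synthesis(max_iter: int = 100, stop_criterion: float = 1e-4,
                            stop_iters_to_check: int = 5):
    """Illustrate the loop structure only; the 'loss' here is a stand-in."""
    losses = []
    pbar = tqdm(range(max_iter))
    for i in pbar:
        loss = 1.0 / (i + 1)  # stand-in for an objective_function() value
        losses.append(loss)
        pbar.set_postfix(loss=f"{loss:.2e}")
        # consider synthesis converged if the loss has barely changed recently
        if i >= stop_iters_to_check and abs(losses[-stop_iters_to_check] - loss) < stop_criterion:
            break
    return losses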
Helper / display functions
It may also be useful to include some functions for investigating the status or
output(s) of synthesis. As a general rule, if a function will be called during
synthesis (e.g., to compute a loss value), it should be a method of the object.
If it is only called afterwards (e.g., to display the synthesis outputs in a
useful way), it should be included as a function in the same file (see
plenoptic.synthesize.metamer.display_metamer()
for an example).
Functions that show images or videos should be called display_X
, whereas
those that show numbers as a scatter plot, line plot, etc. should be called
plot_X
. These must be axes-level matplotlib functions: they must accept
an axis as an optional argument named ax
, which will contain the plot. If no
ax
is supplied, matplotlib.pyplot.gca()
must be used to create / grab
the axis. If a multi-axis figure is called for (e.g., to display the synthesis
output and plot the loss), a function named plot_synthesis_status()
should
be created. This must have an optional fig
argument, creating a figure if
none is supplied. See plenoptic.synthesize.metamer.plot_synthesis_status()
for an example. If possible, this plot should be able to be animated to show
progress over time; see plenoptic.synthesize.metamer.animate() for an example.
See our Display and animate functions notebook for description and examples of the included plotting and display code.
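As a minimal sketch of the axes-level convention described above (the function name and data are hypothetical, not part of plenoptic):
import matplotlib.pyplot as plt

def plot_loss(losses, ax=None):
    """Plot loss over iterations, following the ax-argument convention."""
    if ax is None:
        ax = plt.gca()  # grab the current axis if none was supplied
    ax.semilogy(losses)
    ax.set(xlabel='iteration', ylabel='loss')
    return ax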
Optimized synthesis
Many synthesis methods will use an optimizer to generate their outputs. If the
method makes use of a torch.optim.Optimizer
object, it must inherit
plenoptic.synthesize.synthesis.OptimizedSynthesis
class (this is a
subclass of plenoptic.synthesize.synthesis.Synthesis, so all of the above
still applies).
Currently, the following are required (if not all of these are applicable to new
methods, we may modify OptimizedSynthesis
):
- the points about iterative synthesis described above all hold: synthesize() must accept max_iter and stop_criterion, may accept stop_iters_to_check, and must use tqdm.auto.tqdm.
- the object must have an objective_function() method, which returns a measure of “how bad” the current synthesis output is; optimization minimizes this value (see the sketch after this list).
- the object must have a _check_convergence() method, which is used (along with stop_criterion and, optionally, stop_iters_to_check) to determine if synthesis has converged.
- the object must have an _initialize() method, which initializes the synthesis output (e.g., with an appropriately-shaped sample of noise) and is called during the object’s initialization.
- the initialization method may accept some argument to affect this initialization, which should be named initial_X (replacing X as appropriate). For example, this could be another image to use for initialization (initial_image) or some property of the noise used to generate an initial image (initial_noise).
- the initialization method must accept range_penalty_lambda: float and allowed_range: Tuple[float, float] arguments, which should be used with plenoptic.tools.optim.penalize_range() to constrain the range of the synthesis output.
- the synthesize() method must accept an optional optimizer: torch.optim.Optimizer argument, which defaults to None. OptimizedSynthesis._initialize_optimizer() is a helper function that should be called to set this up: it creates a default optimizer if the user does not specify one and double-checks that the optimizer parameter is the correct object if the user did.
- during synthesis, the object should update the _losses, _gradient_norm, and _pixel_change_norm attributes on each iteration.
- the object may have a _closure() method, which performs the gradient calculation. This (when passed to optimizer.step() during the synthesis loop in synthesize()) enables optimization algorithms that perform several evaluations of the gradient before taking a step (e.g., second-order methods). See OptimizedSynthesis._closure() for the simplest version of this.
- the synthesize() method should accept a store_progress argument, which optionally stores additional information over iterations, such as the synthesis output-in-progress. OptimizedSynthesis has a setter method for this attribute, which will ensure it is set correctly. This argument can be an integer (in which case, the attributes are updated every store_progress iterations), True (same behavior as 1), or False (no updating of attributes). This should probably be done in a method named _store().
- the synthesize() method should be callable multiple times with the same object, in which case progress is resumed. On all subsequent calls, optimizer must be None (this is checked by OptimizedSynthesis._initialize_optimizer()) and store_progress, stop_criterion, and stop_iters_to_check must have the same values.
How to order methods
Python doesn’t care how you order any of the methods or properties of a class, but doing so in a consistent manner will make reading the code easier, so try to follow these guidelines:
- The caller should (almost always) be above the callee, and related concepts should be close together.
- __init__() should be first, followed by any methods called within it. This will probably include _initialize(), for those classes that have it.
- After all those initialization-related methods, synthesize() should come next. Again, this should be followed by most of the methods called within it, ordered roughly by importance. Thus, the first methods should probably be objective_function() and _optimizer_step(), followed by _check_convergence(). What shouldn’t be included in this section are helper methods that aren’t scientifically interesting (e.g., _initialize_optimizer(), _store()).
- Next, any other content-related methods, such as helper methods that perform useful computations that are not called by __init__() or synthesize() (e.g., plenoptic.synthesize.geodesic.Geodesic.calculate_jerkiness()).
- Next, the helper functions we ignored from earlier, such as _initialize_optimizer() and _store().
- Next, save(), load(), to().
- Finally, all the properties.
Tips and Tricks
Why does synthesis take so long?
Synthesis can take a while to run, especially if you are trying to synthesize a large image or using a complicated model. The following might help:
Reducing the amount of time your model’s forward pass takes is the easiest way to reduce the overall duration of synthesis, as the forward pass is called many, many times over synthesis. Try using python’s built-in profiling tools to check which part of your model’s forward pass is taking the longest, and try to make those parts more efficient; Jupyter also has nice profiling tools. For example, if you have for loops in your code, try to replace them with matrix operations and einsum.
If you have access to a GPU, use it! If your inputs are on the GPU before initializing the synthesis methods, the synthesis methods will also make use of the GPU. You can also move plenoptic’s synthesis methods and models over to the GPU after initialization using the .to() method (see the sketch below).
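For example, here is a minimal sketch of running metamer synthesis on the GPU, assuming a CUDA device is available; the model choice, keyword arguments, and iteration count are just for illustration.
import torch
import plenoptic as po

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
img = po.data.einstein().to(device)
model = po.simul.OnOff(kernel_size=(31, 31), pretrained=True, cache_filt=True)
po.tools.remove_grad(model)
model = model.to(device).eval()
# image and model are already on `device`, so synthesis runs there too
met = po.synth.Metamer(img, model)
met.synthesize(max_iter=50)
# alternatively, an existing synthesis object can be moved with met.to(device)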
Optimization is hard
You should double-check whether synthesis has successfully completed before interpreting the outputs or using them in any experiments. This is not necessary for eigendistortions (see its notebook for more details on why), but is necessary for all the iterative optimization methods.
For metamers, this means double-checking that the difference between the model representation of the metamer and that of the target image is small enough (see the sketch after this list). If your model's representation is multi-scale, trying coarse-to-fine optimization may help (see notebook for details).
For MAD competition, this means double-checking that the reference metric is constant and that the optimized metric has converged at a lower or higher value (depending on the value of synthesis_target); use plenoptic.synthesize.mad_competition.plot_synthesis_status() to visualize these values. You will likely need to spend time trying out different values for the metric_tradeoff_lambda argument set during initialization to achieve this.
For geodesics, check that your geodesic's path energy is small enough and that the deviation from a straight line in representational space is minimal (use plenoptic.synthesize.geodesic.plot_deviation_from_line()).
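For the metamer case, a simple post-hoc check might look like the following sketch; the names model, met, and img are placeholders for whatever you used during synthesis, and what counts as "small enough" depends on your application.
import torch

def relative_representation_error(model, metamer_img, target_img):
    """Relative distance between the model representations of metamer and target."""
    err = torch.norm(model(metamer_img) - model(target_img), p=2)
    return err / torch.norm(model(target_img), p=2)

# e.g., relative_representation_error(model, met.metamer, img) should be small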
For all of the above, if synthesis has not found a good solution, you may need
to run synthesis longer, use a learning-rate scheduler, change the learning
rate, or try different optimizers. Each method’s objective_function
method
captures the value that we are trying to minimize, but may contain other values
(such as the penalty on allowed range values).
Additionally, it may be helpful to visualize the progression of synthesis, using
each synthesis method’s animate
or plot_synthesis_status
helper
functions (e.g., plenoptic.synthesize.metamer.plot_synthesis_status()
).
Tweaking the model
You can also improve your chances of finding a good synthesis by tweaking the model. For example, the default loss function used for metamer synthesis is mean-squared error, which implicitly weights all aspects of the model's representation equally. Thus, if there are portions of the representation whose magnitudes are significantly smaller than the others, they might not be matched at the same rate as the others. You can address this using coarse-to-fine synthesis or by picking a more suitable loss function, but it's generally a good idea for all of a model's representation to have roughly the same magnitude. You can do this in a principled or empirical manner:
Principled: compose your representation of statistics that you know lie within the same range. For example, use correlations instead of covariances (see the Portilla-Simoncelli model, and in particular how plenoptic’s implementation differs from matlab for an example of this).
Empirical: measure your model’s representation on a dataset of relevant natural images and then use this output to z-score your model’s representation on each pass (see [Ziemba2021] for an example; this is what the Van Hateren database is used for).
In the middle: normalize statistics based on their value in the original image (note: not the image the model is taking as input! this will likely make optimization very difficult).
If you are computing a multi-channel representation, you may have a similar problem where one channel is larger or smaller than the others. Here, tweaking the loss function might be more useful: using something like logsumexp (the log of the sum of exponentials, a smooth approximation of the maximum function) to combine across channels, after using something like the L2-norm to compute the loss within each channel, might help.
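A rough sketch of such a loss function is below; it assumes the model representation has shape (batch, channel, ...), and the function name is hypothetical. It could then likely be passed as the loss_function argument at initialization, as l2_norm was earlier in this documentation.
import torch

def channelwise_logsumexp_loss(synth_rep, target_rep):
    """L2-norm within each channel, combined across channels with logsumexp."""
    diff = synth_rep - target_rep                               # (batch, channel, ...)
    per_channel = diff.flatten(start_dim=2).norm(p=2, dim=-1)   # (batch, channel)
    # smooth approximation of the max across channels, averaged over the batch
    return torch.logsumexp(per_channel, dim=-1).mean()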
None of the existing synthesis methods meet my needs
plenoptic
provides four synthesis methods, but you may find you wish to do
something slightly outside the capabilities of the existing methods. There are
generally two ways to do this: by tweaking your model or by extending one of the
methods.
- See the Portilla-Simoncelli texture model notebook for examples of how to get different metamer results by tweaking your model or extending the Metamer class.
- The coarse-to-fine optimization, discussed in the metamer notebook, is an example of changing optimization by extending the Metamer class.
- The Synthesis extensions notebook contains a discussion focused on this as well.
If you extend a method successfully or would like help making it work, please let us know by posting a discussion!
Reproducibility
plenoptic
includes several results reproduced from the literature and aims to
facilitate reproducible research. However, we are limited by our dependencies
and PyTorch, in particular, comes with the caveat that “Completely
reproducible results are not guaranteed across PyTorch releases, individual
commits, or different platforms. Furthermore, results may not be reproducible
between CPU and GPU executions, even when using identical seeds” (quote from the
v1.12 documentation).
This means that you should note the plenoptic version and the pytorch version your synthesis used in order to guarantee reproducibility (some versions of pytorch will give consistent results with each other, but this is not guaranteed and is hard to predict). We do not believe reproducibility depends on the python version or any other packages. In general, the CPU and GPU will always give different results.
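For example, a minimal way to record these versions alongside your synthesis outputs:
from importlib import metadata
import torch

# query the installed package versions at synthesis time
print(f"plenoptic {metadata.version('plenoptic')}, pytorch {torch.__version__}")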
We reproduce several results from the literature and validate these as part of our tests. We are therefore aware of the following changes that broke reproducibility:
PyTorch 1.8 and 1.9 give the same results, but 1.10 results in changes, probably due to the difference in how the sub-gradients for torch.min and torch.max are computed (see this PR).
PyTorch 1.12 breaks reproducibility with 1.10 and 1.11; it is unclear why (see this issue).
plenoptic
plenoptic package
Subpackages
plenoptic.data package
Submodules
plenoptic.data.data_utils module
- plenoptic.data.data_utils.get(*item_names, as_gray=None)[source]
Load an image based on the item name from the package’s data resources.
- Parameters:
item_names (str) – The names of the items to load, without specifying the file extension.
as_gray (Optional[bool]) – Whether to load in the image(s) as grayscale or not. If None, will make best guess based on file extension.
- Return type:
The loaded image object. The exact return type depends on the load_images function implementation.
Notes
This function first retrieves the full filename using get_filename and then loads the image using load_images from the tools.data module. It supports loading images as grayscale if they have a .pgm extension.
- plenoptic.data.data_utils.get_path(item_name)[source]
Retrieve the filename that matches the given item name with any extension.
- Parameters:
item_name (str) – The name of the item to find the file for, without specifying the file extension.
- Return type:
Traversable
- Returns:
The filename matching the item_name with its extension.
- Raises:
AssertionError – If no files or more than one file match the item_name.
Notes
This function uses glob to search for files in the current directory matching the item_name. It is assumed that there is only one file matching the name regardless of its extension.
plenoptic.data.fetch module
Fetch data using pooch.
This is inspired by scipy’s datasets module.
- plenoptic.data.fetch.fetch_data(dataset_name)[source]
Download data, using pooch. These are largely used for testing.
To view list of downloadable files, look at DOWNLOADABLE_FILES.
This checks whether the data already exists and is unchanged and downloads again, if necessary. If dataset_name ends in .tar.gz, this also decompresses and extracts the archive, returning the Path to the resulting directory. Else, it just returns the Path to the downloaded file.
- Return type:
Path
Find directory shared by all paths.
- Return type:
Path
Module contents
- plenoptic.data.fetch_data(dataset_name)[source]
Download data, using pooch. These are largely used for testing.
To view list of downloadable files, look at DOWNLOADABLE_FILES.
This checks whether the data already exists and is unchanged and downloads again, if necessary. If dataset_name ends in .tar.gz, this also decompresses and extracts the archive, returning the Path to the resulting directory. Else, it just returns the Path to the downloaded file.
- Return type:
Path
plenoptic.metric package
Submodules
plenoptic.metric.classes module
- class plenoptic.metric.classes.NLP[source]
Bases:
Module
simple class for implementing normalized laplacian pyramid
This class just calls
plenoptic.metric.normalized_laplacian_pyramid
on the image and returns a 3d tensor with the flattened activations.NOTE: synthesis using this class will not be the exact same as synthesis using the
plenoptic.metric.nlpd
function (by default), because the synthesis methods use torch.norm(x - y, p=2) as the distance metric between representations, whereas nlpd uses the root mean square of the distance (i.e., torch.sqrt(torch.mean((x - y) ** 2))).
Methods
- add_module(name, module): Add a child module to the current module.
- apply(fn): Apply fn recursively to every submodule (as returned by .children()) as well as self.
- bfloat16(): Casts all floating point parameters and buffers to bfloat16 datatype.
- buffers([recurse]): Return an iterator over module buffers.
- children(): Return an iterator over immediate children modules.
- compile(*args, **kwargs): Compile this Module's forward using torch.compile().
- cpu(): Move all model parameters and buffers to the CPU.
- cuda([device]): Move all model parameters and buffers to the GPU.
- double(): Casts all floating point parameters and buffers to double datatype.
- eval(): Set the module in evaluation mode.
- extra_repr(): Set the extra representation of the module.
- float(): Casts all floating point parameters and buffers to float datatype.
- forward(image): returns flattened NLP activations
- get_buffer(target): Return the buffer given by target if it exists, otherwise throw an error.
- get_extra_state(): Return any extra state to include in the module's state_dict.
- get_parameter(target): Return the parameter given by target if it exists, otherwise throw an error.
- get_submodule(target): Return the submodule given by target if it exists, otherwise throw an error.
- half(): Casts all floating point parameters and buffers to half datatype.
- ipu([device]): Move all model parameters and buffers to the IPU.
- load_state_dict(state_dict[, strict, assign]): Copy parameters and buffers from state_dict into this module and its descendants.
- modules(): Return an iterator over all modules in the network.
- named_buffers([prefix, recurse, ...]): Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- named_children(): Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- named_modules([memo, prefix, remove_duplicate]): Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- named_parameters([prefix, recurse, ...]): Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- parameters([recurse]): Return an iterator over module parameters.
- register_backward_hook(hook): Register a backward hook on the module.
- register_buffer(name, tensor[, persistent]): Add a buffer to the module.
- register_forward_hook(hook, *[, prepend, ...]): Register a forward hook on the module.
- register_forward_pre_hook(hook, *[, ...]): Register a forward pre-hook on the module.
- register_full_backward_hook(hook[, prepend]): Register a backward hook on the module.
- register_full_backward_pre_hook(hook[, prepend]): Register a backward pre-hook on the module.
- register_load_state_dict_post_hook(hook): Register a post hook to be run after module's load_state_dict is called.
- register_module(name, module): Alias for add_module().
- register_parameter(name, param): Add a parameter to the module.
- register_state_dict_pre_hook(hook): Register a pre-hook for the load_state_dict() method.
- requires_grad_([requires_grad]): Change if autograd should record operations on parameters in this module.
- set_extra_state(state): Set extra state contained in the loaded state_dict.
- share_memory(): See torch.Tensor.share_memory_().
- state_dict(*args[, destination, prefix, ...]): Return a dictionary containing references to the whole state of the module.
- to(*args, **kwargs): Move and/or cast the parameters and buffers.
- to_empty(*, device[, recurse]): Move the parameters and buffers to the specified device without copying storage.
- train([mode]): Set the module in training mode.
- type(dst_type): Casts all parameters and buffers to dst_type.
- xpu([device]): Move all model parameters and buffers to the XPU.
- zero_grad([set_to_none]): Reset gradients of all model parameters.
- __call__
- forward(image)[source]
returns flattened NLP activations
WARNING: For now this only supports images with batch and channel size 1
- Parameters:
image (torch.Tensor) – image to pass to normalized_laplacian_pyramid
- Returns:
representation – 3d tensor with flattened NLP activations
- Return type:
torch.Tensor
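A minimal usage sketch (assuming the class can be constructed without arguments, as the signature above suggests):
import plenoptic as po
from plenoptic.metric.classes import NLP

model = NLP()
img = po.data.einstein()   # batch and channel size 1, as the warning above requires
print(model(img).shape)    # 3d: (batch, channel, activations)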
plenoptic.metric.model_metric module
plenoptic.metric.naive module
- plenoptic.metric.naive.mse(img1, img2)[source]
return the MSE between img1 and img2
Our baseline metric to compare two images is often mean-squared error, MSE. This is not a good approximation of the human visual system, but is handy to compare against.
For two images, \(x\) and \(y\), with \(n\) pixels each:
\[MSE &= \frac{1}{n}\sum_i=1^n (x_i - y_i)^2\]The two images must have a float dtype
- Parameters:
img1 (torch.Tensor) – The first image to compare
img2 (torch.Tensor) – The second image to compare, must be same size as
img1
- Returns:
mse – the mean-squared error between
img1
andimg2
- Return type:
torch.float
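A minimal usage sketch with random images (any two same-sized float tensors work):
import torch
import plenoptic as po

img1 = torch.rand(1, 1, 64, 64)
img2 = torch.rand(1, 1, 64, 64)
print(po.metric.mse(img1, img2))   # one value per (batch, channel)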
plenoptic.metric.perceptual_distance module
- plenoptic.metric.perceptual_distance.ms_ssim(img1, img2, power_factors=None)[source]
Multiscale structural similarity index (MS-SSIM)
As described in [1], multiscale structural similarity index (MS-SSIM) is an improvement upon structural similarity index (SSIM) that takes into account the perceptual distance between two images on different scales.
SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images, producing three maps instead of scalars. The SSIM map is the elementwise product of these three maps. See metric.ssim and metric.ssim_map for a full description of SSIM.
To get images of different scales, average pooling operations with kernel size 2 are performed recursively on the input images. The product of contrast map and structure map (the “contrast-structure map”) is computed for all but the coarsest scales, and the overall SSIM map is only computed for the coarsest scale. Their mean values are raised to exponents and multiplied to produce MS-SSIM:
\[MSSSIM = {SSIM}_M^{a_M} \prod_{i=1}^{M-1} ({CS}_i)^{a_i}\]
Here \(M\) is the number of scales, \({CS}_i\) is the mean value of the contrast-structure map for the i'th finest scale, and \({SSIM}_M\) is the mean value of the SSIM map for the coarsest scale. If at least one of these terms is negative, the value of MS-SSIM is zero. The values of \(a_i, i=1,\dots,M\) are taken from the argument power_factors.
- Parameters:
img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.
img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).
power_factors (1D array, optional.) – power exponents for the mean values of maps, for different scales (from fine to coarse). The length of this array determines the number of scales. By default, this is set to [0.0448, 0.2856, 0.3001, 0.2363, 0.1333], which is what psychophysical experiments in [1] found.
- Returns:
msssim – 2d tensor of shape (batch, channel) containing the MS-SSIM for each image
- Return type:
torch.Tensor
References
- plenoptic.metric.perceptual_distance.nlpd(img1, img2)[source]
Normalized Laplacian Pyramid Distance
As described in [1], this is an image quality metric based on the transformations associated with the early visual system: local luminance subtraction and local contrast gain control.
A Laplacian pyramid subtracts a local estimate of the mean luminance at six scales. Then, a local gain control divides these centered coefficients by a weighted sum of absolute values in a spatial neighborhood.
These weight parameters were optimized for redundancy reduction over a training database of (undistorted) natural images.
Note that we compute root mean squared error for each scale, and then average over these, effectively giving larger weight to the lower frequency coefficients (which are fewer in number, due to subsampling).
- Parameters:
img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.
img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).
- Returns:
distance – The normalized Laplacian Pyramid distance.
- Return type:
torch.Tensor of shape (batch, channel)
References
[1] Laparra, V., Ballé, J., Berardino, A. and Simoncelli, E.P., 2016. Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16), pp.1-6.
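A minimal usage sketch, with a random tensor standing in for a real grayscale image in [0, 1]:
import torch
from plenoptic.metric.perceptual_distance import nlpd

img1 = torch.rand(1, 1, 256, 256)                          # hypothetical reference image
img2 = (img1 + 0.05 * torch.randn_like(img1)).clamp(0, 1)  # noisy version of it
distance = nlpd(img1, img2)
print(distance.shape)  # (batch, channel), here torch.Size([1, 1])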
- plenoptic.metric.perceptual_distance.normalized_laplacian_pyramid(img)[source]
Compute the normalized Laplacian Pyramid using pre-optimized parameters
- Parameters:
img (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. This representation is designed for grayscale images and will be computed separately for each channel (so channels are treated in the same way as batches).
- Returns:
normalized_laplacian_activations – The normalized Laplacian Pyramid with six scales
- Return type:
list of torch.Tensor
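A sketch of inspecting the six scales returned for a single (here random) grayscale image:
import torch
from plenoptic.metric.perceptual_distance import normalized_laplacian_pyramid

img = torch.rand(1, 1, 256, 256)
activations = normalized_laplacian_pyramid(img)
for i, act in enumerate(activations):
    print(i, act.shape)  # six tensors; coarser scales are smaller due to subsampling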
- plenoptic.metric.perceptual_distance.ssim(img1, img2, weighted=False, pad=False)[source]
Structural similarity index
As described in [1], the structural similarity index (SSIM) is a perceptual similarity metric, quantifying how similar two images appear. SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images. See the references for more information.
This implementation follows the original implementation, as found at [2], and also provides the option to use the weighted version from [4] (which was shown to consistently improve image quality prediction on the LIVE database).
Note that this is a similarity metric (not a distance), so 1 means the two images are identical and 0 means they are very different. When the two images are negatively correlated, SSIM can be negative. SSIM is bounded between -1 and 1.
This function returns the mean SSIM, a scalar-valued metric giving the average over the whole image. For the SSIM map (showing the computed value across the image), call ssim_map.
- Parameters:
img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.
img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).
weighted (bool, optional) – whether to use the original, unweighted SSIM version (False) as used in [1] or the weighted version (True) as used in [4]. See the Notes section for the weight.
pad ({False, 'constant', 'reflect', 'replicate', 'circular'}, optional) – If not False, how to pad the image for the convolutions computing the local average of each image. See torch.nn.functional.pad for how these work.
- Returns:
mssim – 2d tensor of shape (batch, channel) containing the mean SSIM for each image, averaged over the whole image
- Return type:
torch.Tensor
Notes
The weight used when weighted=True is:
\[\log\left(\left(1+\frac{\sigma_1^2}{C_2}\right)\left(1+\frac{\sigma_2^2}{C_2}\right)\right)\]
where \(\sigma_1^2\) and \(\sigma_2^2\) are the variances of img1 and img2, respectively, and \(C_2\) is a constant. See [4] for more details.
References
[1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[2] MATLAB code: https://www.cns.nyu.edu/~lcv/ssim/ssim_index.m
[3] Project page: https://www.cns.nyu.edu/~lcv/ssim/
[4] (1,2,3)Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. http://dx.doi.org/10.1167/8.12.8
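A sketch comparing the unweighted and weighted variants, alongside ssim_map and the ms_ssim function documented above (random tensors stand in for real images in [0, 1]):
import torch
from plenoptic.metric.perceptual_distance import ms_ssim, ssim, ssim_map

img1 = torch.rand(1, 1, 256, 256)
img2 = (img1 + 0.1 * torch.randn_like(img1)).clamp(0, 1)
print(ssim(img1, img2))                 # mean SSIM, shape (batch, channel)
print(ssim(img1, img2, weighted=True))  # weighted version from [4]
print(ssim_map(img1, img2).shape)       # full 4d SSIM map
print(ms_ssim(img1, img2))              # multiscale SSIM, shape (batch, channel)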
- plenoptic.metric.perceptual_distance.ssim_map(img1, img2)[source]
Structural similarity index map
As described in [1], the structural similarity index (SSIM) is a perceptual similarity metric, quantifying how similar two images appear. SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images. See the references for more information.
This implementation follows the original implementation, as found at [2], and also provides the option to use the weighted version from [4] (which was shown to consistently improve image quality prediction on the LIVE database).
Note that this is a similarity metric (not a distance), so 1 means the two images are identical and 0 means they are very different. When the two images are negatively correlated, SSIM can be negative. SSIM is bounded between -1 and 1.
This function returns the SSIM map, showing the SSIM values across the image. For the mean SSIM (a single value metric), call ssim.
- Parameters:
img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.
img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).
weighted (bool, optional) – whether to use the original, unweighted SSIM version (False) as used in [1] or the weighted version (True) as used in [4]. See the Notes section for the weight.
- Returns:
ssim_map – 4d tensor containing the map of SSIM values.
- Return type:
torch.Tensor
References
[1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[2] MATLAB code: https://www.cns.nyu.edu/~lcv/ssim/ssim_index.m
[3] Project page: https://www.cns.nyu.edu/~lcv/ssim/
[4] (1,2)Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. http://dx.doi.org/10.1167/8.12.8
Module contents
plenoptic.simulate package
Subpackages
plenoptic.simulate.canonical_computations package
Submodules
plenoptic.simulate.canonical_computations.filters module
- plenoptic.simulate.canonical_computations.filters.circular_gaussian2d(kernel_size, std, out_channels=1)[source]
Creates normalized, centered circular 2D gaussian tensor with which to convolve.
- Parameters:
kernel_size (Union[int, Tuple[int, int]]) – Filter kernel size. Recommended to be odd so that kernel is properly centered.
std (Union[float, Tensor]) – Standard deviation of 2D circular Gaussian.
out_channels (int) – Number of channels with same kernel repeated along channel dim.
- Returns:
Circular gaussian kernel, normalized by total pixel-sum (not by 2*pi*std). filt has Size([out_channels=n_channels, in_channels=1, height, width]).
- Return type:
filt
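A sketch of building a kernel and using it as a blurring filter with torch.nn.functional.conv2d (the kernel size and std below are arbitrary illustrative choices):
import torch
import torch.nn.functional as F
from plenoptic.simulate.canonical_computations.filters import circular_gaussian2d

filt = circular_gaussian2d(kernel_size=(7, 7), std=2.0)  # shape (1, 1, 7, 7)
img = torch.rand(1, 1, 64, 64)
blurred = F.conv2d(img, filt, padding=3)  # padding=3 preserves spatial size for a 7x7 kernel
print(blurred.shape)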
- plenoptic.simulate.canonical_computations.filters.gaussian1d(kernel_size=11, std=1.5)[source]
Normalized 1D Gaussian.
1d Gaussian of size kernel_size, centered half-way, with variable standard deviation, and a sum of 1.
With default values, this is the 1d Gaussian used to generate the windows for SSIM.
- Parameters:
kernel_size (int) – Size of Gaussian. Recommended to be odd so that kernel is properly centered.
std (Union[float, Tensor]) – Standard deviation of Gaussian.
- Returns:
1d Gaussian with Size([kernel_size]).
- Return type:
filt
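A sketch showing the default filter (the SSIM window) and its normalization:
from plenoptic.simulate.canonical_computations.filters import gaussian1d

window = gaussian1d()  # defaults: kernel_size=11, std=1.5, as used for SSIM
print(window.shape)    # torch.Size([11])
print(window.sum())    # ~1.0, since the filter sums to 1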
plenoptic.simulate.canonical_computations.laplacian_pyramid module
- class plenoptic.simulate.canonical_computations.laplacian_pyramid.LaplacianPyramid(n_scales=5, scale_filter=False)[source]
Bases:
Module
Laplacian Pyramid in Torch.
The Laplacian pyramid [1] is a multiscale image representation. It decomposes the image by computing the local mean using Gaussian blurring filters, subtracting it from the image, and repeating this operation on the local mean itself after downsampling. This representation is overcomplete and invertible.
- Parameters:
n_scales (int) – number of scales to compute
scale_filter (bool, optional) – If true, the norm of the downsampling/upsampling filter is 1. If false (default), it is 2. If the norm is 1, the image is multiplied by 4 during the upsampling operation; the net effect is that the n-th scale of the pyramid is divided by 2^n.
References
[1] Burt, P. and Adelson, E., 1983. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31(4), pp.532-540.
Methods
add_module(name, module) – Add a child module to the current module.
apply(fn) – Apply fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16() – Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]) – Return an iterator over module buffers.
children() – Return an iterator over immediate children modules.
compile(*args, **kwargs) – Compile this Module's forward using torch.compile().
cpu() – Move all model parameters and buffers to the CPU.
cuda([device]) – Move all model parameters and buffers to the GPU.
double() – Casts all floating point parameters and buffers to double datatype.
eval() – Set the module in evaluation mode.
extra_repr() – Set the extra representation of the module.
float() – Casts all floating point parameters and buffers to float datatype.
forward(x) – Build the Laplacian pyramid of an image.
get_buffer(target) – Return the buffer given by target if it exists, otherwise throw an error.
get_extra_state() – Return any extra state to include in the module's state_dict.
get_parameter(target) – Return the parameter given by target if it exists, otherwise throw an error.
get_submodule(target) – Return the submodule given by target if it exists, otherwise throw an error.
half() – Casts all floating point parameters and buffers to half datatype.
ipu([device]) – Move all model parameters and buffers to the IPU.
load_state_dict(state_dict[, strict, assign]) – Copy parameters and buffers from state_dict into this module and its descendants.
modules() – Return an iterator over all modules in the network.
named_buffers([prefix, recurse, ...]) – Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children() – Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]) – Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse, ...]) – Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]) – Return an iterator over module parameters.
recon_pyr(y) – Reconstruct the image from its Laplacian pyramid.
register_backward_hook(hook) – Register a backward hook on the module.
register_buffer(name, tensor[, persistent]) – Add a buffer to the module.
register_forward_hook(hook, *[, prepend, ...]) – Register a forward hook on the module.
register_forward_pre_hook(hook, *[, ...]) – Register a forward pre-hook on the module.
register_full_backward_hook(hook[, prepend]) – Register a backward hook on the module.
register_full_backward_pre_hook(hook[, prepend]) – Register a backward pre-hook on the module.
register_load_state_dict_post_hook(hook) – Register a post hook to be run after module's load_state_dict is called.
register_module(name, module) – Alias for add_module().
register_parameter(name, param) – Add a parameter to the module.
register_state_dict_pre_hook(hook) – Register a pre-hook for the load_state_dict() method.
requires_grad_([requires_grad]) – Change if autograd should record operations on parameters in this module.
set_extra_state(state) – Set extra state contained in the loaded state_dict.
share_memory() – See torch.Tensor.share_memory_().
state_dict(*args[, destination, prefix, ...]) – Return a dictionary containing references to the whole state of the module.
to(*args, **kwargs) – Move and/or cast the parameters and buffers.
to_empty(*, device[, recurse]) – Move the parameters and buffers to the specified device without copying storage.
train([mode]) – Set the module in training mode.
type(dst_type) – Casts all parameters and buffers to dst_type.
xpu([device]) – Move all model parameters and buffers to the XPU.
zero_grad([set_to_none]) – Reset gradients of all model parameters.
__call__
- forward(x)[source]
Build the Laplacian pyramid of an image.
- Parameters:
x (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. If there are multiple channels, the Laplacian is computed separately for each of them
- Returns:
y – Laplacian pyramid representation, each element of the list corresponds to a scale, from fine to coarse
- Return type:
list of torch.Tensor
- recon_pyr(y)[source]
Reconstruct the image from its Laplacian pyramid.
- Parameters:
y (list of torch.Tensor) – Laplacian pyramid representation, each element of the list corresponds to a scale, from fine to coarse
- Returns:
x – Image, or batch of images
- Return type:
torch.Tensor of shape (batch, channel, height, width)
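A round-trip sketch: analyze a (random stand-in) image and reconstruct it; because the representation is invertible, the error should be at floating-point precision:
import torch
from plenoptic.simulate.canonical_computations.laplacian_pyramid import LaplacianPyramid

lpyr = LaplacianPyramid(n_scales=5)
img = torch.rand(1, 1, 256, 256)
coeffs = lpyr(img)                # list of 5 tensors, fine to coarse
recon = lpyr.recon_pyr(coeffs)
print((img - recon).abs().max())  # near zero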
plenoptic.simulate.canonical_computations.non_linearities module
- plenoptic.simulate.canonical_computations.non_linearities.local_gain_control(x, epsilon=1e-08)[source]
Spatially local gain control.
- Parameters:
x (torch.Tensor) – Tensor of shape (batch, channel, height, width)
epsilon (float, optional) – Small constant to avoid division by zero.
- Returns:
norm (torch.Tensor) – The local energy of x. Note that it is downsampled by a factor of 2 (unlike rect2pol).
direction (torch.Tensor) – The local phase of x (aka. local unit vector, or local state)
Notes
This function is an analogue to rectangular_to_polar for real valued signals.
Norm and direction (analogous to complex modulus and phase) are defined using a blurring operator and division. Blurring the responses removes the high frequencies introduced by the squaring operation. In the complex case, adding the quadrature-pair response has the same effect (this is most clearly seen in the frequency domain). Here, computing the direction (phase) reduces to dividing out the norm (modulus), since the signal has only one real component. This is a normalization operation (local unit vector), hence the connection to local gain control.
- plenoptic.simulate.canonical_computations.non_linearities.local_gain_control_dict(coeff_dict, residuals=True)[source]
Spatially local gain control, for each element in a dictionary.
- Parameters:
coeff_dict (dict) – A dictionary containing tensors of shape (batch, channel, height, width)
residuals (bool, optional) – An option to carry around residuals in the energy dict. Note that the transformation is not applied to the residuals, that is dictionary elements with a key starting in “residual”.
- Returns:
energy (dict) – The dictionary of torch.Tensors containing the local energy of x.
state (dict) – The dictionary of torch.Tensors containing the local phase of x.
Notes
Note that energy and state are not computed on the residuals.
The inverse operation is achieved by local_gain_release_dict. This function is an analogue to rectangular_to_polar_dict for real valued signals. For more details, see local_gain_control().
- plenoptic.simulate.canonical_computations.non_linearities.local_gain_release(norm, direction, epsilon=1e-08)[source]
Spatially local gain release.
- Parameters:
norm (torch.Tensor) – The local energy of x. Note that it is downsampled by a factor of 2 (unlike rect2pol).
direction (torch.Tensor) – The local phase of x (aka. local unit vector, or local state)
epsilon (float, optional) – Small constant to avoid division by zero.
- Returns:
x – Tensor of shape (batch, channel, height, width)
- Return type:
torch.Tensor
Notes
This function is an analogue to polar_to_rectangular for real valued signals.
Norm and direction (analogous to complex modulus and phase) are defined using a blurring operator and division. Blurring the responses removes the high frequencies introduced by the squaring operation. In the complex case, adding the quadrature-pair response has the same effect (this is most clearly seen in the frequency domain). Here, computing the direction (phase) reduces to dividing out the norm (modulus), since the signal has only one real component. This is a normalization operation (local unit vector), hence the connection to local gain control.
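A round-trip sketch of gain control followed by gain release (the random tensor stands in for, e.g., one band of a pyramid):
import torch
from plenoptic.simulate.canonical_computations.non_linearities import (
    local_gain_control,
    local_gain_release,
)

x = torch.rand(1, 1, 64, 64)
norm, direction = local_gain_control(x)      # local energy (downsampled by 2) and local phase
x_hat = local_gain_release(norm, direction)  # approximately undoes the gain control
print(x.shape, x_hat.shape)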
- plenoptic.simulate.canonical_computations.non_linearities.local_gain_release_dict(energy, state, residuals=True)[source]
Spatially local gain release, for each element in a dictionary.
- Parameters:
energy (dict) – The dictionary of torch.Tensors containing the local energy of x.
state (dict) – The dictionary of torch.Tensors containing the local phase of x.
residuals (bool, optional) – An option to carry around residuals in the energy dict. Note that the transformation is not applied to the residuals, that is dictionary elements with a key starting in "residual".
- Returns:
coeff_dict – A dictionary containing tensors of shape (batch, channel, height, width)
- Return type:
dict
Notes
The inverse operation to local_gain_control_dict. This function is an analogue to polar_to_rectangular_dict for real valued signals. For more details, see local_gain_release().
- plenoptic.simulate.canonical_computations.non_linearities.polar_to_rectangular_dict(energy, state, residuals=True)[source]
Return the real and imaginary parts of tensor in a dictionary.
- Parameters:
energy (dict) – The dictionary of torch.Tensors containing the local complex modulus.
state (dict) – The dictionary of torch.Tensors containing the local phase.
dim (int, optional) – The dimension that contains the real and imaginary components.
residuals (bool, optional) – An option to carry around residuals in the energy branch.
- Returns:
coeff_dict – A dictionary containing complex tensors of coefficients.
- Return type:
dict
- plenoptic.simulate.canonical_computations.non_linearities.rectangular_to_polar_dict(coeff_dict, residuals=False)[source]
Return the complex modulus and the phase of each complex tensor in a dictionary.
- Parameters:
coeff_dict (dict) – A dictionary containing complex tensors.
dim (int, optional) – The dimension that contains the real and imaginary components.
residuals (bool, optional) – An option to carry around residuals in the energy branch.
- Returns:
energy (dict) – The dictionary of torch.Tensors containing the local complex modulus of coeff_dict.
state (dict) – The dictionary of torch.Tensors containing the local phase of coeff_dict.
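These dictionary versions are typically applied to the coefficients of a complex steerable pyramid (documented in the next module); a sketch under that assumption:
import torch
from plenoptic.simulate.canonical_computations.non_linearities import (
    polar_to_rectangular_dict,
    rectangular_to_polar_dict,
)
from plenoptic.simulate.canonical_computations.steerable_pyramid_freq import SteerablePyramidFreq

img = torch.rand(1, 1, 128, 128)
pyr = SteerablePyramidFreq(img.shape[-2:], height=3, order=3, is_complex=True)
coeffs = pyr(img)  # dictionary of (complex) pyramid coefficients
energy, state = rectangular_to_polar_dict(coeffs, residuals=True)
coeffs_back = polar_to_rectangular_dict(energy, state, residuals=True)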
plenoptic.simulate.canonical_computations.steerable_pyramid_freq module
Steerable frequency pyramid
Construct a steerable pyramid on two-dimensional (matrix) signals, in the Fourier domain.
- class plenoptic.simulate.canonical_computations.steerable_pyramid_freq.SteerablePyramidFreq(image_shape, height='auto', order=3, twidth=1, is_complex=False, downsample=True, tight_frame=False)[source]
Bases:
Module
Steerable frequency pyramid in Torch
Construct a steerable pyramid on two-dimensional (matrix) signals, in the Fourier domain. Boundary-handling is circular. Reconstruction is exact (within floating point errors). However, if the image has an odd shape, the reconstruction will not be exact, due to boundary-handling issues that have not been resolved.
The squared radial functions tile the Fourier plane with a raised-cosine falloff. Angular functions are cos(theta - k*pi/(order+1))^order.
Notes
Transform described in [1], filter kernel design described in [2]. For further information, see the project webpage: https://www.cns.nyu.edu/~eero/steerpyr/
- Parameters:
image_shape (list or tuple) – shape of input image
height ('auto' or int) – The height of the pyramid. If 'auto', will automatically determine based on the size of the image.
order (int) – The Gaussian derivative order used for the steerable filters, in [1, 15]. Note that to achieve steerability, the minimum number of orientations is order + 1, and that is what is used here. To get more orientations at the same order, use the method steer_coeffs.
twidth (int) – The width of the transition region of the radial lowpass function, in octaves.
is_complex (bool) – Whether the pyramid coefficients should be complex or not. If True, the real and imaginary parts correspond to a pair of even- and odd-symmetric filters. If False, the coefficients only include the real part / even-symmetric filter.
downsample (bool) – Whether to downsample each scale in the pyramid or keep the output pyramid coefficients in fixed bands of size imshape x imshape. When downsample is False, the forward method returns a tensor.
tight_frame (bool, default: False) – Whether the pyramid obeys the generalized Parseval theorem (i.e., is a tight frame). If True, the energy of the pyr_coeffs equals the energy of the image; if False, it does not. In order to match the matlabPyrTools or pyrtools pyramids, this must be set to False.
- image_shape
shape of input image
- Type:
list or tuple
- pyr_size
Dictionary containing the sizes of the pyramid coefficients. Keys are (level, band) tuples and values are tuples.
- Type:
dict
- fft_norm
The way the ffts are normalized, see pytorch documentation for more details.
- Type:
str
- is_complex
Whether the coefficients are complex- or real-valued.
- Type:
bool
References
[1] E P Simoncelli and W T Freeman, "The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation," Second Int'l Conf on Image Processing, Washington, DC, Oct 1995.
[2] A Karasaridis and E P Simoncelli, "A Filter Design Technique for Steerable Pyramid Image Transforms", ICASSP, Atlanta, GA, May 1996.
Methods
add_module(name, module) – Add a child module to the current module.
apply(fn) – Apply fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16() – Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]) – Return an iterator over module buffers.
children() – Return an iterator over immediate children modules.
compile(*args, **kwargs) – Compile this Module's forward using torch.compile().
convert_pyr_to_tensor(pyr_coeffs[, ...]) – Convert coefficient dictionary to a tensor.
convert_tensor_to_pyr(pyr_tensor, ...) – Convert pyramid coefficient tensor to dictionary format.
cpu() – Move all model parameters and buffers to the CPU.
cuda([device]) – Move all model parameters and buffers to the GPU.
double() – Casts all floating point parameters and buffers to double datatype.
eval() – Set the module in evaluation mode.
extra_repr() – Set the extra representation of the module.
float() – Casts all floating point parameters and buffers to float datatype.
forward(x[, scales]) – Generate the steerable pyramid coefficients for an image
get_buffer(target) – Return the buffer given by target if it exists, otherwise throw an error.
get_extra_state() – Return any extra state to include in the module's state_dict.
get_parameter(target) – Return the parameter given by target if it exists, otherwise throw an error.
get_submodule(target) – Return the submodule given by target if it exists, otherwise throw an error.
half() – Casts all floating point parameters and buffers to half datatype.
ipu([device]) – Move all model parameters and buffers to the IPU.
load_state_dict(state_dict[, strict, assign]) – Copy parameters and buffers from state_dict into this module and its descendants.
modules() – Return an iterator over all modules in the network.
named_buffers([prefix, recurse, ...]) – Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children() – Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]) – Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse, ...]) – Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]) – Return an iterator over module parameters.
recon_pyr(pyr_coeffs[, levels, bands]) – Reconstruct the image or batch of images, optionally using subset of pyramid coefficients.
register_backward_hook(hook) – Register a backward hook on the module.
register_buffer(name, tensor[, persistent]) – Add a buffer to the module.
register_forward_hook(hook, *[, prepend, ...]) – Register a forward hook on the module.
register_forward_pre_hook(hook, *[, ...]) – Register a forward pre-hook on the module.
register_full_backward_hook(hook[, prepend]) – Register a backward hook on the module.
register_full_backward_pre_hook(hook[, prepend]) – Register a backward pre-hook on the module.
register_load_state_dict_post_hook(hook) – Register a post hook to be run after module's load_state_dict is called.
register_module(name, module) – Alias for add_module().
register_parameter(name, param) – Add a parameter to the module.
register_state_dict_pre_hook(hook) – Register a pre-hook for the load_state_dict() method.
requires_grad_([requires_grad]) – Change if autograd should record operations on parameters in this module.
set_extra_state(state) – Set extra state contained in the loaded state_dict.
share_memory() – See torch.Tensor.share_memory_().
state_dict(*args[, destination, prefix, ...]) – Return a dictionary containing references to the whole state of the module.
steer_coeffs(pyr_coeffs, angles[, even_phase]) – Steer pyramid coefficients to the specified angles
to(*args, **kwargs) – Move and/or cast the parameters and buffers.
to_empty(*, device[, recurse]) – Move the parameters and buffers to the specified device without copying storage.
train([mode]) – Set the module in training mode.
type(dst_type) – Casts all parameters and buffers to dst_type.
xpu([device]) – Move all model parameters and buffers to the XPU.
zero_grad([set_to_none]) – Reset gradients of all model parameters.
__call__
- static convert_pyr_to_tensor(pyr_coeffs, split_complex=False)[source]
Convert coefficient dictionary to a tensor.
The output tensor has shape (batch, channel, height, width) and is intended to be used in a torch.nn.Module downstream. In the multichannel case, all bands for each channel will be stacked together (i.e. if there are 2 channels and 18 bands per channel, pyr_tensor[:,0:18,…] will contain the pyr responses for channel 1 and pyr_tensor[:, 18:36, …] will contain the responses for channel 2). In the case of a complex, multichannel pyramid with split_complex=True, the real/imaginary bands will be interleaved so that they appear as pairs with neighboring indices in the channel dimension of the tensor (note: the residual bands are always real, so they will only ever have a single band even when split_complex=True). This only works if pyr_coeffs was created with a pyramid with downsample=False.
- Parameters:
pyr_coeffs (OrderedDict) – the pyramid coefficients
split_complex (bool) – indicates whether the output should split complex bands into real/imag channels or keep them as a single channel. This should be True if you intend to use a convolutional layer on top of the output.
- Return type:
Tuple[Tensor, Tuple[int, bool, List[Union[Tuple[int, int], Literal['residual_lowpass', 'residual_highpass']]]]]
- Returns:
pyr_tensor – shape (batch, channel, height, width). pyramid coefficients reshaped into tensor. The first channel will be the residual highpass and the last will be the residual lowpass. Each band is then a separate channel.
pyr_info – Information required to recreate the dictionary, containing the number of channels, if split_complex was used in this function call, and the list of pyramid keys for the dictionary
See also
convert_tensor_to_pyr
Convert tensor representation to pyramid dictionary.
- static convert_tensor_to_pyr(pyr_tensor, num_channels, split_complex, pyr_keys)[source]
Convert pyramid coefficient tensor to dictionary format.
num_channels, split_complex, and pyr_keys are elements of the pyr_info tuple returned by convert_pyr_to_tensor. You should always unpack the arguments for this function from that pyr_info tuple. Example Usage:
pyr_tensor, pyr_info = convert_pyr_to_tensor(pyr_coeffs, split_complex=True)
pyr_dict = convert_tensor_to_pyr(pyr_tensor, *pyr_info)
- Parameters:
pyr_tensor (Tensor) – Shape (batch, channel, height, width). The pyramid coefficients
num_channels (int) – number of channels in the original input tensor the pyramid was created for (i.e. if the input was an RGB image, this would be 3)
split_complex (bool) – true or false, specifying whether the pyr_tensor was created with complex channels split or not (if the pyramid was a complex pyramid).
pyr_keys (List[Union[Tuple[int, int], Literal['residual_lowpass', 'residual_highpass']]]) – tuple containing the list of keys for the original pyramid dictionary
- Returns:
pyramid coefficients in dictionary format
- Return type:
pyr_coeffs
See also
convert_pyr_to_tensor
Convert pyramid dictionary representation to tensor.
- forward(x, scales=None)[source]
Generate the steerable pyramid coefficients for an image
- Parameters:
x (Tensor) – A tensor containing the image to analyze. We want to operate on this in the pytorch-y way, so we want it to be 4d (batch, channel, height, width).
scales (Optional[List[Union[int, Literal['residual_lowpass', 'residual_highpass']]]]) – Which scales to include in the returned representation. If None, we include all scales. Otherwise, can contain a subset of the values present in this model's scales attribute (ints from 0 up to self.num_scales-1 and the strs 'residual_highpass' and 'residual_lowpass'). Can contain a single value or multiple values. If it's an int, we include all orientations from that scale. Order within the list does not matter.
- Returns:
Pyramid coefficients
- Return type:
representation
- recon_pyr(pyr_coeffs, levels='all', bands='all')[source]
Reconstruct the image or batch of images, optionally using subset of pyramid coefficients.
NOTE: in order to call this function, you need to have previously called self.forward(x), where x is the tensor you wish to reconstruct. This will fail if you called forward() with a subset of scales.
- Parameters:
pyr_coeffs (OrderedDict) – pyramid coefficients to reconstruct from
levels (Union[Literal['all'], List[Union[int, Literal['residual_lowpass', 'residual_highpass']]]]) – If list, should contain some subset of integers from 0 to self.num_scales-1 (inclusive), 'residual_lowpass', and 'residual_highpass'. If 'all', returned value will contain all valid levels. Otherwise, must be one of the valid levels.
bands (Union[Literal['all'], List[int]]) – If list, should contain some subset of integers from 0 to self.num_orientations-1. If 'all', returned value will contain all valid orientations. Otherwise, must be one of the valid orientations.
- Returns:
The reconstructed image, of shape (batch, channel, height, width)
- Return type:
recon
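A build/analyze/reconstruct sketch (random even-sized image as a stand-in, so reconstruction should be exact up to floating-point error):
import torch
from plenoptic.simulate.canonical_computations.steerable_pyramid_freq import SteerablePyramidFreq

img = torch.rand(1, 1, 256, 256)
pyr = SteerablePyramidFreq(img.shape[-2:], height='auto', order=3)
coeffs = pyr(img)                                            # forward must be called before recon_pyr
recon = pyr.recon_pyr(coeffs)                                # full reconstruction
partial = pyr.recon_pyr(coeffs, levels=[0, 1], bands='all')  # reconstruction from a subset of scales
print((img - recon).abs().max())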
- steer_coeffs(pyr_coeffs, angles, even_phase=True)[source]
Steer pyramid coefficients to the specified angles
This allows you to have filters that have the Gaussian derivative order specified in construction, but arbitrary angles or number of orientations.
- Parameters:
pyr_coeffs (OrderedDict) – the pyramid coefficients to steer
angles (List[float]) – list of angles (in radians) to steer the pyramid coefficients to
even_phase (bool) – specifies whether the harmonics are cosine or sine phase aligned about those positions.
- Return type:
Tuple[dict, dict]
- Returns:
resteered_coeffs – dictionary of re-steered pyramid coefficients. Will have the same number of scales as the original pyramid (though it will not contain the residual highpass or lowpass). Like pyr_coeffs, keys are 2-tuples of ints indexing the scale and orientation, but now we're indexing angles instead of self.num_orientations.
resteering_weights – dictionary of weights used to re-steer the pyramid coefficients. Will have the same keys as resteered_coeffs.
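A sketch of re-steering the coefficients of a 4-orientation pyramid (order=3) to eight evenly spaced angles:
import math
import torch
from plenoptic.simulate.canonical_computations.steerable_pyramid_freq import SteerablePyramidFreq

img = torch.rand(1, 1, 128, 128)
pyr = SteerablePyramidFreq(img.shape[-2:], height=3, order=3)  # 4 orientations at construction
coeffs = pyr(img)
angles = [i * math.pi / 8 for i in range(8)]                   # 8 target angles, in radians
resteered, weights = pyr.steer_coeffs(coeffs, angles)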
Module contents
plenoptic.simulate.models package
Submodules
plenoptic.simulate.models.frontend module
Model architectures in this file are found in [1], [2]. frontend.OnOff() has optional pretrained filters that were reverse-engineered from a previously-trained model and should be used at your own discretion.
References
A Berardino, J Ballé, V Laparra, EP Simoncelli, Eigen-distortions of hierarchical representations, NeurIPS 2017; https://arxiv.org/abs/1710.02266
- class plenoptic.simulate.models.frontend.LinearNonlinear(kernel_size, on_center=True, width_ratio_limit=4.0, amplitude_ratio=1.25, pad_mode='reflect', activation=<built-in function softplus>)[source]
Bases:
Module
Linear-Nonlinear model, applies a difference of Gaussians filter followed by an activation function. Model is described in [1] and [2].
- Parameters:
kernel_size (Union[int, Tuple[int, int]]) – Shape of convolutional kernel.
on_center (bool) – Dictates whether center is on or off; surround will be the opposite of center (i.e. on-off or off-on).
width_ratio_limit (float) – Sets a lower bound on the ratio of surround_std over center_std. The surround Gaussian must be wider than the center Gaussian in order to be a proper Difference of Gaussians. surround_std will be clamped to ratio_limit times center_std.
amplitude_ratio (float) – Ratio of center/surround amplitude. Applied before filter normalization.
pad_mode (str) – Padding for convolution, defaults to "reflect".
activation (Callable[[Tensor], Tensor]) – Activation function following linear convolution.
- center_surround
CenterSurround difference of Gaussians filter.
- Type:
nn.Module
References
[1] A Berardino, J Ballé, V Laparra, EP Simoncelli, Eigen-distortions of hierarchical representations, NeurIPS 2017; https://arxiv.org/abs/1710.02266
Methods
add_module(name, module) – Add a child module to the current module.
apply(fn) – Apply fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16() – Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]) – Return an iterator over module buffers.
children() – Return an iterator over immediate children modules.
compile(*args, **kwargs) – Compile this Module's forward using torch.compile().
cpu() – Move all model parameters and buffers to the CPU.
cuda([device]) – Move all model parameters and buffers to the GPU.
display_filters([zoom]) – Displays convolutional filters of model
double() – Casts all floating point parameters and buffers to double datatype.
eval() – Set the module in evaluation mode.
extra_repr() – Set the extra representation of the module.
float() – Casts all floating point parameters and buffers to float datatype.
forward(x) – Define the computation performed at every call.
get_buffer(target) – Return the buffer given by target if it exists, otherwise throw an error.
get_extra_state() – Return any extra state to include in the module's state_dict.
get_parameter(target) – Return the parameter given by target if it exists, otherwise throw an error.
get_submodule(target) – Return the submodule given by target if it exists, otherwise throw an error.
half() – Casts all floating point parameters and buffers to half datatype.
ipu([device]) – Move all model parameters and buffers to the IPU.
load_state_dict(state_dict[, strict, assign]) – Copy parameters and buffers from state_dict into this module and its descendants.
modules() – Return an iterator over all modules in the network.
named_buffers([prefix, recurse, ...]) – Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children() – Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]) – Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse, ...]) – Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]) – Return an iterator over module parameters.
register_backward_hook(hook) – Register a backward hook on the module.
register_buffer(name, tensor[, persistent]) – Add a buffer to the module.
register_forward_hook(hook, *[, prepend, ...]) – Register a forward hook on the module.
register_forward_pre_hook(hook, *[, ...]) – Register a forward pre-hook on the module.
register_full_backward_hook(hook[, prepend]) – Register a backward hook on the module.
register_full_backward_pre_hook(hook[, prepend]) – Register a backward pre-hook on the module.
register_load_state_dict_post_hook(hook) – Register a post hook to be run after module's load_state_dict is called.
register_module(name, module) – Alias for add_module().
register_parameter(name, param) – Add a parameter to the module.
register_state_dict_pre_hook(hook) – Register a pre-hook for the load_state_dict() method.
requires_grad_([requires_grad]) – Change if autograd should record operations on parameters in this module.
set_extra_state(state) – Set extra state contained in the loaded state_dict.
share_memory() – See torch.Tensor.share_memory_().
state_dict(*args[, destination, prefix, ...]) – Return a dictionary containing references to the whole state of the module.
to(*args, **kwargs) – Move and/or cast the parameters and buffers.
to_empty(*, device[, recurse]) – Move the parameters and buffers to the specified device without copying storage.
train([mode]) – Set the module in training mode.
type(dst_type) – Casts all parameters and buffers to dst_type.
xpu([device]) – Move all model parameters and buffers to the XPU.
zero_grad([set_to_none]) – Reset gradients of all model parameters.
__call__
- display_filters(zoom=5.0, **kwargs)[source]
Displays convolutional filters of model
- Parameters:
zoom (float) – Magnification factor for po.imshow()
**kwargs – Keyword args for po.imshow
- Returns:
fig
- Return type:
PyrFigure
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
- Return type:
Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
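An instantiation sketch for this front-end model (the kernel size is an arbitrary illustrative choice; a real image tensor would replace the random input):
import torch
from plenoptic.simulate.models.frontend import LinearNonlinear

model = LinearNonlinear(kernel_size=(31, 31))
img = torch.rand(1, 1, 256, 256)
response = model(img)          # difference-of-Gaussians filtering followed by softplus
print(response.shape)
fig = model.display_filters()  # visualize the center-surround filter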
- class plenoptic.simulate.models.frontend.LuminanceContrastGainControl(kernel_size, on_center=True, width_ratio_limit=4.0, amplitude_ratio=1.25, pad_mode='reflect', activation=<built-in function softplus>)[source]
Bases:
Module
Linear center-surround followed by luminance and contrast gain control, and activation function. Model is described in [1] and [2].
- Parameters:
kernel_size (Union[int, Tuple[int, int]]) – Shape of convolutional kernel.
on_center (bool) – Dictates whether center is on or off; surround will be the opposite of center (i.e. on-off or off-on).
width_ratio_limit (float) – Sets a lower bound on the ratio of surround_std over center_std. The surround Gaussian must be wider than the center Gaussian in order to be a proper Difference of Gaussians. surround_std will be clamped to ratio_limit times center_std.
amplitude_ratio (float) – Ratio of center/surround amplitude. Applied before filter normalization.
pad_mode (str) – Padding for convolution, defaults to "reflect".
activation (Callable[[Tensor], Tensor]) – Activation function following linear convolution.
- center_surround
Difference of Gaussians linear filter.
- Type:
nn.Module
- luminance
Gaussian convolutional kernel used to normalize signal by local luminance.
- Type:
nn.Module
- contrast
Gaussian convolutional kernel used to normalize signal by local contrast.
- Type:
nn.Module
- luminance_scalar
Scale factor for luminance normalization.
- Type:
nn.Parameter
- contrast_scalar
Scale factor for contrast normalization.
- Type:
nn.Parameter
References
[1] A Berardino, J Ballé, V Laparra, EP Simoncelli, Eigen-distortions of hierarchical representations, NeurIPS 2017; https://arxiv.org/abs/1710.02266
Methods
add_module(name, module) – Add a child module to the current module.
apply(fn) – Apply fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16() – Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]) – Return an iterator over module buffers.
children() – Return an iterator over immediate children modules.
compile(*args, **kwargs) – Compile this Module's forward using torch.compile().
cpu() – Move all model parameters and buffers to the CPU.
cuda([device]) – Move all model parameters and buffers to the GPU.
display_filters([zoom]) – Displays convolutional filters of model
double() – Casts all floating point parameters and buffers to double datatype.
eval() – Set the module in evaluation mode.
extra_repr() – Set the extra representation of the module.
float() – Casts all floating point parameters and buffers to float datatype.
forward(x) – Define the computation performed at every call.
get_buffer(target) – Return the buffer given by target if it exists, otherwise throw an error.
get_extra_state() – Return any extra state to include in the module's state_dict.
get_parameter(target) – Return the parameter given by target if it exists, otherwise throw an error.
get_submodule(target) – Return the submodule given by target if it exists, otherwise throw an error.
half() – Casts all floating point parameters and buffers to half datatype.
ipu([device]) – Move all model parameters and buffers to the IPU.
load_state_dict(state_dict[, strict, assign]) – Copy parameters and buffers from state_dict into this module and its descendants.
modules() – Return an iterator over all modules in the network.
named_buffers([prefix, recurse, ...]) – Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children() – Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]) – Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse, ...]) – Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]) – Return an iterator over module parameters.
register_backward_hook(hook) – Register a backward hook on the module.
register_buffer(name, tensor[, persistent]) – Add a buffer to the module.
register_forward_hook(hook, *[, prepend, ...]) – Register a forward hook on the module.
register_forward_pre_hook(hook, *[, ...]) – Register a forward pre-hook on the module.
register_full_backward_hook(hook[, prepend]) – Register a backward hook on the module.
register_full_backward_pre_hook(hook[, prepend]) – Register a backward pre-hook on the module.
register_load_state_dict_post_hook(hook) – Register a post hook to be run after module's load_state_dict is called.
register_module(name, module) – Alias for add_module().
register_parameter(name, param) – Add a parameter to the module.
register_state_dict_pre_hook(hook) – Register a pre-hook for the load_state_dict() method.
requires_grad_([requires_grad]) – Change if autograd should record operations on parameters in this module.
set_extra_state(state) – Set extra state contained in the loaded state_dict.
share_memory() – See torch.Tensor.share_memory_().
state_dict(*args[, destination, prefix, ...]) – Return a dictionary containing references to the whole state of the module.
to(*args, **kwargs) – Move and/or cast the parameters and buffers.
to_empty(*, device[, recurse]) – Move the parameters and buffers to the specified device without copying storage.
train([mode]) – Set the module in training mode.
type(dst_type) – Casts all parameters and buffers to dst_type.
xpu([device]) – Move all model parameters and buffers to the XPU.
zero_grad([set_to_none]) – Reset gradients of all model parameters.
__call__
- display_filters(zoom=5.0, **kwargs)[source]
Displays convolutional filters of model
- Parameters:
zoom (float) – Magnification factor for po.imshow()
**kwargs – Keyword args for po.imshow
- Returns:
fig
- Return type:
PyrFigure
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
- Return type:
Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class plenoptic.simulate.models.frontend.LuminanceGainControl(kernel_size, on_center=True, width_ratio_limit=4.0, amplitude_ratio=1.25, pad_mode='reflect', activation=<built-in function softplus>)[source]
Bases:
Module
Linear center-surround followed by luminance gain control and activation. Model is described in [1] and [2].
- Parameters:
kernel_size (Union[int, Tuple[int, int]]) – Shape of convolutional kernel.
on_center (bool) – Dictates whether center is on or off; surround will be the opposite of center (i.e. on-off or off-on).
width_ratio_limit (float) – Sets a lower bound on the ratio of surround_std over center_std. The surround Gaussian must be wider than the center Gaussian in order to be a proper Difference of Gaussians. surround_std will be clamped to ratio_limit times center_std.
amplitude_ratio (float) – Ratio of center/surround amplitude. Applied before filter normalization.
pad_mode (str) – Padding for convolution, defaults to "reflect".
activation (Callable[[Tensor], Tensor]) – Activation function following linear convolution.
- center_surround
Difference of Gaussians linear filter.
- Type:
nn.Module
- luminance
Gaussian convolutional kernel used to normalize signal by local luminance.
- Type:
nn.Module
- luminance_scalar
Scale factor for luminance normalization.
- Type:
nn.Parameter
References
[1] A Berardino, J Ballé, V Laparra, EP Simoncelli, Eigen-distortions of hierarchical representations, NeurIPS 2017; https://arxiv.org/abs/1710.02266
Methods
add_module(name, module) – Add a child module to the current module.
apply(fn) – Apply fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16() – Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]) – Return an iterator over module buffers.
children() – Return an iterator over immediate children modules.
compile(*args, **kwargs) – Compile this Module's forward using torch.compile().
cpu() – Move all model parameters and buffers to the CPU.
cuda([device]) – Move all model parameters and buffers to the GPU.
display_filters([zoom]) – Displays convolutional filters of model
double() – Casts all floating point parameters and buffers to double datatype.
eval() – Set the module in evaluation mode.
extra_repr() – Set the extra representation of the module.
float() – Casts all floating point parameters and buffers to float datatype.
forward(x) – Define the computation performed at every call.
get_buffer(target) – Return the buffer given by target if it exists, otherwise throw an error.
get_extra_state() – Return any extra state to include in the module's state_dict.
get_parameter(target) – Return the parameter given by target if it exists, otherwise throw an error.
get_submodule(target) – Return the submodule given by target if it exists, otherwise throw an error.
half() – Casts all floating point parameters and buffers to half datatype.
ipu([device]) – Move all model parameters and buffers to the IPU.
load_state_dict(state_dict[, strict, assign]) – Copy parameters and buffers from state_dict into this module and its descendants.
modules() – Return an iterator over all modules in the network.
named_buffers([prefix, recurse, ...]) – Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children() – Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]) – Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse, ...]) – Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]) – Return an iterator over module parameters.
register_backward_hook(hook) – Register a backward hook on the module.
register_buffer(name, tensor[, persistent]) – Add a buffer to the module.
register_forward_hook(hook, *[, prepend, ...]) – Register a forward hook on the module.
register_forward_pre_hook(hook, *[, ...]) – Register a forward pre-hook on the module.
register_full_backward_hook(hook[, prepend]) – Register a backward hook on the module.
register_full_backward_pre_hook(hook[, prepend]) – Register a backward pre-hook on the module.
register_load_state_dict_post_hook(hook) – Register a post hook to be run after module's load_state_dict is called.
register_module(name, module) – Alias for add_module().
register_parameter(name, param) – Add a parameter to the module.
register_state_dict_pre_hook(hook) – Register a pre-hook for the load_state_dict() method.
requires_grad_([requires_grad]) – Change if autograd should record operations on parameters in this module.
set_extra_state(state) – Set extra state contained in the loaded state_dict.
share_memory() – See torch.Tensor.share_memory_().
state_dict(*args[, destination, prefix, ...]) – Return a dictionary containing references to the whole state of the module.
to(*args, **kwargs) – Move and/or cast the parameters and buffers.
to_empty(*, device[, recurse]) – Move the parameters and buffers to the specified device without copying storage.
train([mode]) – Set the module in training mode.
type(dst_type) – Casts all parameters and buffers to dst_type.
xpu([device]) – Move all model parameters and buffers to the XPU.
zero_grad([set_to_none]) – Reset gradients of all model parameters.
__call__
- display_filters(zoom=5.0, **kwargs)[source]
Displays convolutional filters of model
- Parameters:
zoom (float) – Magnification factor for po.imshow()
**kwargs – Keyword args for po.imshow
- Returns:
fig
- Return type:
PyrFigure
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
- Return type:
Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class plenoptic.simulate.models.frontend.OnOff(kernel_size, width_ratio_limit=4.0, amplitude_ratio=1.25, pad_mode='reflect', pretrained=False, activation=<built-in function softplus>, apply_mask=False, cache_filt=False)[source]
Bases:
Module
Two-channel on-off and off-on center-surround model with local contrast and luminance gain control.
This model is called OnOff in Berardino et al 2017.
- Parameters:
kernel_size (Union[int, Tuple[int, int]]) – Shape of convolutional kernel.
width_ratio_limit (float) – Sets a lower bound on the ratio of surround_std over center_std. The surround Gaussian must be wider than the center Gaussian in order to be a proper Difference of Gaussians. surround_std will be clamped to ratio_limit times center_std.
amplitude_ratio (float) – Ratio of center/surround amplitude. Applied before filter normalization.
pad_mode (str) – Padding for convolution, defaults to "reflect".
pretrained – Whether or not to load model params estimated from [1]. See Notes for details.
activation (Callable[[Tensor], Tensor]) – Activation function following linear and gain control operations.
apply_mask (bool) – Whether or not to apply circular disk mask centered on the input image. This is useful for synthesis methods like Eigendistortions to ensure that the synthesized distortion will not appear in the periphery. See plenoptic.tools.signal.make_disk() for details on how the mask is created.
cache_filt (bool) – Whether or not to cache the filter. Avoids regenerating filt with each forward pass. Cached to self._filt.
Notes
These 12 parameters (standard deviations & scalar constants) were reverse-engineered from the model in [1], [2]. Please use these pretrained weights at your own discretion.
References
[1] A Berardino, J Ballé, V Laparra, EP Simoncelli, Eigen-distortions of hierarchical representations, NeurIPS 2017; https://arxiv.org/abs/1710.02266
Methods
add_module(name, module) – Add a child module to the current module.
apply(fn) – Apply fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16() – Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]) – Return an iterator over module buffers.
children() – Return an iterator over immediate children modules.
compile(*args, **kwargs) – Compile this Module's forward using torch.compile().
cpu() – Move all model parameters and buffers to the CPU.
cuda([device]) – Move all model parameters and buffers to the GPU.
display_filters([zoom]) – Displays convolutional filters of model
double() – Casts all floating point parameters and buffers to double datatype.
eval() – Set the module in evaluation mode.
extra_repr() – Set the extra representation of the module.
float() – Casts all floating point parameters and buffers to float datatype.
forward(x) – Define the computation performed at every call.
get_buffer(target) – Return the buffer given by target if it exists, otherwise throw an error.
get_extra_state() – Return any extra state to include in the module's state_dict.
get_parameter(target) – Return the parameter given by target if it exists, otherwise throw an error.
get_submodule(target) – Return the submodule given by target if it exists, otherwise throw an error.
half() – Casts all floating point parameters and buffers to half datatype.
ipu([device]) – Move all model parameters and buffers to the IPU.
load_state_dict(state_dict[, strict, assign]) – Copy parameters and buffers from state_dict into this module and its descendants.
modules() – Return an iterator over all modules in the network.
named_buffers([prefix, recurse, ...]) – Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children() – Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]) – Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse, ...]) – Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]) – Return an iterator over module parameters.
register_backward_hook(hook) – Register a backward hook on the module.
register_buffer(name, tensor[, persistent]) – Add a buffer to the module.
register_forward_hook(hook, *[, prepend, ...]) – Register a forward hook on the module.
register_forward_pre_hook(hook, *[, ...]) – Register a forward pre-hook on the module.
register_full_backward_hook(hook[, prepend]) – Register a backward hook on the module.
register_full_backward_pre_hook(hook[, prepend]) – Register a backward pre-hook on the module.
register_load_state_dict_post_hook(hook) – Register a post hook to be run after module's load_state_dict is called.
register_module(name, module) – Alias for add_module().
register_parameter(name, param) – Add a parameter to the module.
register_state_dict_pre_hook(hook) – Register a pre-hook for the load_state_dict() method.
requires_grad_([requires_grad]) – Change if autograd should record operations on parameters in this module.
set_extra_state(state) – Set extra state contained in the loaded state_dict.
share_memory() – See torch.Tensor.share_memory_().
state_dict(*args[, destination, prefix, ...]) – Return a dictionary containing references to the whole state of the module.
to(*args, **kwargs) – Move and/or cast the parameters and buffers.
to_empty(*, device[, recurse]) – Move the parameters and buffers to the specified device without copying storage.
train([mode]) – Set the module in training mode.
type(dst_type) – Casts all parameters and buffers to dst_type.
xpu([device]) – Move all model parameters and buffers to the XPU.
zero_grad([set_to_none]) – Reset gradients of all model parameters.
__call__
- display_filters(zoom=5.0, **kwargs)[source]
Display the convolutional filters of the model.
- Parameters:
zoom (float) – Magnification factor for po.imshow().
**kwargs – Keyword arguments for po.imshow().
- Returns:
fig
- Return type:
PyrFigure
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
- Return type:
Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
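For orientation, a hedged usage sketch follows. The class documented above is not named in this excerpt, so plenoptic.simulate.OnOff (a front-end model that accepts these arguments) is used purely as an illustrative stand-in, and po.tools.remove_grad is assumed to be the helper that freezes model parameters before synthesis.
>>> import torch
>>> import plenoptic as po
>>> model = po.simulate.OnOff(kernel_size=(31, 31), pretrained=True)  # illustrative stand-in
>>> po.tools.remove_grad(model)  # freeze parameters before using synthesis methods (assumed helper)
>>> img = torch.rand(1, 1, 256, 256)  # (batch, channel, height, width)
>>> fig = model.display_filters(zoom=3.0)  # returns a PyrFigure of the convolutional filters
>>> response = model(img)  # call the module itself so registered hooks run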
plenoptic.simulate.models.naive module
- class plenoptic.simulate.models.naive.CenterSurround(kernel_size, on_center=True, width_ratio_limit=2.0, amplitude_ratio=1.25, center_std=1.0, surround_std=4.0, out_channels=1, pad_mode='reflect', cache_filt=False)[source]
Bases:
Module
Center-Surround, Difference of Gaussians (DoG) filter model. Can be either on-center/off-surround, or vice versa.
The filter is constructed as \(f = \text{amplitude\_ratio} \cdot \text{center} - \text{surround}\), then normalized so that \(f = f / \sum f\).
The signs of the center and surround are determined by the on_center argument. The standard deviation of the surround Gaussian is constrained to be at least width_ratio_limit times that of the center Gaussian.
- Parameters:
kernel_size (Union[int, Tuple[int, int]]) – Shape of convolutional kernel.
on_center (Union[bool, List[bool]]) – Dictates whether the center is on or off; the surround is the opposite of the center (i.e. on-off or off-on). If a list of bools, its length must equal out_channels; if a single bool, all out_channels are assumed to be on-off or off-on.
width_ratio_limit (float) – Sets a lower bound on the ratio of surround_std over center_std. The surround Gaussian must be wider than the center Gaussian in order to be a proper difference of Gaussians; surround_std will be clamped to width_ratio_limit times center_std.
amplitude_ratio (float) – Ratio of center/surround amplitude. Applied before filter normalization.
center_std (Union[float, Tensor]) – Standard deviation of the circular Gaussian for the center.
surround_std (Union[float, Tensor]) – Standard deviation of the circular Gaussian for the surround. Must be at least width_ratio_limit times center_std.
out_channels (int) – Number of filters.
pad_mode (str) – Padding for convolution, defaults to "reflect".
cache_filt (bool) – Whether or not to cache the filter. Avoids regenerating filt with each forward pass. Cached to self._filt.
- Attributes:
filt
Creates an on center/off surround, or off center/on surround conv filter
Methods
In addition to the standard torch.nn.Module methods (add_module, apply, buffers, children, cuda, eval, load_state_dict, parameters, state_dict, to, train, zero_grad, etc.; see the full table above), this class defines:
forward(x) – Define the computation performed at every call.
__call__
- property filt: Tensor
Creates an on center/off surround, or off center/on surround conv filter
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
- Return type:
Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
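A minimal usage sketch of the class above, with illustrative (not recommended) values:
>>> import torch
>>> from plenoptic.simulate.models.naive import CenterSurround
>>> dog = CenterSurround(kernel_size=(31, 31), on_center=True, center_std=1.0, surround_std=4.0)
>>> img = torch.rand(1, 1, 128, 128)
>>> out = dog(img)  # convolution with the default "reflect" padding
>>> dog.filt.shape, out.shape  # filter bank and filtered image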
- class plenoptic.simulate.models.naive.Gaussian(kernel_size, std=3.0, pad_mode='reflect', out_channels=1, cache_filt=False)[source]
Bases:
Module
Isotropic Gaussian convolutional filter. Kernel elements are normalized and sum to one.
- Parameters:
kernel_size (Union[int, Tuple[int, int]]) – Size of convolutional kernel.
std (Union[float, Tensor]) – Standard deviation of the circularly symmetric Gaussian kernel.
pad_mode (str) – Padding mode argument to pass to torch.nn.functional.pad.
out_channels (int) – Number of filters with which to convolve.
cache_filt (bool) – Whether or not to cache the filter. Avoids regenerating filt with each forward pass. Cached to self._filt.
- Attributes:
- filt
Methods
In addition to the standard torch.nn.Module methods (see the full table above), this class defines:
forward(x, **conv2d_kwargs) – Define the computation performed at every call.
__call__
- property filt
- forward(x, **conv2d_kwargs)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
- Return type:
Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
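A minimal usage sketch (shapes are illustrative):
>>> import torch
>>> from plenoptic.simulate.models.naive import Gaussian
>>> blur = Gaussian(kernel_size=(15, 15), std=3.0)
>>> img = torch.rand(1, 1, 128, 128)
>>> blurred = blur(img)  # kernel is normalized to sum to one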
- class plenoptic.simulate.models.naive.Identity(name=None)[source]
Bases:
Module
Simple class that just returns a copy of the image.
We use this as a "dummy model" for metrics that we don't have a representation for: we use this as the model and then just change the objective function.
Methods
In addition to the standard torch.nn.Module methods (see the full table above), this class defines:
forward(img) – Return a copy of the image.
__call__
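A minimal sketch of the behavior described above: the module hands back a copy of its input, so all of the interesting work can live in the metric or objective function.
>>> import torch
>>> from plenoptic.simulate.models.naive import Identity
>>> model = Identity()
>>> img = torch.rand(1, 1, 64, 64)
>>> out = model(img)
>>> assert torch.equal(out, img)  # same values, but a copy of the input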
- class plenoptic.simulate.models.naive.Linear(kernel_size=(3, 3), pad_mode='circular', default_filters=True)[source]
Bases:
Module
Simplistic linear convolutional model: It splits the input greyscale image into low and high frequencies.
- Parameters:
kernel_size (Union[int, Tuple[int, int]]) – Convolutional kernel size.
pad_mode (str) – Mode with which to pad the image using nn.functional.pad().
default_filters (bool) – Initialize the filters to a low-pass and a band-pass.
Methods
In addition to the standard torch.nn.Module methods (see the full table above), this class defines:
forward(x) – Define the computation performed at every call.
__call__
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
- Return type:
Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
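A minimal sketch of the frequency split described above, using the default filters:
>>> import torch
>>> from plenoptic.simulate.models.naive import Linear
>>> model = Linear(kernel_size=(3, 3), default_filters=True)
>>> img = torch.rand(1, 1, 64, 64)
>>> out = model(img)  # two output channels: low-pass and band-pass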
plenoptic.simulate.models.portilla_simoncelli module
Portilla-Simoncelli texture statistics.
The Portilla-Simoncelli (PS) texture statistics are a set of image statistics, first described in [1], that are proposed as a sufficient set of measurements for describing visual textures. That is, if two texture images have the same values for all PS texture stats, humans should consider them as members of the same family of textures.
- class plenoptic.simulate.models.portilla_simoncelli.PortillaSimoncelli(image_shape, n_scales=4, n_orientations=4, spatial_corr_width=9)[source]
Bases:
Module
Portilla-Simoncelli texture statistics.
The Portilla-Simoncelli (PS) texture statistics are a set of image statistics, first described in [1], that are proposed as a sufficient set of measurements for describing visual textures. That is, if two texture images have the same values for all PS texture stats, humans should consider them as members of the same family of textures.
The PS stats are computed based on the steerable pyramid [2]. They consist of the local auto-correlations, cross-scale (within-orientation) correlations, and cross-orientation (within-scale) correlations of both the pyramid coefficients and the local energy (as computed by those coefficients). Additionally, they include the first four global moments (mean, variance, skew, and kurtosis) of the image and down-sampled versions of that image. See the paper and notebook for more description.
- Parameters:
image_shape (Tuple[int, int]) – Shape of the input image.
n_scales (int) – The number of pyramid scales used to measure the statistics (default=4).
n_orientations (int) – The number of orientations used to measure the statistics (default=4).
spatial_corr_width (int) – The width of the spatial cross- and auto-correlation statistics.
- scales
The names of the unique scales of coefficients in the pyramid, used for coarse-to-fine metamer synthesis.
- Type:
list
References
[1] J Portilla and E P Simoncelli. A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients. Int'l Journal of Computer Vision, 40(1):49-71, October 2000. http://www.cns.nyu.edu/~eero/ABSTRACTS/portilla99-abstract.html http://www.cns.nyu.edu/~lcv/texture/
[2] E P Simoncelli and W T Freeman, "The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation," Second Int'l Conf on Image Processing, Washington, DC, Oct 1995.
Methods
In addition to the standard torch.nn.Module methods (see the full table above), this class defines:
convert_to_dict(representation_tensor) – Convert tensor of statistics to a dictionary.
convert_to_tensor(representation_dict) – Convert dictionary of statistics to a tensor.
forward(image[, scales]) – Generate the texture-statistics representation of an image.
plot_representation(data[, ax, figsize, ...]) – Plot the representation in a human-viewable format: stem plots with data separated out by statistic type.
remove_scales(representation_tensor, ...) – Remove statistics not associated with the given scales.
update_plot(axes, data[, batch_idx]) – Update the information in the representation plot.
__call__
- convert_to_dict(representation_tensor)[source]
Convert tensor of statistics to a dictionary.
While the tensor representation is required by plenoptic’s synthesis objects, the dictionary representation is easier to manually inspect.
This dictionary will contain NaNs in its values: these are placeholders for the redundant statistics.
- Parameters:
representation_tensor (Tensor) – 3d tensor of statistics.
- Returns:
Dictionary of representation, with informative keys.
- Return type:
rep
See also
convert_to_tensor
Convert dictionary representation to tensor.
- convert_to_tensor(representation_dict)[source]
Convert dictionary of statistics to a tensor.
- Parameters:
representation_dict (OrderedDict) – Dictionary of representation.
- Return type:
3d tensor of statistics.
See also
convert_to_dict
Convert tensor representation to dictionary.
- forward(image, scales=None)[source]
Generate Texture Statistics representation of an image.
Note that separate batches and channels are analyzed in parallel.
- Parameters:
image (Tensor) – A 4d tensor (batch, channel, height, width) containing the image(s) to analyze.
scales (Optional[List[Union[Literal['pixel_statistics'], int, Literal['residual_lowpass', 'residual_highpass']]]]) – Which scales to include in the returned representation. If None, we include all scales. Otherwise, can contain a subset of the values present in this model's scales attribute, and the returned tensor will then contain the subset corresponding to those scales.
- Returns:
3d tensor of shape (batch, channel, stats) containing the measured texture statistics.
- Return type:
representation_tensor
- Raises:
ValueError – If image is not 4d or has a dtype other than float or complex.
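A minimal sketch combining forward() and convert_to_dict(); a random image is used purely for illustration (real use would pass a texture image):
>>> import torch
>>> from plenoptic.simulate.models.portilla_simoncelli import PortillaSimoncelli
>>> img = torch.rand(1, 1, 256, 256)  # 4d (batch, channel, height, width)
>>> ps = PortillaSimoncelli((256, 256), n_scales=4, n_orientations=4, spatial_corr_width=9)
>>> stats = ps(img)  # 3d tensor: (batch, channel, stats)
>>> stats_dict = ps.convert_to_dict(stats)  # easier to inspect; contains NaN placeholders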
- plot_representation(data, ax=None, figsize=(15, 15), ylim=None, batch_idx=0, title=None)[source]
Plot the representation in a human viewable format – stem plots with data separated out by statistic type.
This plots the representation of a single batch and averages over all channels in the representation.
We create the following axes:
pixels+var_highpass: marginal pixel statistics (first four moments, min, max) and variance of the residual highpass.
std+skew+kurtosis recon: the standard deviation, skew, and kurtosis of the reconstructed lowpass image at each scale
magnitude_std: the standard deviation of the steerable pyramid coefficient magnitudes at each orientation and scale.
auto_correlation_reconstructed: the auto-correlation of the reconstructed lowpass image at each scale (summarized using Euclidean norm).
auto_correlation_magnitude: the auto-correlation of the pyramid coefficient magnitudes at each scale and orientation (summarized using Euclidean norm).
cross_orientation_correlation_magnitude: the cross-correlations between each orientation at each scale (summarized using Euclidean norm)
If self.n_scales > 1, we also have:
cross_scale_correlation_magnitude: the cross-correlations between the pyramid coefficient magnitude at one scale and the same orientation at the next-coarsest scale (summarized using Euclidean norm).
cross_scale_correlation_real: the cross-correlations between the real component of the pyramid coefficients and the real and imaginary components (at the same orientation) at the next-coarsest scale (summarized using Euclidean norm).
- Parameters:
data (Tensor) – The data to show on the plot. Should look like the output of self.forward(img), with the exact same structure (e.g., as returned by metamer.representation_error() or another instance of this class).
ax (Optional[Axes]) – Axes where we will plot the data. If a plt.Axes instance, will subdivide into 6 or 8 new axes (depending on self.n_scales). If None, we create a new figure.
figsize (Tuple[float, float]) – The size of the figure. Ignored if ax is not None.
ylim (Union[Tuple[float, float], Literal[False], None]) – If not None, the y-limits to use for this plot. If None, we use the default, slightly adjusted so that the minimum is 0. If False, do not change the y-limits.
batch_idx (int) – Which index to take from the batch dimension (the first one).
title (string) – Title for the plot.
- Return type:
Tuple[Figure, List[Axes]]
- Returns:
fig – Figure containing the plot.
axes – List of 6 or 8 axes containing the plot (depending on self.n_scales).
- remove_scales(representation_tensor, scales_to_keep)[source]
Remove statistics not associated with scales.
For a given representation_tensor and a list of scales_to_keep, this method removes all statistics not associated with those scales.
Note that calling this method will always remove statistics.
- Parameters:
representation_tensor (Tensor) – 3d tensor containing the measured representation statistics.
scales_to_keep (List[Union[Literal['pixel_statistics'], int, Literal['residual_lowpass', 'residual_highpass']]]) – Which scales to include in the returned representation. Can contain a subset of the values present in this model's scales attribute, and the returned tensor will then contain the subset of the full representation corresponding to those scales.
- Returns:
Representation tensor with some statistics removed.
- Return type:
limited_representation_tensor
- update_plot(axes, data, batch_idx=0)[source]
Update the information in our representation plot.
This is used for creating an animation of the representation over time. In order to create the animation, we need to know how to update the matplotlib Artists, and this provides a simple way of doing that. It relies on the fact that we've used plot_representation to create the plots we want to update, and so we know that they're stem plots.
We take the axes containing the representation information (note that this is probably a subset of the total number of axes in the figure, if we're showing other information, as done by Metamer.animate), grab the representation from plotting and, since these are both lists, iterate through them, updating them to the values in data as we go.
In order for this to be used by FuncAnimation, we need to return Artists, so we return a list of the relevant artists, the markerline and stemlines from the StemContainer.
Currently, this averages over all channels in the representation.
- Parameters:
axes (List[Axes]) – A list of axes to update. We assume that these are the axes created by plot_representation and so contain stem plots in the correct order.
batch_idx (int) – Which index to take from the batch dimension (the first one).
data (Tensor) – The data to show on the plot. Should look like the output of self.forward(img), with the exact same structure (e.g., as returned by metamer.representation_error() or another instance of this class).
- Returns:
A list of the artists used to update the information on the stem plots
- Return type:
stem_artists
Module contents
Module contents
plenoptic.synthesize package
Submodules
plenoptic.synthesize.autodiff module
- plenoptic.synthesize.autodiff.jacobian(y, x)[source]
Explicitly compute the full Jacobian matrix. N.B. This is only recommended for small input sizes (e.g. <100x100 image)
- Parameters:
y (Tensor) – Model output with gradient attached.
x (Tensor) – Model input with gradient attached.
- Returns:
Jacobian matrix with torch.Size([len(y), len(x)])
- Return type:
J
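A small sketch following the column-vector convention used throughout this module; for a linear map, the Jacobian equals the weight matrix:
>>> import torch
>>> from plenoptic.synthesize.autodiff import jacobian
>>> x = torch.rand(10, 1, requires_grad=True)  # input, column vector
>>> W = torch.rand(4, 10)
>>> y = W @ x  # output, column vector, gradient attached
>>> J = jacobian(y, x)  # shape torch.Size([len(y), len(x)]) = (4, 10); equals W here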
- plenoptic.synthesize.autodiff.jacobian_vector_product(y, x, V, dummy_vec=None)[source]
Compute Jacobian Vector Product: \(\text{jvp} = (\partial y/\partial x) v\)
Forward Mode Auto-Differentiation (Rop in Theano). PyTorch does not natively support this operation; this function essentially calls backward-mode autodiff twice, as described in [1].
See the vector_jacobian_product() docstring on why we pass arguments for retain_graph and create_graph.
- Parameters:
y (Tensor) – Model output with gradient attached, shape is torch.Size([m, 1]).
x (Tensor) – Model input with gradient attached, shape is torch.Size([n, 1]), i.e. same dim as input tensor.
V (Tensor) – Directions in which to compute the product, shape is torch.Size([n, k]) where k is the number of vectors to compute.
dummy_vec (Tensor) – Vector with which to do the jvp trick [1]. If this argument is given, use the pre-allocated, cached vector; otherwise create a new one and move it to the device in this method.
- Returns:
Jacobian-vector product, torch.Size([n, k])
- Return type:
Jv
Notes
- plenoptic.synthesize.autodiff.vector_jacobian_product(y, x, U, retain_graph=True, create_graph=True, detach=False)[source]
Compute vector Jacobian product: \(\text{vjp} = u^T(\partial y/\partial x)\)
Backward Mode Auto-Differentiation (Lop in Theano)
Note on efficiency: When this function is used in the context of power iteration for computing eigenvectors, the vector output will be repeatedly fed back into vector_jacobian_product() and jacobian_vector_product(). To prevent the accumulation of gradient history in this vector (especially on GPU), we need to ensure the computation graph is not kept in memory after each iteration. We can do this by detaching the output, as well as carefully specifying where/when to retain the created graph.
- Parameters:
y (Tensor) – Output with gradient attached, torch.Size([m, 1]).
x (Tensor) – Input with gradient attached, torch.Size([n, 1]).
U (Tensor) – Direction, shape is torch.Size([m, k]), i.e. same dim as output tensor.
retain_graph (bool) – Whether or not to keep the graph after doing one vector_jacobian_product(). Must be set to True if k > 1.
create_graph (bool) – Whether or not to create the computational graph. Usually should be set to True unless you're reusing the graph, as in the second step of jacobian_vector_product().
detach (bool) – As with create_graph, only needs to be True when reusing the output, as we do in the 2nd step of jacobian_vector_product().
- Returns:
vector-Jacobian product, torch.Size([m, k]).
- Return type:
vJ
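A sketch using both products on the same linear map; the graph is rebuilt before the second call so each product starts from a fresh computation graph. Shapes follow the conventions in the docstrings above.
>>> import torch
>>> from plenoptic.synthesize.autodiff import jacobian_vector_product, vector_jacobian_product
>>> x = torch.rand(10, 1, requires_grad=True)
>>> W = torch.rand(4, 10)
>>> V = torch.rand(10, 2)  # k=2 directions in input space
>>> U = torch.rand(4, 2)   # k=2 directions in output space
>>> y = W @ x
>>> Jv = jacobian_vector_product(y, x, V)  # forward mode: (dy/dx) v for each column of V
>>> y = W @ x  # rebuild the graph before the second product
>>> vJ = vector_jacobian_product(y, x, U)  # reverse mode: u^T (dy/dx) for each column of U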
plenoptic.synthesize.eigendistortion module
- class plenoptic.synthesize.eigendistortion.Eigendistortion(image, model)[source]
Bases:
Synthesis
Synthesis object to compute eigendistortions induced by a model on a given input image.
- Parameters:
image (Tensor) – Image, torch.Size(batch=1, channel, height, width). We currently do not support batches of images, as each image requires its own optimization.
model (Module) – Torch model with defined forward and backward operations.
- batch_size
- Type:
int
- n_channels
- Type:
int
- im_height
- Type:
int
- im_width
- Type:
int
- jacobian
Is only set when synthesize() is run with method='exact'. Defaults to None.
- Type:
Tensor
- eigendistortions
Tensor of eigendistortions (eigenvectors of Fisher matrix), ordered by eigenvalue, with Size((n_distortions, n_channels, im_height, im_width)).
- Type:
Tensor
- eigenvalues
Tensor of eigenvalues corresponding to each eigendistortion, listed in decreasing order.
- Type:
Tensor
- eigenindex
Index of each eigenvector/eigenvalue.
- Type:
listlike
Notes
This is a method for comparing image representations in terms of their ability to explain perceptual sensitivity in humans. It estimates eigenvectors of the FIM. A model, \(y = f(x)\), is a deterministic (and differentiable) mapping from the input pixels \(x \in \mathbb{R}^n\) to a mean output response vector \(y \in \mathbb{R}^m\), where we assume additive white Gaussian noise in the response space. The Jacobian matrix at x is:
\(J(x) = J = \partial y / \partial x\), \(J \in \mathbb{R}^{m \times n}\) (i.e., output_dim x input_dim)
is the matrix of all first-order partial derivatives of the vector-valued function f. The Fisher Information Matrix (FIM) at x, under white Gaussian noise in the response space, is:
\(F = J^T J\)
It is a quadratic approximation of the discriminability of distortions relative to \(x\).
References
[1] Berardino, A., Laparra, V., Ballé, J. and Simoncelli, E., 2017. Eigen-distortions of hierarchical representations. In Advances in Neural Information Processing Systems (pp. 3530-3539). http://www.cns.nyu.edu/pub/lcv/berardino17c-final.pdf http://www.cns.nyu.edu/~lcv/eigendistortions/
- Attributes:
eigendistortions
Tensor of eigendistortions (eigenvectors of Fisher matrix), ordered by eigenvalue.
eigenindex
Index of each eigenvector/eigenvalue.
eigenvalues
Tensor of eigenvalues corresponding to each eigendistortion, listed in decreasing order.
- image
jacobian
Is only set when synthesize() is run with method='exact'.
- model
Methods
compute_jacobian() – Calls autodiff.jacobian and returns the Jacobian.
load(file_path[, map_location]) – Load all relevant stuff from a .pt file.
save(file_path) – Save all relevant variables in a .pt file.
synthesize([method, k, max_iter, p, q, ...]) – Compute eigendistortions of the Fisher Information Matrix for the given input image.
to(*args, **kwargs) – Moves and/or casts the parameters and buffers.
- compute_jacobian()[source]
Calls autodiff.jacobian and returns the Jacobian. Will throw an error if the input is too big.
- Returns:
Jacobian of representation wrt input.
- Return type:
J
- property eigendistortions
Tensor of eigendistortions (eigenvectors of Fisher matrix), ordered by eigenvalue.
- property eigenindex
Index of each eigenvector/eigenvalue.
- property eigenvalues
Tensor of eigenvalues corresponding to each eigendistortion, listed in decreasing order.
- property image
- property jacobian
Is only set when synthesize() is run with method='exact'. Defaults to None.
- load(file_path, map_location=None, **pickle_load_args)[source]
Load all relevant stuff from a .pt file.
This should be called by an initialized Eigendistortion object – we will ensure that image and model are identical.
Note this operates in place and so doesn't return anything.
- Parameters:
file_path (str) – The path to load the synthesis object from.
map_location (str, optional) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you'll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device.
pickle_load_args – Any additional kwargs will be added to pickle_module.load via torch.load; see that function's docstring for details.
Examples
>>> ed = po.synth.Eigendistortion(img, model)
>>> ed.synthesize(max_iter=10)
>>> ed.save('ed.pt')
>>> ed_copy = po.synth.Eigendistortion(img, model)
>>> ed_copy.load('ed.pt')
Note that you must create a new instance of the Synthesis object and then load.
- property model
- save(file_path)[source]
Save all relevant variables in .pt file.
See the load docstring for an example of use.
- Parameters:
file_path (str) – The path to save the Eigendistortion object to.
- synthesize(method='power', k=1, max_iter=1000, p=5, q=2, stop_criterion=1e-07)[source]
Compute eigendistortions of Fisher Information Matrix with given input image.
- Parameters:
method (Literal['exact', 'power', 'randomized_svd']) – Eigensolver method. 'exact' tries to do eigendecomposition directly (not recommended for very large inputs). 'power' (default) uses the power method to compute the first and last eigendistortions, with the maximum number of iterations dictated by max_iter. 'randomized_svd' uses randomized SVD to approximate the top k eigendistortions and their corresponding eigenvalues.
k (int) – How many vectors to return using the block power method or SVD.
max_iter (int) – Maximum number of steps to run for method='power' in the eigenvalue computation. Ignored for other methods.
p (int) – Oversampling parameter for randomized SVD. k+p vectors will be sampled, and k will be returned. See the docstring of _synthesize_randomized_svd for more details, including the algorithm reference.
q (int) – Matrix power parameter for randomized SVD. This is an effective trick for the algorithm to converge to the correct eigenvectors when the eigenspectrum does not decay quickly. See _synthesize_randomized_svd for more details, including the algorithm reference.
stop_criterion (float) – Used if method='power' to check for convergence. If the L2-norm of the eigenvalues has changed by less than this value from one iteration to the next, we terminate synthesis.
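A usage sketch of the power method on a small image with a simple model; the model and image are placeholders, and po.tools.remove_grad is assumed to be the helper that strips parameter gradients before synthesis.
>>> import torch
>>> import plenoptic as po
>>> from plenoptic.simulate.models.naive import Gaussian
>>> from plenoptic.synthesize.eigendistortion import Eigendistortion
>>> img = torch.rand(1, 1, 64, 64)
>>> model = Gaussian(kernel_size=(7, 7), std=2.0)
>>> po.tools.remove_grad(model)  # synthesis expects a frozen model (assumed helper)
>>> ed = Eigendistortion(img, model)
>>> ed.synthesize(method='power', k=1, max_iter=200)
>>> ed.eigendistortions.shape  # (n_distortions, channel, height, width)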
- to(*args, **kwargs)[source]
Moves and/or casts the parameters and buffers.
Its signature is similar to torch.Tensor.to(), but it only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
Note
This method modifies the module in-place.
- Args:
device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point type of the floating point parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
- plenoptic.synthesize.eigendistortion.display_eigendistortion(eigendistortion, eigenindex=0, alpha=5.0, process_image=<function <lambda>>, ax=None, plot_complex='rectangular', **kwargs)[source]
Displays specified eigendistortion added to the image.
If image or eigendistortions have 3 channels, then it is assumed to be a color image and it is converted to grayscale. This is merely for display convenience and may change in the future.
- Parameters:
eigendistortion (Eigendistortion) – Eigendistortion object whose synthesized eigendistortion we want to display.
eigenindex (int) – Index of the eigendistortion to plot. E.g., if there are 10 eigenvectors, 0 will index the first one, and -1 or 9 will index the last one.
alpha (float) – Amount by which to scale the eigendistortion, so that image + (alpha * eigendistortion) is displayed.
process_image (Callable[[Tensor], Tensor]) – A function to process the image + alpha * distortion before clamping between 0 and 1. E.g., multiplying by the ImageNet standard deviation and then adding the ImageNet mean to undo image preprocessing.
ax (Optional[axis]) – Axis handle on which to plot.
plot_complex (str) – Parameter for plenoptic.imshow() determining how to handle complex values. Defaults to 'rectangular', which plots real and complex components as separate images. Can also be 'polar' or 'logpolar'; see that method's docstring for details.
kwargs – Additional arguments for po.imshow().
- Returns:
matplotlib Figure handle returned by plenoptic.imshow()
- Return type:
fig
- plenoptic.synthesize.eigendistortion.fisher_info_matrix_eigenvalue(y, x, v, dummy_vec=None)[source]
Compute the eigenvalues of the Fisher Information Matrix corresponding to the eigenvectors in v: \(\lambda = v^T F v\)
- Return type:
Tensor
- plenoptic.synthesize.eigendistortion.fisher_info_matrix_vector_product(y, x, v, dummy_vec)[source]
Compute Fisher Information Matrix Vector Product: \(Fv\)
- Parameters:
y (Tensor) – Output tensor with gradient attached.
x (Tensor) – Input tensor with gradient attached.
v (Tensor) – The vectors with which to compute Fisher vector products.
dummy_vec (Tensor) – Dummy vector for the Jacobian vector product trick.
- Returns:
Vector, Fisher vector product.
- Return type:
Fv
Notes
Under white Gaussian noise assumption, \(F\) is matrix multiplication of Jacobian transpose and Jacobian: \(F = J^T J\). Hence: \(Fv = J^T (Jv)\)
plenoptic.synthesize.geodesic module
- class plenoptic.synthesize.geodesic.Geodesic(image_a, image_b, model, n_steps=10, initial_sequence='straight', range_penalty_lambda=0.1, allowed_range=(0, 1))[source]
Bases:
OptimizedSynthesis
Synthesize an approximate geodesic between two images according to a model.
This method can be used to visualize and refine the invariances of a model’s representation as described in [1].
NOTE: This synthesis method is still under construction. It will run, but it might not find the most informative geodesic.
- Parameters:
image_a (Tensor) – Start anchor point of the geodesic, of shape (1, channel, height, width).
image_b (Tensor) – Stop anchor point of the geodesic, of shape (1, channel, height, width).
model (Module) – An analysis model that computes representations on signals like image_a.
n_steps (int) – The number of steps (i.e., transitions) in the trajectory between the two anchor points.
initial_sequence (Literal['straight', 'bridge']) – Initialize the geodesic with pixel linear interpolation ('straight'), or with a Brownian bridge between the two anchors ('bridge').
range_penalty_lambda (float) – Strength of the regularizer that enforces the allowed_range. Must be non-negative.
allowed_range (Tuple[float, float]) – Range (inclusive) of allowed pixel values. Any values outside this range will be penalized.
- geodesic
The synthesized sequence of images between the two anchor points that minimizes representation path energy, of shape (n_steps+1, channel, height, width). It starts with image_a and ends with image_b.
- Type:
Tensor
- pixelfade
the straight interpolation between the two anchor points, used as reference
- Type:
Tensor
- losses
A list of our loss over iterations.
- Type:
Tensor
- gradient_norm
A list of the gradient’s L2 norm over iterations.
- Type:
list
- pixel_change_norm
A list containing the L2 norm of the pixel change over iterations (pixel_change_norm[i] is the pixel change norm in geodesic between iterations i and i-1).
- Type:
list
- step_energy
step lengths in representation space, stored along the optimization process.
- Type:
Tensor
- dev_from_line
Deviation of the representation from the straight-line interpolation; measures the distance from the straight line and the distance along the straight line, stored along the optimization process.
- Type:
Tensor
Notes
Manifold prior hypothesis: natural images form a manifold 𝑀ˣ embedded in signal space (ℝⁿ), a model warps this manifold to another manifold 𝑀ʸ embedded in representation space (ℝᵐ), and thereby induces a different local metric.
This method computes an approximate geodesic by solving an optimization problem: it minimizes the path energy (a.k.a. the action functional), which has the same minimum as minimizing path length and, by Cauchy-Schwarz, attains it with a constant-speed minimizing geodesic.
Caveat: depending on the geometry of the manifold, geodesics between two anchor points may not be unique and may depend on the initialization.
References
[1] O J Hénaff and E P Simoncelli. Geodesics of learned representations. Int'l Conf on Learning Representations (ICLR), May 2016. http://www.cns.nyu.edu/~lcv/pubs/makeAbs.php?loc=Henaff16b
- Attributes:
- allowed_range
dev_from_line
Deviation of the representation of each frame of self.geodesic from a straight line.
- geodesic
gradient_norm
Synthesis gradient’s L2 norm over iterations.
- image_a
- image_b
losses
Synthesis loss over iterations.
- model
- optimizer
pixel_change_norm
L2 norm change in pixel values over iterations.
- range_penalty_lambda
step_energy
Squared L2 norm of transition between geodesic frames in representation space.
- store_progress
Methods
calculate_jerkiness([geodesic]) – Compute the alignment of the representation's acceleration to the model's local curvature.
load(file_path[, map_location]) – Load all relevant stuff from a .pt file.
objective_function([geodesic]) – Compute geodesic synthesis loss.
save(file_path) – Save all relevant variables in a .pt file.
synthesize([max_iter, optimizer, ...]) – Synthesize a geodesic via optimization.
to(*args, **kwargs) – Moves and/or casts the parameters and buffers.
- calculate_jerkiness(geodesic=None)[source]
Compute the alignment of representation’s acceleration to model local curvature.
This is the first order optimality condition for a geodesic, and can be used to assess the validity of the solution obtained by optimization.
- Parameters:
geodesic (Optional[Tensor]) – Geodesic to check. If None, we use self.geodesic. Must have a gradient attached.
- Return type:
jerkiness
- property dev_from_line
Deviation of the representation of each frame of self.geodesic from a straight line.
Has shape (np.ceil(synth_iter/store_progress), n_steps+1, 2), where synth_iter is the number of iterations of synthesis that have happened. For the final dimension, the first element is the Euclidean distance along the straight line and the second is the Euclidean distance to the line.
- property geodesic
- property image_a
- property image_b
- load(file_path, map_location=None, **pickle_load_args)[source]
Load all relevant stuff from a .pt file.
This should be called by an initialized Geodesic object – we will ensure that image_a, image_b, model, n_steps, initial_sequence, range_penalty_lambda, allowed_range, and pixelfade are all identical.
Note this operates in place and so doesn't return anything.
- Parameters:
file_path (str) – The path to load the synthesis object from.
map_location (str, optional) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you'll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device.
pickle_load_args – Any additional kwargs will be added to pickle_module.load via torch.load; see that function's docstring for details.
Examples
>>> geo = po.synth.Geodesic(img_a, img_b, model)
>>> geo.synthesize(max_iter=10, store_progress=True)
>>> geo.save('geo.pt')
>>> geo_copy = po.synth.Geodesic(img_a, img_b, model)
>>> geo_copy.load('geo.pt')
Note that you must create a new instance of the Synthesis object and then load.
- property model
- objective_function(geodesic=None)[source]
Compute geodesic synthesis loss.
This is the path energy (i.e., squared L2 norm of each step) of the geodesic’s model representation, with the weighted range penalty.
Additionally, caches:
self._geodesic_representation = self.model(geodesic)
self._most_recent_step_energy = self._calculate_step_energy(self._geodesic_representation)
These are cached because we might store them (if self.store_progress is True) and don't want to recalculate them.
- Parameters:
geodesic (Optional[Tensor]) – Geodesic to check. If None, we use self.geodesic.
- Return type:
loss
- save(file_path)[source]
Save all relevant variables in .pt file.
See the load docstring for an example of use.
- Parameters:
file_path (str) – The path to save the Geodesic object to.
- property step_energy
Squared L2 norm of transition between geodesic frames in representation space.
Has shape (np.ceil(synth_iter/store_progress), n_steps), where synth_iter is the number of iterations of synthesis that have happened.
- synthesize(max_iter=1000, optimizer=None, store_progress=False, stop_criterion=None, stop_iters_to_check=50)[source]
Synthesize a geodesic via optimization.
- Parameters:
max_iter (int) – The maximum number of iterations to run before we end synthesis (unless we hit the stop criterion).
optimizer (Optional[Optimizer]) – The optimizer to use. If None and this is the first time calling synthesize, we use Adam(lr=.001, amsgrad=True); if synthesize has been called before, this must be None and we reuse the previous optimizer.
store_progress (Union[bool, int]) – Whether we should store the step energy and the deviation of the representation from a straight line. If False, we don't save anything. If True, we save every iteration. If an int, we save every store_progress iterations (note then that 0 is the same as False and 1 the same as True).
stop_criterion (Optional[float]) – If pixel_change_norm (i.e., the norm of the difference in self.geodesic from one iteration to the next) over the past stop_iters_to_check iterations has been less than stop_criterion, we terminate synthesis. If None, we pick a default value based on the norm of self.pixelfade.
stop_iters_to_check (int) – How many iterations back to check in order to see if pixel_change_norm has stopped decreasing (for stop_criterion).
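A usage sketch mirroring the Examples block in load() above; the model and images are placeholders, and po.tools.remove_grad is assumed to be the helper that freezes the model before synthesis. Keep in mind the note above that this synthesis method is still under construction.
>>> import torch
>>> import plenoptic as po
>>> from plenoptic.simulate.models.naive import Gaussian
>>> from plenoptic.synthesize.geodesic import Geodesic
>>> img_a = torch.rand(1, 1, 64, 64)
>>> img_b = torch.rand(1, 1, 64, 64)
>>> model = Gaussian(kernel_size=(7, 7), std=2.0)
>>> po.tools.remove_grad(model)  # assumed helper to freeze the model
>>> geo = Geodesic(img_a, img_b, model, n_steps=6)
>>> geo.synthesize(max_iter=50, store_progress=True)
>>> geo.geodesic.shape  # (n_steps + 1, channel, height, width)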
- to(*args, **kwargs)[source]
Moves and/or casts the parameters and buffers.
This can be called as
to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (torch.device): the desired device of the parameters and buffers in this module
- dtype (torch.dtype): the desired floating point type of the floating point parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
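For example, a minimal sketch of moving a Geodesic object (geo, from the examples above) to a GPU before synthesizing, assuming device strings are accepted as with torch.nn.Module.to:
import torch

if torch.cuda.is_available():
    geo.to('cuda')            # moves images, model, etc. in place
geo.synthesize(max_iter=100)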
- plenoptic.synthesize.geodesic.plot_deviation_from_line(geodesic, natural_video=None, ax=None)[source]
Visual diagnostic of geodesic linearity in representation space.
This plot illustrates the deviation from the straight line connecting the representations of a pair of images, for different paths in representation space.
- Parameters:
geodesic (Geodesic) – Geodesic object to visualize.
natural_video (Optional[Tensor]) – Natural video that bridges the anchor points, for comparison.
ax (Optional[Axes]) – If not None, the axis to plot this representation on. If None, we call plt.gca()
- Returns:
Axes containing the plot
- Return type:
ax
Notes
Axes are in the same units, normalized by the distance separating the end point representations.
Knots along each curve indicate samples used to compute the path.
When the representation is non-linear it may not be feasible for the geodesic to be straight (for example, if the representation is normalized, all paths are constrained to live on a hypersphere). Nevertheless, if the representation is able to linearize the transformation between the anchor images, then we expect that both the ground truth natural video sequence and the geodesic will deviate from a straight line similarly. By contrast, the pixel-based interpolation will deviate significantly more from a straight line.
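A brief usage sketch, assuming geo is a Geodesic whose synthesize has already been run:
import matplotlib.pyplot as plt
from plenoptic.synthesize.geodesic import plot_deviation_from_line

ax = plot_deviation_from_line(geo)   # compares pixel interpolation and the synthesized geodesic
ax.set_title('Deviation from representation-space straight line')
plt.show()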
- plenoptic.synthesize.geodesic.plot_loss(geodesic, ax=None, **kwargs)[source]
Plot synthesis loss.
- Parameters:
geodesic (Geodesic) – Geodesic object whose synthesis loss we want to plot.
ax (Optional[Axes]) – If not None, the axis to plot this representation on. If None, we call plt.gca()
kwargs – passed to plt.semilogy
- Returns:
Axes containing the plot.
- Return type:
ax
plenoptic.synthesize.mad_competition module
Run MAD Competition.
- class plenoptic.synthesize.mad_competition.MADCompetition(image, optimized_metric, reference_metric, minmax, initial_noise=0.1, metric_tradeoff_lambda=None, range_penalty_lambda=0.1, allowed_range=(0, 1))[source]
Bases:
OptimizedSynthesis
Synthesize a single maximally-differentiating image for two metrics.
Following the basic idea in [1], this class synthesizes a maximally-differentiating image for two given metrics, based on a given image. We start by adding noise to this image and then iteratively adjusting its pixels so as to either minimize or maximize optimized_metric while holding the value of reference_metric constant.
MADCompetition accepts two metrics as its input. These should be callables that take two images and return a single number, and that number should be 0 if and only if the two images are identical (thus, the larger the number, the more different the two images). A minimal sketch of such a metric appears after the references below.
Note that a full set of MAD Competition images consists of two pairs: a maximal and a minimal image for each metric. A single instantiation of MADCompetition will generate one of these four images.
- Parameters:
image (Tensor) – A 4d tensor, this is the image whose representation we wish to match. If this is not a tensor, we try to cast it as one.
optimized_metric (Union[Module, Callable[[Tensor, Tensor], Tensor]]) – The metric whose value you wish to minimize or maximize, which takes two tensors and returns a scalar. Because of the limitations of pickle, you cannot use a lambda function for this if you wish to save the MADCompetition object (i.e., it must be one of our built-in functions or defined using a def statement)
reference_metric (Union[Module, Callable[[Tensor, Tensor], Tensor]]) – The metric whose value you wish to keep fixed, which takes two tensors and returns a scalar. Because of the limitations of pickle, you cannot use a lambda function for this if you wish to save the MADCompetition object (i.e., it must be one of our built-in functions or defined using a def statement)
minmax (Literal['min', 'max']) – Whether you wish to minimize or maximize optimized_metric.
initial_noise (float) – Standard deviation of the Gaussian noise used to initialize mad_image from image.
metric_tradeoff_lambda (Optional[float]) – Lambda to multiply by reference_metric loss and add to optimized_metric loss. If None, we pick a value so the two initial losses are approximately equal in magnitude.
range_penalty_lambda (float) – Lambda to multiply by range penalty and add to loss.
allowed_range – Range (inclusive) of allowed pixel values. Any values outside this range will be penalized.
- mad_image
The Maximally-Differentiating Image. This may be unfinished depending on how many iterations we’ve run for.
- Type:
torch.Tensor
- initial_image
The initial mad_image, which we obtain by adding Gaussian noise to image.
- Type:
torch.Tensor
- losses
A list of the objective function's loss over iterations.
- Type:
list
- gradient_norm
A list of the gradient's L2 norm over iterations.
- Type:
list
- pixel_change_norm
A list containing the L2 norm of the pixel change over iterations (pixel_change_norm[i] is the pixel change norm in mad_image between iterations i and i-1).
- Type:
list
- optimized_metric_loss
A list of the optimized_metric loss over iterations.
- Type:
list
- reference_metric_loss
A list of the reference_metric loss over iterations.
- Type:
list
- saved_mad_image
Saved self.mad_image for later examination.
- Type:
torch.Tensor
References
[1]Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. http://dx.doi.org/10.1167/8.12.8
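As noted above, both metrics must be picklable callables that take two images and return a scalar that is zero if and only if the images are identical. A minimal sketch of such a metric, defined with def so the MADCompetition object can still be saved (this particular helper is illustrative, not part of plenoptic):
import torch

def mse_metric(img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
    # zero iff the two images are identical; grows as they differ
    return torch.pow(img_a - img_b, 2).mean()

# such a function can be passed as either optimized_metric or reference_metric, e.g.:
# mad = po.synth.MADCompetition(img, mse_metric, other_metric, 'min')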
- Attributes:
- allowed_range
gradient_norm
Synthesis gradient’s L2 norm over iterations.
- image
- initial_image
losses
Synthesis loss over iterations.
- mad_image
- metric_tradeoff_lambda
- minmax
- optimized_metric
- optimized_metric_loss
- optimizer
pixel_change_norm
L2 norm change in pixel values over iterations.
- range_penalty_lambda
- reference_metric
- reference_metric_loss
- saved_mad_image
- store_progress
Methods
load(file_path[, map_location])
Load all relevant stuff from a .pt file.
objective_function([mad_image, image])
Compute the MADCompetition synthesis loss.
save(file_path)
Save all relevant variables in .pt file.
synthesize([max_iter, optimizer, scheduler, ...])
Synthesize a MAD image.
to(*args, **kwargs)
Moves and/or casts the parameters and buffers.
- property image
- property initial_image
- load(file_path, map_location=None, **pickle_load_args)[source]
Load all relevant stuff from a .pt file.
This should be called by an initialized MADCompetition object – we will ensure that image, metric_tradeoff_lambda, range_penalty_lambda, allowed_range, minmax are all identical, and that reference_metric and optimized_metric return identical values.
Note this operates in place and so doesn't return anything.
- Parameters:
file_path (str) – The path to load the synthesis object from
map_location (str, optional) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you'll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device
pickle_load_args – any additional kwargs will be added to pickle_module.load via torch.load, see that function's docstring for details.
Examples
>>> mad = po.synth.MADCompetition(img, metric1, metric2, 'min')
>>> mad.synthesize(max_iter=10, store_progress=True)
>>> mad.save('mad.pt')
>>> mad_copy = po.synth.MADCompetition(img, metric1, metric2, 'min')
>>> mad_copy.load('mad.pt')
Note that you must create a new instance of the Synthesis object and then load.
- property mad_image
- property metric_tradeoff_lambda
- property minmax
- objective_function(mad_image=None, image=None)[source]
Compute the MADCompetition synthesis loss.
This computes:
\[\begin{split}t L_1(x, \hat{x}) &+ \lambda_1 [L_2(x, x+\epsilon) - L_2(x, \hat{x})]^2 \\ &+ \lambda_2 \mathcal{B}(\hat{x})\end{split}\]
where \(t\) is 1 if self.minmax is 'min' and -1 if it's 'max', \(L_1\) is self.optimized_metric, \(L_2\) is self.reference_metric, \(x\) is self.image, \(\hat{x}\) is self.mad_image, \(\epsilon\) is the initial noise, \(\mathcal{B}\) is the quadratic bound penalty, \(\lambda_1\) is self.metric_tradeoff_lambda, and \(\lambda_2\) is self.range_penalty_lambda.
- Parameters:
mad_image (Optional[Tensor]) – Proposed mad_image, \(\hat{x}\) in the above equation. If None, use self.mad_image.
image (Optional[Tensor]) – Proposed image, \(x\) in the above equation. If None, use self.image.
- Return type:
loss
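For intuition, a rough sketch of how these terms combine. The quadratic bound penalty used here is an illustrative placeholder, not plenoptic's exact penalty, and the function itself is not part of the library:
import torch

def mad_loss_sketch(optimized_metric, reference_metric, image, mad_image,
                    initial_image, metric_tradeoff_lambda, range_penalty_lambda,
                    minmax='min'):
    t = 1 if minmax == 'min' else -1
    optimized = t * optimized_metric(image, mad_image)
    # keep reference_metric near its value at initialization, i.e., L2(x, x + eps)
    reference_target = reference_metric(image, initial_image)
    tradeoff = metric_tradeoff_lambda * (reference_target - reference_metric(image, mad_image)) ** 2
    # penalize pixels outside [0, 1]; zero inside the allowed range
    out_of_range = mad_image - mad_image.clamp(0, 1)
    range_penalty = range_penalty_lambda * out_of_range.pow(2).sum()
    return optimized + tradeoff + range_penalty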
- property optimized_metric
- property optimized_metric_loss
- property reference_metric
- property reference_metric_loss
- save(file_path)[source]
Save all relevant variables in .pt file.
Note that if store_progress is True, this will probably be very large.
See load docstring for an example of use.
- Parameters:
file_path (str) – The path to save the MADCompetition object to
- property saved_mad_image
- synthesize(max_iter=100, optimizer=None, scheduler=None, store_progress=False, stop_criterion=0.0001, stop_iters_to_check=50)[source]
Synthesize a MAD image.
Update the pixels of initial_image to maximize or minimize (depending on the value of minmax) the value of optimized_metric(image, mad_image) while keeping the value of reference_metric(image, mad_image) constant.
We run this until either we reach max_iter or the change over the past stop_iters_to_check iterations is less than stop_criterion, whichever comes first.
- Parameters:
max_iter (int) – The maximum number of iterations to run before we end synthesis (unless we hit the stop criterion).
optimizer (Optional[Optimizer]) – The optimizer to use. If None and this is the first time calling synthesize, we use Adam(lr=.01, amsgrad=True); if synthesize has been called before, this must be None and we reuse the previous optimizer.
scheduler (Optional[_LRScheduler]) – The learning rate scheduler to use. If None, we don't use one.
store_progress (Union[bool, int]) – Whether we should store the representation of the MAD image in progress on every iteration. If False, we don't save anything. If True, we save every iteration. If an int, we save every store_progress iterations (note then that 0 is the same as False and 1 the same as True).
stop_criterion (float) – If the loss over the past stop_iters_to_check has changed less than stop_criterion, we terminate synthesis.
stop_iters_to_check (int) – How many iterations back to check in order to see if the loss has stopped decreasing (for stop_criterion).
- to(*args, **kwargs)[source]
Moves and/or casts the parameters and buffers.
This can be called as
to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (
torch.device
): the desired device of the parameters and buffers in this module
- dtype (
torch.dtype
): the desired floating point type of the floating point parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired
dtype and device for all parameters and buffers in this module
- plenoptic.synthesize.mad_competition.animate(mad, framerate=10, batch_idx=0, channel_idx=None, zoom=None, fig=None, axes_idx={}, figsize=None, included_plots=['display_mad_image', 'plot_loss', 'plot_pixel_values'], width_ratios={})[source]
Animate synthesis progress.
This is essentially the figure produced by mad.plot_synthesis_status animated over time, for each stored iteration.
We return the matplotlib FuncAnimation object. In order to view it in a Jupyter notebook, use the plenoptic.tools.display.convert_anim_to_html(anim) function. In order to save, use anim.save(filename) (note for this that you'll need the appropriate writer installed and on your path, e.g., ffmpeg, imagemagick, etc.). Either of these will probably take a reasonably long amount of time.
- Parameters:
mad (
MADCompetition
) – MADCompetition object whose synthesis we want to animate.framerate (
int
) – How many frames a second to display.batch_idx (
int
) – Which index to take from the batch dimensionchannel_idx (
Optional
[int
]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) image).zoom (
Optional
[float
]) – How much to zoom in / enlarge the synthesized image, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.fig (
Optional
[Figure
]) – If None, create the figure from scratch. Else, should be an empty figure with enough axes (the expected use here is have same-size movies with different plots).axes_idx (
Dict
[str
,int
]) – Dictionary specifying which axes contains which type of plot, allows for more fine-grained control of the resulting figure. Probably only helpful if fig is also defined. Possible keys:'mad_image', 'loss', 'pixel_values', 'misc'
. Values should all be ints. If you tell this function to create a plot that doesn’t have a corresponding key, we find the lowest int that is not already in the dict, so if you have axes that you want unchanged, place their idx in'misc'
.figsize (
Optional
[Tuple
[float
]]) – The size of the figure to create. It may take a little bit of playing around to find a reasonable value. If None, we attempt to make our best guess, aiming to have each axis be of size (5, 5)width_ratios (
Dict
[str
,float
]) – By default, all plots axes will have the same width. To change that, specify their relative widths using the keys: [‘display_mad_image’, ‘plot_loss’, ‘plot_pixel_values’] and floats specifying their relative width. Any not included will be assumed to be 1.
- Returns:
The animation object. In order to view, must convert to HTML or save.
- Return type:
anim
Notes
By default, we use the ffmpeg backend, which requires that you have ffmpeg installed and on your path (https://ffmpeg.org/download.html). To use a different one, set the matplotlib rcParam: matplotlib.rcParams['animation.writer'] = writer, see https://matplotlib.org/stable/api/animation_api.html#writer-classes for more details.
For displaying in a jupyter notebook, ffmpeg appears to be required.
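A brief usage sketch, assuming mad was synthesized with store_progress enabled and ffmpeg is available for saving:
from plenoptic.synthesize.mad_competition import animate
from plenoptic.tools.display import convert_anim_to_html

anim = animate(mad, framerate=10)
convert_anim_to_html(anim)       # for display in a Jupyter notebook
anim.save('mad_synthesis.mp4')   # or save to disk (needs ffmpeg on your path)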
- plenoptic.synthesize.mad_competition.display_mad_image(mad, batch_idx=0, channel_idx=None, zoom=None, iteration=None, ax=None, title='MADCompetition', **kwargs)[source]
Display MAD image.
You can specify what iteration to view by using the iteration arg. The default, None, shows the final one.
We use plenoptic.imshow to display the synthesized image and attempt to automatically find the most reasonable zoom value. You can override this value using the zoom arg, but remember that plenoptic.imshow is opinionated about the size of the resulting image and will throw an Exception if the axis created is not big enough for the selected zoom.
- Parameters:
mad (
MADCompetition
) – MADCompetition object whose MAD image we want to display.batch_idx (
int
) – Which index to take from the batch dimensionchannel_idx (
Optional
[int
]) – Which index to take from the channel dimension. If None, we assume image is RGB(A) and show all channels.zoom (
Optional
[float
]) – How much to zoom in / enlarge the synthesized image, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.iteration (
Optional
[int
]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.ax (
Optional
[Axes
]) – Pre-existing axes for plot. If None, we callplt.gca()
.title (
str
) – Title of the axis.kwargs – Passed to
plenoptic.imshow
- Returns:
The matplotlib axes containing the plot.
- Return type:
ax
- plenoptic.synthesize.mad_competition.display_mad_image_all(mad_metric1_min, mad_metric2_min, mad_metric1_max, mad_metric2_max, metric1_name=None, metric2_name=None, zoom=1, **kwargs)[source]
Display all MAD Competition images.
To generate a full set of MAD Competition images, you need four instances: one each for minimizing and maximizing each metric. This helper function creates a figure to display the full set of images (a usage sketch follows the parameter list below).
In addition to the four MAD Competition images, this also plots the initial image from mad_metric1_min, for comparison.
Note that all four MADCompetition instances must have the same image.
- Parameters:
mad_metric1_min (
MADCompetition
) – MADCompetition object that minimized the first metric.mad_metric2_min (
MADCompetition
) – MADCompetition object that minimized the second metric.mad_metric1_max (
MADCompetition
) – MADCompetition object that maximized the first metric.mad_metric2_max (
MADCompetition
) – MADCompetition object that maximized the second metric.metric1_name (
Optional
[str
]) – Name of the first metric. If None, we use the name of the optimized_metric function from mad_metric1_min.metric2_name (
Optional
[str
]) – Name of the second metric. If None, we use the name of the optimized_metric function from mad_metric2_min.zoom (
Union
[int
,float
]) – Ratio of display pixels to image pixels. See plenoptic.imshow for details.kwargs – Passed to plenoptic.imshow.
- Returns:
Figure containing the images.
- Return type:
fig
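A usage sketch of generating and displaying the full set of four images; img, metric1, and metric2 are placeholders (metrics defined with def, as described above), and the iteration count is arbitrary:
import plenoptic as po
from plenoptic.synthesize.mad_competition import display_mad_image_all

mad_images = {}
for (opt_metric, ref_metric), name in [((metric1, metric2), 'metric1'),
                                       ((metric2, metric1), 'metric2')]:
    for minmax in ['min', 'max']:
        mad = po.synth.MADCompetition(img, opt_metric, ref_metric, minmax)
        mad.synthesize(max_iter=200)
        mad_images[f'{name}_{minmax}'] = mad

fig = display_mad_image_all(mad_images['metric1_min'], mad_images['metric2_min'],
                            mad_images['metric1_max'], mad_images['metric2_max'],
                            metric1_name='metric1', metric2_name='metric2')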
- plenoptic.synthesize.mad_competition.plot_loss(mad, iteration=None, axes=None, **kwargs)[source]
Plot metric losses.
Plots mad.optimized_metric_loss and mad.reference_metric_loss on two separate axes, over all iterations. Also plots a red dot at iteration, to highlight the loss there. If iteration=None, then the dot will be at the final iteration.
- Parameters:
mad (
MADCompetition
) – MADCompetition object whose loss we want to plot.iteration (
Optional
[int
]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.axes (
Union
[List
[Axes
],Axes
,None
]) – Pre-existing axes for plot. If a list of axes, must be the two axes to use for this plot. If a single axis, we’ll split it in half horizontally. If None, we callplt.gca()
.kwargs – passed to plt.plot
- Returns:
The matplotlib axes containing the plot.
- Return type:
axes
Notes
We plot abs(mad.losses) because if we're maximizing the synthesis metric, we minimized its negative. By plotting the absolute value, we get them all on the same scale.
- plenoptic.synthesize.mad_competition.plot_loss_all(mad_metric1_min, mad_metric2_min, mad_metric1_max, mad_metric2_max, metric1_name=None, metric2_name=None, metric1_kwargs={'c': 'C0'}, metric2_kwargs={'c': 'C1'}, min_kwargs={'linestyle': '--'}, max_kwargs={'linestyle': '-'}, figsize=(10, 5))[source]
Plot loss for the full set of MAD Competition instances.
To generate a full set of MAD Competition images, you need four instances: one each for minimizing and maximizing each metric. This helper function creates a two-axis figure to display the loss for this full set.
Note that all four MADCompetition instances must have the same image.
- Parameters:
mad_metric1_min (
MADCompetition
) – MADCompetition object that minimized the first metric.mad_metric2_min (
MADCompetition
) – MADCompetition object that minimized the second metric.mad_metric1_max (
MADCompetition
) – MADCompetition object that maximized the first metric.mad_metric2_max (
MADCompetition
) – MADCompetition object that maximized the second metric.metric1_name (
Optional
[str
]) – Name of the first metric. If None, we use the name of the optimized_metric function from mad_metric1_min.metric2_name (
Optional
[str
]) – Name of the second metric. If None, we use the name of the optimized_metric function from mad_metric2_min.metric1_kwargs (
Dict
) – Dictionary of arguments to pass to matplotlib.pyplot.plot to identify synthesis instance where the first metric was being optimized.metric2_kwargs (
Dict
) – Dictionary of arguments to pass to matplotlib.pyplot.plot to identify synthesis instance where the second metric was being optimized.min_kwargs (
Dict
) – Dictionary of arguments to pass to matplotlib.pyplot.plot to identify synthesis instance where optimized_metric was being minimized.max_kwargs (
Dict
) – Dictionary of arguments to pass to matplotlib.pyplot.plot to identify synthesis instance where optimized_metric was being maximized.figsize – Size of the figure we create.
- Returns:
Figure containing the plot.
- Return type:
fig
- plenoptic.synthesize.mad_competition.plot_pixel_values(mad, batch_idx=0, channel_idx=None, iteration=None, ylim=False, ax=None, **kwargs)[source]
Plot histogram of pixel values of reference and MAD images.
This is a way to check the distributions of pixel intensities and see if there are any values outside the allowed range.
- Parameters:
mad (
MADCompetition
) – MADCompetition object with the images whose pixel values we want to compare.batch_idx (
int
) – Which index to take from the batch dimensionchannel_idx (
Optional
[int
]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) images).iteration (
Optional
[int
]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.ylim (
Union
[Tuple
[float
],Literal
[False
]]) – if tuple, the ylimit to set for this axis. If False, we leave it untouchedax (
Optional
[Axes
]) – Pre-existing axes for plot. If None, we callplt.gca()
.kwargs – passed to plt.hist
- Returns:
Created axes.
- Return type:
ax
- plenoptic.synthesize.mad_competition.plot_synthesis_status(mad, batch_idx=0, channel_idx=None, iteration=None, vrange='indep1', zoom=None, fig=None, axes_idx={}, figsize=None, included_plots=['display_mad_image', 'plot_loss', 'plot_pixel_values'], width_ratios={})[source]
Make a plot showing synthesis status.
We create several subplots to analyze this. By default, we create two subplots on a new figure: the first one contains the MAD image and the second contains the loss.
There is an optional additional plot: pixel_values, a histogram of pixel values of the synthesized and target images.
The plots to include are specified by including their name in the included_plots list. All plots can be created separately using the function with the same name.
- Parameters:
mad (
MADCompetition
) – MADCompetition object whose status we want to plot.batch_idx (
int
) – Which index to take from the batch dimensionchannel_idx (
Optional
[int
]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) image).iteration (
Optional
[int
]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.vrange (
Union
[Tuple
[float
],str
]) – The vrange option to pass todisplay_mad_image()
. See docstring ofimshow
for possible values.zoom (
Optional
[float
]) – How much to zoom in / enlarge the synthesized image, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.fig (
Optional
[Figure
]) – if None, we create a new figure. otherwise we assume this is an empty figure that has the appropriate size and number of subplotsaxes_idx (
Dict
[str
,int
]) – Dictionary specifying which axes contains which type of plot, allows for more fine-grained control of the resulting figure. Probably only helpful if fig is also defined. Possible keys:'mad_image', 'loss', 'pixel_values', 'misc'
. Values should all be ints. If you tell this function to create a plot that doesn’t have a corresponding key, we find the lowest int that is not already in the dict, so if you have axes that you want unchanged, place their idx in'misc'
.figsize (
Optional
[Tuple
[float
]]) – The size of the figure to create. It may take a little bit of playing around to find a reasonable value. If None, we attempt to make our best guess, aiming to have each axis be of size (5, 5)included_plots (
List
[str
]) – Which plots to include. Must be some subset of'display_mad_image', 'plot_loss', 'plot_pixel_values'
.width_ratios (
Dict
[str
,float
]) – By default, all plots axes will have the same width. To change that, specify their relative widths using the keys: [‘display_mad_image’, ‘plot_loss’, ‘plot_pixel_values’] and floats specifying their relative width. Any not included will be assumed to be 1.
- Return type:
Tuple
[Figure
,Dict
[str
,int
]]- Returns:
fig – The figure containing this plot
axes_idx – Dictionary giving index of each plot.
plenoptic.synthesize.metamer module
Synthesize model metamers.
- class plenoptic.synthesize.metamer.Metamer(image, model, loss_function=<function mse>, range_penalty_lambda=0.1, allowed_range=(0, 1), initial_image=None)[source]
Bases:
OptimizedSynthesis
Synthesize metamers for image-computable differentiable models.
Following the basic idea in [1], this class creates a metamer for a given model on a given image. We start with initial_image and iteratively adjust its pixel values so that the model representation of metamer matches that of image.
All saved_ attributes are initialized as empty lists and will be non-empty if the store_progress arg to synthesize() is not False. They will be appended to on every iteration if store_progress=True, or every store_progress iterations if it's an int.
- Parameters:
image (Tensor) – A 4d tensor, this is the image whose representation we wish to match. If this is not a tensor, we try to cast it as one.
model (Module) – A visual model, see Metamer notebook for more details
loss_function (Callable[[Tensor, Tensor], Tensor]) – the loss function to use to compare the representations of the models in order to determine their loss. Because of the limitations of pickle, you cannot use a lambda function for this if you wish to save the Metamer object (i.e., it must be one of our built-in functions or defined using a def statement)
range_penalty_lambda (float) – strength of the regularizer that enforces the allowed_range. Must be non-negative.
allowed_range (Tuple[float, float]) – Range (inclusive) of allowed pixel values. Any values outside this range will be penalized.
initial_image (Optional[Tensor]) – 4d Tensor to initialize our metamer with. If None, will draw a sample of uniform noise within allowed_range.
- target_representation
Whatever is returned by model(image), this is what we match in order to create a metamer
- Type:
torch.Tensor
- metamer
The metamer. This may be unfinished depending on how many iterations we've run for.
- Type:
torch.Tensor
- losses
A list of our loss over iterations.
- Type:
list
- gradient_norm
A list of the gradient's L2 norm over iterations.
- Type:
list
- pixel_change_norm
A list containing the L2 norm of the pixel change over iterations (pixel_change_norm[i] is the pixel change norm in metamer between iterations i and i-1).
- Type:
list
- saved_metamer
Saved self.metamer for later examination.
- Type:
torch.Tensor
References
[1]J Portilla and E P Simoncelli. A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients. Int’l Journal of Computer Vision. 40(1):49-71, October, 2000. http://www.cns.nyu.edu/~eero/ABSTRACTS/portilla99-abstract.html http://www.cns.nyu.edu/~lcv/texture/
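A minimal end-to-end sketch with a toy model. The model and image here are placeholders built from plain PyTorch, not plenoptic components; real use cases would substitute one of plenoptic's models or another differentiable visual model:
import torch
import plenoptic as po

class Blur(torch.nn.Module):
    # toy "visual model": a fixed 5x5 box low-pass filter, with no learnable parameters
    def __init__(self):
        super().__init__()
        self.register_buffer('kernel', torch.ones(1, 1, 5, 5) / 25.0)

    def forward(self, x):
        return torch.nn.functional.conv2d(x, self.kernel, padding=2)

img = torch.rand(1, 1, 64, 64)           # placeholder 4d target image
met = po.synth.Metamer(img, Blur())
met.synthesize(max_iter=200, store_progress=10)
print(met.losses[-1])                     # final synthesis loss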
- Attributes:
- allowed_range
gradient_norm
Synthesis gradient’s L2 norm over iterations.
- image
losses
Synthesis loss over iterations.
- metamer
- model
- optimizer
pixel_change_norm
L2 norm change in pixel values over iterations.
- range_penalty_lambda
- saved_metamer
- store_progress
target_representation
Model representation of
image
, the goal of synthesis is formodel(metamer)
to match this value.
Methods
load(file_path[, map_location])
Load all relevant stuff from a .pt file.
objective_function([metamer_representation, ...])
Compute the metamer synthesis loss.
save(file_path)
Save all relevant variables in .pt file.
synthesize([max_iter, optimizer, scheduler, ...])
Synthesize a metamer.
to(*args, **kwargs)
Moves and/or casts the parameters and buffers.
- property image
- load(file_path, map_location=None, **pickle_load_args)[source]
Load all relevant stuff from a .pt file.
This should be called by an initialized Metamer object – we will ensure that image, target_representation (and thus model), and loss_function are all identical.
Note this operates in place and so doesn't return anything.
- Parameters:
file_path (str) – The path to load the synthesis object from
map_location (str, optional) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you'll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device
pickle_load_args – any additional kwargs will be added to pickle_module.load via torch.load, see that function's docstring for details.
Examples
>>> metamer = po.synth.Metamer(img, model)
>>> metamer.synthesize(max_iter=10, store_progress=True)
>>> metamer.save('metamers.pt')
>>> metamer_copy = po.synth.Metamer(img, model)
>>> metamer_copy.load('metamers.pt')
Note that you must create a new instance of the Synthesis object and then load.
- property metamer
- property model
- objective_function(metamer_representation=None, target_representation=None)[source]
Compute the metamer synthesis loss.
This calls self.loss_function on metamer_representation and target_representation and then adds the weighted range penalty.
- Parameters:
metamer_representation (Optional[Tensor]) – Model response to metamer. If None, we use self.model(self.metamer)
target_representation (Optional[Tensor]) – Model response to image. If None, we use self.target_representation.
- Return type:
loss
- save(file_path)[source]
Save all relevant variables in .pt file.
Note that if store_progress is True, this will probably be very large.
See load docstring for an example of use.
- Parameters:
file_path (str) – The path to save the metamer object to
- property saved_metamer
- synthesize(max_iter=100, optimizer=None, scheduler=None, store_progress=False, stop_criterion=0.0001, stop_iters_to_check=50)[source]
Synthesize a metamer.
Update the pixels of initial_image until its representation matches that of image.
We run this until either we reach max_iter or the change over the past stop_iters_to_check iterations is less than stop_criterion, whichever comes first.
- Parameters:
max_iter (
int
) – The maximum number of iterations to run before we end synthesis (unless we hit the stop criterion).optimizer (
Optional
[Optimizer
]) – The optimizer to use. If None and this is the first time calling synthesize, we use Adam(lr=.01, amsgrad=True); if synthesize has been called before, this must be None and we reuse the previous optimizer.scheduler (
Optional
[_LRScheduler
]) – The learning rate scheduler to use. If None, we don’t use one.store_progress (
Union
[bool
,int
]) – Whether we should store the metamer image in progress on every iteration. If False, we don’t save anything. If True, we save every iteration. If an int, we save everystore_progress
iterations (note then that 0 is the same as False and 1 the same as True).stop_criterion (
float
) – If the loss over the paststop_iters_to_check
has changed less thanstop_criterion
, we terminate synthesis.stop_iters_to_check (
int
) – How many iterations back to check in order to see if the loss has stopped decreasing (forstop_criterion
).
- property target_representation
Model representation of image; the goal of synthesis is for model(metamer) to match this value.
- to(*args, **kwargs)[source]
Moves and/or casts the parameters and buffers.
This can be called as
to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (
torch.device
): the desired device of the parameters and buffers in this module
- dtype (
torch.dtype
): the desired floating point type of the floating point parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired
dtype and device for all parameters and buffers in this module
- class plenoptic.synthesize.metamer.MetamerCTF(image, model, loss_function=<function mse>, range_penalty_lambda=0.1, allowed_range=(0, 1), initial_image=None, coarse_to_fine='together')[source]
Bases:
Metamer
Synthesize model metamers with coarse-to-fine synthesis.
This is a special case of Metamer, which uses the coarse-to-fine synthesis procedure described in [1]: we start by updating the metamer with respect to only a subset of the model's representation (generally, the subset corresponding to the lowest spatial frequencies), and change which subset we consider over the course of synthesis. This is similar to optimizing with a blurred version of the objective function and gradually adding in finer details. It improves synthesis performance for some models.
- Parameters:
image (
Tensor
) – A 4d tensor, this is the image whose representation we wish to match. If this is not a tensor, we try to cast it as one.model (
Module
) – A visual model, see Metamer notebook for more detailsloss_function (
Callable
[[Tensor
,Tensor
],Tensor
]) – the loss function to use to compare the representations of the models in order to determine their loss. Because of the limitations of pickle, you cannot use a lambda function for this if you wish to save the Metamer object (i.e., it must be one of our built-in functions or defined using a def statement)range_penalty_lambda (
float
) – strength of the regularizer that enforces the allowed_range. Must be non-negative.allowed_range (
Tuple
[float
,float
]) – Range (inclusive) of allowed pixel values. Any values outside this range will be penalized.initial_image (
Optional
[Tensor
]) – 4d Tensor to initialize our metamer with. If None, will draw a sample of uniform noise withinallowed_range
.
coarse_to_fine (Literal['together', 'separate']) –
'together': start with the coarsest scale, then gradually add each finer scale.
'separate': compute the gradient with respect to each scale separately (ignoring the others), then with respect to all of them at the end.
(see Metamer tutorial for more details).
- target_representation
Whatever is returned by
model(image)
, this is what we match in order to create a metamer- Type:
torch.Tensor
- metamer
The metamer. This may be unfinished depending on how many iterations we’ve run for.
- Type:
torch.Tensor
- losses
A list of our loss over iterations.
- Type:
list
- gradient_norm
A list of the gradient’s L2 norm over iterations.
- Type:
list
- pixel_change_norm
A list containing the L2 norm of the pixel change over iterations (
pixel_change_norm[i]
is the pixel change norm inmetamer
between iterationsi
andi-1
).- Type:
list
- saved_metamer
Saved
self.metamer
for later examination.- Type:
torch.Tensor
- scales
The list of scales in optimization order (i.e., from coarse to fine). Will be modified during the course of optimization.
- Type:
list or None
- scales_loss
The scale-specific loss at each iteration
- Type:
list or None
- scales_timing
Keys are the values found in scales, values are lists, specifying the iteration where we started and stopped optimizing this scale.
- Type:
dict or None
- scales_finished
List of scales that we’ve finished optimizing.
- Type:
list or None
- Attributes:
- allowed_range
- coarse_to_fine
gradient_norm
Synthesis gradient’s L2 norm over iterations.
- image
losses
Synthesis loss over iterations.
- metamer
- model
- optimizer
pixel_change_norm
L2 norm change in pixel values over iterations.
- range_penalty_lambda
- saved_metamer
- scales
- scales_finished
- scales_loss
- scales_timing
- store_progress
target_representation
Model representation of
image
, the goal of synthesis is formodel(metamer)
to match this value.
Methods
load(file_path[, map_location])
Load all relevant stuff from a .pt file.
objective_function([metamer_representation, ...])
Compute the metamer synthesis loss.
save(file_path)
Save all relevant variables in .pt file.
synthesize([max_iter, optimizer, scheduler, ...])
Synthesize a metamer.
to(*args, **kwargs)
Moves and/or casts the parameters and buffers.
- property coarse_to_fine
- load(file_path, map_location=None, **pickle_load_args)[source]
Load all relevant stuff from a .pt file.
This should be called by an initialized Metamer object – we will ensure that image, target_representation (and thus model), and loss_function are all identical.
Note this operates in place and so doesn't return anything.
- Parameters:
file_path (str) – The path to load the synthesis object from
map_location (str, optional) – map_location argument to pass to torch.load. If you save stuff that was being run on a GPU and are loading onto a CPU, you'll need this to make sure everything lines up properly. This should be structured like the str you would pass to torch.device
pickle_load_args – any additional kwargs will be added to pickle_module.load via torch.load, see that function's docstring for details.
Examples
>>> metamer = po.synth.Metamer(img, model)
>>> metamer.synthesize(max_iter=10, store_progress=True)
>>> metamer.save('metamers.pt')
>>> metamer_copy = po.synth.Metamer(img, model)
>>> metamer_copy.load('metamers.pt')
Note that you must create a new instance of the Synthesis object and then load.
- property scales
- property scales_finished
- property scales_loss
- property scales_timing
- synthesize(max_iter=100, optimizer=None, scheduler=None, store_progress=False, stop_criterion=0.0001, stop_iters_to_check=50, change_scale_criterion=0.01, ctf_iters_to_check=50)[source]
Synthesize a metamer.
Update the pixels of initial_image until its representation matches that of image.
We run this until either we reach max_iter or the change over the past stop_iters_to_check iterations is less than stop_criterion, whichever comes first.
- Parameters:
max_iter (int) – The maximum number of iterations to run before we end synthesis (unless we hit the stop criterion).
optimizer (Optional[Optimizer]) – The optimizer to use. If None and this is the first time calling synthesize, we use Adam(lr=.01, amsgrad=True); if synthesize has been called before, this must be None and we reuse the previous optimizer.
scheduler (Optional[_LRScheduler]) – The learning rate scheduler to use. If None, we don't use one.
store_progress (Union[bool, int]) – Whether we should store the metamer image in progress on every iteration. If False, we don't save anything. If True, we save every iteration. If an int, we save every store_progress iterations (note then that 0 is the same as False and 1 the same as True).
stop_criterion (float) – If the loss over the past stop_iters_to_check has changed less than stop_criterion, we terminate synthesis.
stop_iters_to_check (int) – How many iterations back to check in order to see if the loss has stopped decreasing (for stop_criterion).
change_scale_criterion (Optional[float]) – Scale-specific analogue of stop_criterion: we consider a given scale finished (and move onto the next) if the loss has changed less than this in the past ctf_iters_to_check iterations. If None, we'll change scales as soon as we've spent ctf_iters_to_check iterations on a given scale.
ctf_iters_to_check (int) – Scale-specific analogue of stop_iters_to_check: how many iterations back to check in order to see if we should switch scales.
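A brief usage sketch of coarse-to-fine synthesis; img is a placeholder 4d tensor and model a placeholder multi-scale model that supports scale-restricted evaluation (see the Metamer tutorial), and the keyword values are arbitrary:
import plenoptic as po

met_ctf = po.synth.MetamerCTF(img, model, coarse_to_fine='together')
met_ctf.synthesize(max_iter=500, store_progress=10,
                   change_scale_criterion=0.01, ctf_iters_to_check=50)
print(met_ctf.scales_finished)   # scales we have finished optimizing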
- plenoptic.synthesize.metamer.animate(metamer, framerate=10, batch_idx=0, channel_idx=None, ylim=None, vrange=(0, 1), zoom=None, plot_representation_error_as_rgb=False, fig=None, axes_idx={}, figsize=None, included_plots=['display_metamer', 'plot_loss', 'plot_representation_error'], width_ratios={})[source]
Animate synthesis progress.
This is essentially the figure produced by metamer.plot_synthesis_status animated over time, for each stored iteration.
We return the matplotlib FuncAnimation object. In order to view it in a Jupyter notebook, use the plenoptic.tools.display.convert_anim_to_html(anim) function. In order to save, use anim.save(filename) (note for this that you'll need the appropriate writer installed and on your path, e.g., ffmpeg, imagemagick, etc.). Either of these will probably take a reasonably long amount of time.
- Parameters:
metamer (
Metamer
) – Metamer object whose synthesis we want to animate.framerate (
int
) – How many frames a second to display.batch_idx (
int
) – Which index to take from the batch dimensionchannel_idx (
Optional
[int
]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) image).ylim (
Union
[str
,None
,Tuple
[float
,float
],Literal
[False
]]) –The y-limits of the representation_error plot:
If a tuple, then this is the ylim of all plots
If None, then all plots have the same limits, all symmetric about 0 with a limit of
np.abs(representation_error).max()
(for the initial representation_error)If False, don’t modify limits.
If a string, must be ‘rescale’ or of the form ‘rescaleN’, where N can be any integer. If ‘rescaleN’, we rescale the limits every N frames (we rescale as if ylim = None). If ‘rescale’, then we do this 10 times over the course of the animation
vrange (
Union
[Tuple
[float
,float
],str
]) – The vrange option to pass todisplay_metamer()
. See docstring ofimshow
for possible values.zoom (
Optional
[float
]) – How much to zoom in / enlarge the metamer, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.plot_representation_error_as_rgb (
bool
) – The representation can be image-like with multiple channels, and we have no way to determine whether it should be represented as an RGB image or not, so the user must set this flag to tell us. It will be ignored if the representation doesn't look image-like or if the model has its own plot_representation_error() method. Else, it will be passed to po.imshow() (see that method's docstring for details); plot_synthesis_status normally sets this up for us.fig (
Optional
[Figure
]) – If None, create the figure from scratch. Else, should be an empty figure with enough axes (the expected use here is have same-size movies with different plots).axes_idx (
Dict
[str
,int
]) – Dictionary specifying which axes contains which type of plot, allows for more fine-grained control of the resulting figure. Probably only helpful if fig is also defined. Possible keys:'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values', 'misc'
. Values should all be ints. If you tell this function to create a plot that doesn’t have a corresponding key, we find the lowest int that is not already in the dict, so if you have axes that you want unchanged, place their idx in'misc'
.figsize (
Optional
[Tuple
[float
,float
]]) – The size of the figure to create. It may take a little bit of playing around to find a reasonable value. If None, we attempt to make our best guess, aiming to have each axis be of size (5, 5)included_plots (
List
[str
]) – Which plots to include. Must be some subset of'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values'
.width_ratios (
Dict
[str
,float
]) – By default, all plots axes will have the same width. To change that, specify their relative widths using the keys:'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values'
and floats specifying their relative width. Any not included will be assumed to be 1.
- Returns:
The animation object. In order to view, must convert to HTML or save.
- Return type:
anim
Notes
By default, we use the ffmpeg backend, which requires that you have ffmpeg installed and on your path (https://ffmpeg.org/download.html). To use a different one, set the matplotlib rcParam: matplotlib.rcParams['animation.writer'] = writer, see https://matplotlib.org/stable/api/animation_api.html#writer-classes for more details.
For displaying in a jupyter notebook, ffmpeg appears to be required.
- plenoptic.synthesize.metamer.display_metamer(metamer, batch_idx=0, channel_idx=None, zoom=None, iteration=None, ax=None, **kwargs)[source]
Display metamer.
You can specify what iteration to view by using the iteration arg. The default, None, shows the final one.
We use plenoptic.imshow to display the metamer and attempt to automatically find the most reasonable zoom value. You can override this value using the zoom arg, but remember that plenoptic.imshow is opinionated about the size of the resulting image and will throw an Exception if the axis created is not big enough for the selected zoom.
- Parameters:
metamer (
Metamer
) – Metamer object whose synthesized metamer we want to display.batch_idx (
int
) – Which index to take from the batch dimensionchannel_idx (
Optional
[int
]) – Which index to take from the channel dimension. If None, we assume image is RGB(A) and show all channels.iteration (
Optional
[int
]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.ax (
Optional
[Axes
]) – Pre-existing axes for plot. If None, we callplt.gca()
.zoom (
Optional
[float
]) – How much to zoom in / enlarge the metamer, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.kwargs – Passed to
plenoptic.imshow
- Returns:
The matplotlib axes containing the plot.
- Return type:
ax
- plenoptic.synthesize.metamer.plot_loss(metamer, iteration=None, ax=None, **kwargs)[source]
Plot synthesis loss with log-scaled y axis.
Plots metamer.losses over all iterations. Also plots a red dot at iteration, to highlight the loss there. If iteration=None, then the dot will be at the final iteration.
- Parameters:
metamer (
Metamer
) – Metamer object whose loss we want to plot.iteration (
Optional
[int
]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.ax (
Optional
[Axes
]) – Pre-existing axes for plot. If None, we callplt.gca()
.kwargs – passed to plt.semilogy
- Returns:
The matplotlib axes containing the plot.
- Return type:
ax
- plenoptic.synthesize.metamer.plot_pixel_values(metamer, batch_idx=0, channel_idx=None, iteration=None, ylim=False, ax=None, **kwargs)[source]
Plot histogram of pixel values of target image and its metamer.
This is a way to check the distributions of pixel intensities and see if there are any values outside the allowed range.
- Parameters:
metamer (
Metamer
) – Metamer object with the images whose pixel values we want to compare.batch_idx (
int
) – Which index to take from the batch dimensionchannel_idx (
Optional
[int
]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) images).iteration (
Optional
[int
]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.ylim (
Union
[Tuple
[float
,float
],Literal
[False
]]) – if tuple, the ylimit to set for this axis. If False, we leave it untouchedax (
Optional
[Axes
]) – Pre-existing axes for plot. If None, we callplt.gca()
.kwargs – passed to plt.hist
- Returns:
Created axes.
- Return type:
ax
- plenoptic.synthesize.metamer.plot_representation_error(metamer, batch_idx=0, iteration=None, ylim=None, ax=None, as_rgb=False, **kwargs)[source]
Plot distance ratio showing how close we are to convergence.
We plot _representation_error(metamer, iteration). For more details, see plenoptic.tools.display.plot_representation.
- Parameters:
metamer (
Metamer
) – Metamer object whose synthesized metamer we want to display.batch_idx (
int
) – Which index to take from the batch dimensioniteration (
Optional
[int
]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.ylim (
Union
[Tuple
[float
,float
],None
,Literal
[False
]]) – Ifylim
isNone
, we sets the axes’ y-limits to be(-y_max, y_max)
, wherey_max=np.abs(data).max()
. If it’sFalse
, we do nothing. If a tuple, we use that range.ax (
Optional
[Axes
]) – Pre-existing axes for plot. If None, we callplt.gca()
.as_rgb (bool, optional) – The representation can be image-like with multiple channels, and we have no way to determine whether it should be represented as an RGB image or not, so the user must set this flag to tell us. It will be ignored if the response doesn’t look image-like or if the model has its own plot_representation_error() method. Else, it will be passed to po.imshow(), see that methods docstring for details.
kwargs – Passed to
metamer.model.forward
- Returns:
List of created axes
- Return type:
axes
- plenoptic.synthesize.metamer.plot_synthesis_status(metamer, batch_idx=0, channel_idx=None, iteration=None, ylim=None, vrange='indep1', zoom=None, plot_representation_error_as_rgb=False, fig=None, axes_idx={}, figsize=None, included_plots=['display_metamer', 'plot_loss', 'plot_representation_error'], width_ratios={})[source]
Make a plot showing synthesis status.
We create several subplots to analyze this. By default, we create three subplots on a new figure: the first one contains the synthesized metamer, the second contains the loss, and the third contains the representation error.
There is an optional additional plot: plot_pixel_values, a histogram of pixel values of the metamer and target image.
The plots to include are specified by including their name in the included_plots list. All plots can be created separately using the method with the same name.
- Parameters:
metamer (
Metamer
) – Metamer object whose status we want to plot.batch_idx (
int
) – Which index to take from the batch dimensionchannel_idx (
Optional
[int
]) – Which index to take from the channel dimension. If None, we use all channels (assumed use-case is RGB(A) image).iteration (
Optional
[int
]) – Which iteration to display. If None, the default, we show the most recent one. Negative values are also allowed.ylim (
Union
[Tuple
[float
,float
],None
,Literal
[False
]]) – The ylimit to use for the representation_error plot. We pass this value directly toplot_representation_error
vrange (
Union
[Tuple
[float
,float
],str
]) – The vrange option to pass todisplay_metamer()
. See docstring ofimshow
for possible values.zoom (
Optional
[float
]) – How much to zoom in / enlarge the metamer, the ratio of display pixels to image pixels. If None (the default), we attempt to find the best value ourselves.plot_representation_error_as_rgb (bool, optional) – The representation can be image-like with multiple channels, and we have no way to determine whether it should be represented as an RGB image or not, so the user must set this flag to tell us. It will be ignored if the response doesn’t look image-like or if the model has its own plot_representation_error() method. Else, it will be passed to po.imshow(), see that methods docstring for details.
fig (
Optional
[Figure
]) – if None, we create a new figure. otherwise we assume this is an empty figure that has the appropriate size and number of subplotsaxes_idx (
Dict
[str
,int
]) – Dictionary specifying which axes contains which type of plot, allows for more fine-grained control of the resulting figure. Probably only helpful if fig is also defined. Possible keys:'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values', 'misc'
. Values should all be ints. If you tell this function to create a plot that doesn’t have a corresponding key, we find the lowest int that is not already in the dict, so if you have axes that you want unchanged, place their idx in'misc'
.figsize (
Optional
[Tuple
[float
,float
]]) – The size of the figure to create. It may take a little bit of playing around to find a reasonable value. If None, we attempt to make our best guess, aiming to have each axis be of size (5, 5)included_plots (
List
[str
]) – Which plots to include. Must be some subset of'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values'
.width_ratios (
Dict
[str
,float
]) – By default, all plots axes will have the same width. To change that, specify their relative widths using the keys:'display_metamer', 'plot_loss', 'plot_representation_error', 'plot_pixel_values'
and floats specifying their relative width. Any not included will be assumed to be 1.
- Return type:
Tuple
[Figure
,Dict
[str
,int
]]- Returns:
fig – The figure containing this plot
axes_idx – Dictionary giving index of each plot.
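A short usage sketch, assuming met is a Metamer synthesized with store_progress enabled:
import matplotlib.pyplot as plt
from plenoptic.synthesize.metamer import plot_synthesis_status

fig, axes_idx = plot_synthesis_status(met, iteration=-1)
# axes_idx maps plot names (e.g., 'display_metamer') to axis indices in fig
plt.show()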
plenoptic.synthesize.simple_metamer module
Simple Metamer Class
- class plenoptic.synthesize.simple_metamer.SimpleMetamer(image, model)[source]
Bases:
Synthesis
Simple version of metamer synthesis.
This doesn’t have any of the bells and whistles of the full Metamer class, but does perform basic metamer synthesis: given a target image and a model, synthesize a new image (initialized with uniform noise) that has the same model output.
This is meant as a demonstration of the basic logic of synthesis.
- Parameters:
image (Tensor) – A 4d tensor, this is the image whose model representation we wish to match.
model (Module) – The visual model whose representation we wish to match.
Methods
load(file_path[, map_location])
Load all relevant attributes from a .pt file.
save(file_path)
Save all relevant (non-model) variables in .pt file.
synthesize([max_iter, optimizer])
Synthesize a simple metamer.
to(*args, **kwargs)
Move and/or cast the parameters and buffers.
- load(file_path, map_location=None)[source]
Load all relevant attributes from a .pt file.
Note this operates in place and so doesn’t return anything.
- Parameters:
file_path (
str
) – The path to load the synthesis object from
- save(file_path)[source]
Save all relevant (non-model) variables in .pt file.
- Parameters:
file_path (
str
) – The path to save the SimpleMetamer object to.
- synthesize(max_iter=100, optimizer=None)[source]
Synthesize a simple metamer.
If called multiple times, will continue where we left off.
- Parameters:
max_iter (int) – Number of iterations to run synthesis for.
optimizer (Optional[Optimizer]) – The optimizer to use. If None and this is the first time calling synthesize, we use Adam(lr=.01, amsgrad=True); if synthesize has been called before, we reuse the previous optimizer.
- Returns:
The synthesized metamer
- Return type:
metamer
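A minimal usage sketch; img and model are placeholders, as in the Metamer examples above:
from plenoptic.synthesize.simple_metamer import SimpleMetamer

simple = SimpleMetamer(img, model)
result = simple.synthesize(max_iter=100)   # returns the synthesized metamer tensor
simple.save('simple_metamer.pt')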
- to(*args, **kwargs)[source]
Move and/or cast the parameters and buffers.
This can be called as
to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. When calling this method to move tensors to a CUDA device, items in attrs that start with "saved_" will not be moved.
Note
This method modifies the module in-place.
- Args:
- device (torch.device): the desired device of the parameters and buffers in this module
- dtype (torch.dtype): the desired floating point type of the floating point parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
- attrs (list): list of strs containing the attributes of this object to move to the specified device/dtype
- Returns:
Module: self
plenoptic.synthesize.synthesis module
Abstract synthesis super-class.
- class plenoptic.synthesize.synthesis.OptimizedSynthesis(range_penalty_lambda=0.1, allowed_range=(0, 1))[source]
Bases:
Synthesis
Abstract super-class for synthesis objects that use optimization.
The primary difference between this and the generic Synthesis class is that these will use an optimizer object to iteratively update their output.
- Attributes:
- allowed_range
gradient_norm
Synthesis gradient’s L2 norm over iterations.
losses
Synthesis loss over iterations.
- optimizer
pixel_change_norm
L2 norm change in pixel values over iterations.
- range_penalty_lambda
- store_progress
Methods
load
(file_path[, map_location, ...])Load all relevant attributes from a .pt file.
objective_function
()How good is the current synthesized object.
save
(file_path[, attrs])Save all relevant (non-model) variables in .pt file.
synthesize
()Synthesize something.
to
(*args[, attrs])Moves and/or casts the parameters and buffers.
- property allowed_range
- property gradient_norm
Synthesis gradient’s L2 norm over iterations.
- property losses
Synthesis loss over iterations.
- abstract objective_function()[source]
How good is the current synthesized object.
See
plenoptic.tools.optim
for some examples.
- property optimizer
- property pixel_change_norm
L2 norm change in pixel values over iterations.
- property range_penalty_lambda
- property store_progress
- class plenoptic.synthesize.synthesis.Synthesis[source]
Bases:
ABC
Abstract super-class for synthesis objects.
All synthesis objects share a variety of similarities and thus need to have similar methods. Some of these can be implemented here and simply inherited, some of them will need to be different for each sub-class and thus are marked as abstract methods here
Methods
load
(file_path[, map_location, ...])Load all relevant attributes from a .pt file.
save
(file_path[, attrs])Save all relevant (non-model) variables in .pt file.
synthesize
()Synthesize something.
to
(*args[, attrs])Moves and/or casts the parameters and buffers.
- load(file_path, map_location=None, check_attributes=[], check_loss_functions=[], **pickle_load_args)[source]
Load all relevant attributes from a .pt file.
This should be called by an initialized
Synthesis
object – we will ensure that the attributes in thecheck_attributes
arg all match in the current and loaded object.Note this operates in place and so doesn’t return anything.
- Parameters:
file_path (
str
) – The path to load the synthesis object frommap_location (
Optional
[str
]) – map_location argument to pass totorch.load
. If you save stuff that was being run on a GPU and are loading onto a CPU, you’ll need this to make sure everything lines up properly. This should be structured like the str you would pass totorch.device
check_attributes (
List
[str
]) – List of strings we ensure are identical in the currentSynthesis
object and the loaded one. Checking the model is generally not recommended, since it can be hard to do (checking callable objects is hard in Python) – instead, checking thebase_representation
should ensure the model hasn’t functionally changed.check_loss_functions (
List
[str
]) – Names of attributes that are loss functions and so must be checked specially – loss functions are callables, and it’s very difficult to check python callables for equality so, to get around that, we instead call the two versions on the same pair of tensors, and compare the outputs.pickle_load_args – any additional kwargs will be added to
pickle_module.load
viatorch.load
, see that function’s docstring for details.
- save(file_path, attrs=None)[source]
Save all relevant (non-model) variables in .pt file.
If you leave attrs as None, we grab vars(self) and exclude ‘model’. This is probably correct, but the option is provided to override it just in case
- Parameters:
file_path (str) – The path to save the synthesis object to
attrs (list or None, optional) – List of strs containing the names of the attributes of this object to save. See above for behavior if attrs is None.
- abstract to(*args, attrs=[], **kwargs)[source]
Moves and/or casts the parameters and buffers. Similar to
save
, this is an abstract method only because you need to define the attributes to call to() on.
This can be called as to(device=None, dtype=None, non_blocking=False), as to(dtype, non_blocking=False), or as to(tensor, non_blocking=False). Its signature is similar to torch.Tensor.to(), but it only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. When calling this method to move tensors to a CUDA device, items in attrs that start with “saved_” will not be moved.
Note: this method modifies the module in-place.
- Args:
device (torch.device): the desired device of the parameters and buffers in this module
dtype (torch.dtype): the desired floating point type of the floating point parameters and buffers in this module
tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
attrs (list): list of strs containing the attributes of this object to move to the specified device/dtype
Module contents
plenoptic.tools package
Submodules
plenoptic.tools.conv module
- plenoptic.tools.conv.blur_downsample(x, n_scales=1, filtname='binom5', scale_filter=True)[source]
Correlate with a binomial coefficient filter and downsample by 2
- Parameters:
x (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. Channels are treated in the same way as batches.
n_scales (int, optional. Should be non-negative.) – Apply the blur and downsample procedure recursively n_scales times. Default to 1.
filtname (str, optional) – Name of the filter. See pt.named_filter for options. Default to “binom5”.
scale_filter (bool, optional) – If true (default), the filter sums to 1 (ie. it does not affect the DC component of the signal). If false, the filter sums to 2.
- plenoptic.tools.conv.correlate_downsample(image, filt, padding_mode='reflect')[source]
Correlate with a filter and downsample by 2
- Parameters:
image (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. Channels are treated in the same way as batches.
filt (2-D torch.Tensor) – The filter to correlate with the input image
padding_mode (string, optional) – One of “constant”, “reflect”, “replicate”, “circular”. The option “constant” means padding with zeros.
- plenoptic.tools.conv.same_padding(x, kernel_size, stride=(1, 1), dilation=(1, 1), pad_mode='circular')[source]
Pad a tensor so that 2D convolution will result in output with same dims.
- Return type:
Tensor
- plenoptic.tools.conv.upsample_blur(x, odd, filtname='binom5', scale_filter=True)[source]
Upsample by 2 and convolve with a binomial coefficient filter
- Parameters:
x (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. Channels are treated in the same way as batches.
odd (tuple, list or numpy.ndarray) – This should contain two integers of value 0 or 1, which determines whether the output height and width should be even (0) or odd (1).
filtname (str, optional) – Name of the filter. See pt.named_filter for options. Default to “binom5”.
scale_filter (bool, optional) – If true (default), the filter sums to 4 (ie. it multiplies the signal by 4 before the blurring operation). If false, the filter sums to 2.
- plenoptic.tools.conv.upsample_convolve(image, odd, filt, padding_mode='reflect')[source]
Upsample by 2 and convolve with a filter
- Parameters:
image (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. Channels are treated in the same way as batches.
odd (tuple, list or numpy.ndarray) – This should contain two integers of value 0 or 1, which determines whether the output height and width should be even (0) or odd (1).
filt (2-D torch.Tensor) – The filter to convolve with the upsampled image
padding_mode (string, optional) – One of “constant”, “reflect”, “replicate”, “circular”. The option “constant” means padding with zeros.
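A brief sketch of how blur_downsample and upsample_blur pair up (the random input is a placeholder):

import torch
from plenoptic.tools.conv import blur_downsample, upsample_blur

x = torch.rand(1, 1, 64, 64)         # placeholder image batch
y = blur_downsample(x, n_scales=1)   # blurred and downsampled to (1, 1, 32, 32)
# odd=(0, 0): both output dimensions should be even, i.e. back to 64x64
z = upsample_blur(y, odd=(0, 0))
print(y.shape, z.shape)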
plenoptic.tools.convergence module
Functions that check for optimization convergence/stabilization.
The functions herein generally differ in what they are checking for convergence: loss, pixel change, etc.
They should probably be able to accept the following arguments, in this order (they can accept more):
synth: an OptimizedSynthesis object to check.
stop_criterion: the value used as criterion / tolerance that our convergence target is compared against.
stop_iters_to_check: how many iterations back to check for convergence.
They must return a single bool: True if we’ve reached convergence, False if not.
- plenoptic.tools.convergence.coarse_to_fine_enough(synth, i, ctf_iters_to_check)[source]
Check whether we’ve synthesized all scales and done so for at least ctf_iters_to_check iterations
This is meant to be paired with another convergence check, such as
loss_convergence
.- Parameters:
synth (
Metamer
) – The Metamer object to check.i (
int
) – The current iteration (0-indexed).ctf_iters_to_check (
int
) – Minimum number of iterations coarse-to-fine must run at each scale. If self.coarse_to_fine is False, then this is ignored.
- Returns:
Whether we’ve been doing coarse to fine synthesis for long enough.
- Return type:
ctf_enough
- plenoptic.tools.convergence.loss_convergence(synth, stop_criterion, stop_iters_to_check)[source]
Check whether the loss has stabilized and, if so, return True.
Have we been synthesizing for stop_iters_to_check iterations? If not, return False. If so, return True if abs(synth.losses[-1] - synth.losses[-stop_iters_to_check]) < stop_criterion, and False otherwise.
- Parameters:
synth (
OptimizedSynthesis
) – The OptimizedSynthesis object to check.stop_criterion (
float
) – If the loss over the paststop_iters_to_check
has changed less thanstop_criterion
, we terminate synthesis.stop_iters_to_check (
int
) – How many iterations back to check in order to see if the loss has stopped decreasing (forstop_criterion
).
- Returns:
Whether the loss has stabilized or not.
- Return type:
loss_stabilized
- plenoptic.tools.convergence.pixel_change_convergence(synth, stop_criterion, stop_iters_to_check)[source]
Check whether the pixel change norm has stabilized and, if so, return True.
Have we been synthesizing for stop_iters_to_check iterations? If not, return False. If so, return True if (synth.pixel_change_norm[-stop_iters_to_check:] < stop_criterion).all(), and False otherwise.
- Parameters:
synth (
OptimizedSynthesis
) – The OptimizedSynthesis object to check.stop_criterion (
float
) – If the pixel change norm has been less thanstop_criterion
for all of the paststop_iters_to_check
, we terminate synthesis.stop_iters_to_check (
int
) – How many iterations back to check in order to see if the pixel change norm has stopped decreasing (forstop_criterion
).
- Returns:
Whether the pixel change norm has stabilized or not.
- Return type:
loss_stabilized
plenoptic.tools.data module
- plenoptic.tools.data.convert_float_to_int(im, dtype=<class 'numpy.uint8'>)[source]
Convert image from float to 8 or 16 bit image
We work with float images that lie between 0 and 1, but for saving them (either as png or in a numpy array), we want to convert them to 8 or 16 bit integers. This function does that by multiplying them by the max value for the target dtype (255 for 8 bit, 65535 for 16 bit) and then converting them to the proper type.
We’ll raise an exception if the max is higher than 1, in which case we have no idea what to do.
- Parameters:
im (
ndarray
) – The image to convertdtype – The target data type. {np.uint8, np.uint16}
- Returns:
The converted image, now with dtype=dtype
- Return type:
im
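For example (the values are illustrative):

import numpy as np
from plenoptic.tools.data import convert_float_to_int

im = np.linspace(0, 1, 16).reshape(4, 4)    # float image in [0, 1]
im8 = convert_float_to_int(im)              # uint8, max value 255
im16 = convert_float_to_int(im, np.uint16)  # uint16, max value 65535
print(im8.dtype, im8.max(), im16.dtype, im16.max())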
- plenoptic.tools.data.load_images(paths, as_gray=True)[source]
Correctly load in images
Our models and synthesis methods expect their inputs to be 4d float32 images: (batch, channel, height, width), where the batch dimension contains multiple images and channel contains something like RGB or color channel. This function helps you get your inputs into that format. It accepts either a single file, a list of files, or a single directory containing images, will load them in, normalize them to lie between 0 and 1, convert them to float32, optionally convert them to grayscale, make them tensors, and get them into the right shape.
- Parameters:
paths (
Union
[str
,List
[str
]]) – A str or list of strs. If a list, must contain paths of image files. If a str, can either be the path of a single image file or of a single directory. If a directory, we try to load every file it contains (using imageio.imread) and skip those we cannot (thus, for efficiency you should not point this to a directory with lots of non-image files). This is NOT recursive.as_gray (
bool
) – Whether to convert the images into grayscale or not after loading them. If False, we do nothing. If True, we call skimage.color.rgb2gray on them.
- Returns:
4d tensor containing the images.
- Return type:
images
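A brief sketch; the file names are hypothetical placeholders for same-sized image files on disk:

from plenoptic.tools.data import load_images

# a single grayscale image -> tensor of shape (1, 1, height, width)
img = load_images('reference.png')
# several images at once -> stacked along the batch dimension
imgs = load_images(['reference.png', 'distorted.png'], as_gray=True)
print(img.shape, imgs.shape, imgs.dtype)    # dtype is torch.float32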
- plenoptic.tools.data.make_synthetic_stimuli(size=256, requires_grad=True)[source]
Make a set of basic stimuli, useful for developing and debugging models
- Parameters:
size (
int
) – The stimuli will have torch.Size([size, size]).requires_grad (
bool
) – Whether to initialize the stimuli with gradients.
- Returns:
Tensor of shape [11, 1, size, size]. The set of basic stimuli: [impulse, step_edge, ramp, bar, curv_edge, sine_grating, square_grating, polar_angle, angular_sine, zone_plate, fractal]
- Return type:
stimuli
- plenoptic.tools.data.polar_angle(size, phase=0.0, origin=None, device=None)[source]
Make polar angle matrix (in radians).
Compute a matrix of given size containing samples of the polar angle (in radians, CW from the X-axis, ranging from -pi to pi), relative to given phase, about the given origin pixel.
- Parameters:
size (
Union
[int
,Tuple
[int
,int
]]) – If an int, we assume the image should be of dimensions (size, size). if a tuple, must be a 2-tuple of ints specifying the dimensionsphase (
float
) – The phase of the polar angle function (in radians, clockwise from the X-axis)origin (
Union
[int
,Tuple
[float
,float
],None
]) – The center of the image. if an int, we assume the origin is at (origin, origin). if a tuple, must be a 2-tuple of ints specifying the origin (where (0, 0) is the upper left). if None, we assume the origin lies at the center of the matrix, (size+1)/2.device (
Optional
[device
]) – The device to create this tensor on.
- Returns:
The polar angle matrix
- Return type:
res
- plenoptic.tools.data.polar_radius(size, exponent=1.0, origin=None, device=None)[source]
Make distance-from-origin (r) matrix
Compute a matrix of given size containing samples of a radial ramp function, raised to given exponent, centered at given origin.
- Parameters:
size (
Union
[int
,Tuple
[int
,int
]]) – If an int, we assume the image should be of dimensions (size, size). if a tuple, must be a 2-tuple of ints specifying the dimensions.exponent (
float
) – The exponent of the radial ramp function.origin (
Union
[int
,Tuple
[int
,int
],None
]) – The center of the image. if an int, we assume the origin is at (origin, origin). if a tuple, must be a 2-tuple of ints specifying the origin (where (0, 0) is the upper left). if None, we assume the origin lies at the center of the matrix, (size+1)/2.device (
Union
[str
,device
,None
]) – The device to create this tensor on.
- Returns:
The polar radius matrix.
- Return type:
res
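These two functions are often combined to build radially and angularly defined masks; a small, purely illustrative sketch:

import math
from plenoptic.tools.data import polar_angle, polar_radius

angle = polar_angle(64)     # (64, 64), radians in [-pi, pi]
radius = polar_radius(64)   # (64, 64), distance from the center pixel
# crude annular wedge mask (illustrative only)
mask = (radius < 20) & (angle.abs() < math.pi / 4)
print(angle.shape, radius.shape, int(mask.sum()))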
- plenoptic.tools.data.to_numpy(x, squeeze=False)[source]
cast tensor to numpy in the most conservative way possible
- Parameters:
x (
Union
[Tensor
,ndarray
]) – Tensor to be converted to numpy.ndarray on CPU.squeeze (
bool
) – Removes all dummy dimensions of the tensor
- Return type:
Converted tensor as numpy.ndarray on CPU.
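For example:

import torch
from plenoptic.tools.data import to_numpy

x = torch.rand(1, 1, 8, 8)
arr = to_numpy(x)                  # numpy.ndarray on CPU, same shape as x
arr2 = to_numpy(x, squeeze=True)   # dummy dimensions removed -> shape (8, 8)
print(arr.shape, arr2.shape)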
plenoptic.tools.display module
various helpful utilities for plotting or displaying information
- plenoptic.tools.display.animshow(video, framerate=2.0, repeat=False, vrange='indep1', zoom=1, title='', col_wrap=None, ax=None, cmap=None, plot_complex='rectangular', batch_idx=None, channel_idx=None, as_rgb=False, **kwargs)[source]
Animate video(s) correctly.
This function animates videos correctly, making sure that each element in the tensor corresponds to a pixel or an integer number of pixels, to avoid aliasing (NOTE: this guarantee only holds for the saved animation (assuming video compression doesn’t interfere); it should generally hold in notebooks as well, but will fail if, e.g., your video is 2000 pixels wide on a monitor 1000 pixels wide; the notebook handles the rescaling in a way we can’t control).
This functions returns the matplotlib FuncAnimation object. In order to view it in a Jupyter notebook, use the
plenoptic.convert_anim_to_html(anim)
function. In order to save, useanim.save(filename)
(note for this that you’ll need the appropriate writer installed and on your path, e.g., ffmpeg, imagemagick, etc).- Parameters:
video (torch.Tensor or list) – The videos to display. Tensors should be 5d (batch, channel, time, height, width). List of tensors should be used for tensors of different height and width: all videos will automatically be rescaled so they’re displayed at the same height and width, thus, their heights and widths must be scalar multiples of each other. Videos must all have the same number of frames as well.
framerate (float) – Temporal resolution of the video, in Hz (frames per second).
repeat (bool) – whether to loop the animation or just play it once
vrange (tuple or str) –
If a 2-tuple, specifies the image values vmin/vmax that are mapped to the minimum and maximum value of the colormap, respectively. If a string:
- ’auto0’: all images have same vmin/vmax, which have the same absolute
value, and come from the minimum or maximum across all images, whichever has the larger absolute value
- ’auto/auto1’: all images have same vmin/vmax, which are the
minimum/maximum values across all images
- ’auto2’: all images have same vmin/vmax, which are the mean (across
all images) minus/ plus 2 std dev (across all images)
- ’auto3’: all images have same vmin/vmax, chosen so as to map the
10th/90th percentile values to the 10th/90th percentile of the display intensity range. For example: vmin is the 10th percentile image value minus 1/8 times the difference between the 90th and 10th percentile
- ’indep0’: each image has an independent vmin/vmax, which have the
same absolute value, which comes from either their minimum or maximum value, whichever has the larger absolute value.
- ’indep1’: each image has an independent vmin/vmax, which are their
minimum/maximum values
- ’indep2’: each image has an independent vmin/vmax, which is their
mean minus/plus 2 std dev
- ’indep3’: each image has an independent vmin/vmax, chosen so that
the 10th/90th percentile values map to the 10th/90th percentile intensities.
zoom (float) – ratio of display pixels to image pixels. if >1, must be an integer. If <1, must be 1/d where d is a divisor of the size of the largest image.
title (str, list, or None, optional) –
Title for the plot. In addition to the specified title, we add a subtitle giving the plotted range and dimensionality (with zoom).
- if str, will put the same title on every plot.
- if list, all values must be str, and the list must be the same length as img, assigning each title to the corresponding image.
- if None, no title will be printed (and the subtitle will be removed).
col_wrap (int or None, optional) – number of axes to have in each row. If None, will fit all axes in a single row.
ax (matplotlib.pyplot.axis or None, optional) – if None, we make the appropriate figure. otherwise, we resize the axes so that it’s the appropriate number of pixels (done by shrinking the bbox - if the bbox is already too small, this will throw an Exception!, so first define a large enough figure using either pyrtools.make_figure or plt.figure)
cmap (matplotlib colormap, optional) – colormap to use when showing these images
plot_complex ({'rectangular', 'polar', 'logpolar'}) –
specifies handling of complex values.
’rectangular’: plot real and imaginary components as separate images
’polar’: plot amplitude and phase as separate images
’logpolar’: plot log_2 amplitude and phase as separate images
for any other value, we raise a warning and default to rectangular.
batch_idx (int or None, optional) – Which element from the batch dimension to plot. If None, we plot all.
channel_idx (int or None, optional) – Which element from the channel dimension to plot. If None, we plot all. Note if this is an int, then as_rgb=True will fail, because we restrict the channels.
as_rgb (bool, optional) – Whether to consider the channels as encoding RGB(A) values. If True, we attempt to plot the image in color, so your tensor must have 3 (or 4 if you want the alpha channel) elements in the channel dimension, or this will raise an Exception. If False, we plot each channel as a separate grayscale image.
kwargs – Passed to ax.imshow
- Returns:
anim – The animation object. In order to view, must convert to HTML or save.
- Return type:
matplotlib.animation.FuncAnimation
Notes
By default, we use the ffmpeg backend, which requires that you have ffmpeg installed and on your path (https://ffmpeg.org/download.html). To use a different writer, use the matplotlib rcParams: matplotlib.rcParams[‘animation.writer’] = writer, see https://matplotlib.org/stable/api/animation_api.html#writer-classes for more details.
For displaying in a jupyter notebook, ffmpeg appears to be required.
- plenoptic.tools.display.clean_stem_plot(data, ax=None, title='', ylim=None, xvals=None, **kwargs)[source]
convenience wrapper for plotting stem plots
This plots the data, baseline, cleans up the axis, and sets the title
Should not be called by users directly, but is a helper function for the various plot_representation() functions
By default, stem plot would have a baseline that covers the entire range of the data. We want to be able to break that up visually (so there’s a line from 0 to 9, from 10 to 19, etc), and passing xvals separately allows us to do that. If you want the default stem plot behavior, leave xvals as None.
- Parameters:
data (np.ndarray) – The data to plot (as a stem plot)
ax (matplotlib.pyplot.axis or None, optional) – The axis to plot the data on. If None, we plot on the current axis
title (str or None, optional) – The title to put on the axis if not None. If None, we don’t call
ax.set_title
(useful if you want to avoid changing the title on an existing plot)ylim (tuple or None, optional) – If not None, the y-limits to use for this plot. If None, we use the default, slightly adjusted so that the minimum is 0. If False, do not change y-limits.
xvals (tuple or None, optional) – A 2-tuple of lists, containing the start (
xvals[0]
) and stop (xvals[1]
) x values for plotting. If None, we use the default stem plot behavior.kwargs – passed to ax.stem
- Returns:
ax – The axis with the plot
- Return type:
matplotlib.pyplot.axis
Examples
We allow for breaks in the baseline value if we want to visually break up the plot, as we see below.
import plenoptic as po
import numpy as np
import matplotlib.pyplot as plt
# if ylim=None, as in this example, the minimum y-value will get
# set to 0, so we want to make sure our values are all positive
y = np.abs(np.random.randn(55))
y[15:20] = np.nan
y[35:40] = np.nan
# we want to draw the baseline from 0 to 14, 20 to 34, and 40 to
# 54, everywhere that we have non-NaN values for y
xvals = ([0, 20, 40], [14, 34, 54])
po.tools.display.clean_stem_plot(y, xvals=xvals)
plt.show()
If we don’t care about breaking up the x-axis, you can simply use the default xvals (
None
). In this case, this function will just clean up the plot a little bit.
import plenoptic as po
import numpy as np
import matplotlib.pyplot as plt
# if ylim=None, as in this example, the minimum y-value will get
# set to 0, so we want to make sure our values are all positive
y = np.abs(np.random.randn(55))
po.tools.display.clean_stem_plot(y)
plt.show()
- plenoptic.tools.display.clean_up_axes(ax, ylim=None, spines_to_remove=['top', 'right', 'bottom'], axes_to_remove=['x'])[source]
Clean up an axis, as desired when making a stem plot of the representation
- Parameters:
ax (matplotlib.pyplot.axis) – The axis to clean up.
ylim (tuple, False, or None) – If a tuple, the y-limits to use for this plot. If None, we use the default, slightly adjusted so that the minimum is 0. If False, we do nothing.
spines_to_remove (list) – Some combination of ‘top’, ‘right’, ‘bottom’, and ‘left’. The spines we remove from the axis.
axes_to_remove (list) – Some combination of ‘x’, ‘y’. The axes to set as invisible.
- Returns:
ax – The cleaned-up axis
- Return type:
matplotlib.pyplot.axis
- plenoptic.tools.display.convert_anim_to_html(anim)[source]
convert a matplotlib animation object to HTML (for display)
This is a simple little wrapper function that allows the animation to be displayed in a Jupyter notebook
- Parameters:
anim (matplotlib.animation.FuncAnimation) – The animation object to convert to HTML
- plenoptic.tools.display.imshow(image, vrange='indep1', zoom=None, title='', col_wrap=None, ax=None, cmap=None, plot_complex='rectangular', batch_idx=None, channel_idx=None, as_rgb=False, **kwargs)[source]
Show image(s) correctly.
This function shows images correctly, making sure that each element in the tensor corresponds to a pixel or an integer number of pixels, to avoid aliasing (NOTE: this guarantee only holds for the saved image; it should generally hold in notebooks as well, but will fail if, e.g., you plot an image that’s 2000 pixels wide on a monitor 1000 pixels wide; the notebook handles the rescaling in a way we can’t control).
- Parameters:
image (torch.Tensor or list) – The images to display. Tensors should be 4d (batch, channel, height, width). List of tensors should be used for tensors of different height and width: all images will automatically be rescaled so they’re displayed at the same height and width, thus, their heights and widths must be scalar multiples of each other.
vrange (tuple or str) –
If a 2-tuple, specifies the image values vmin/vmax that are mapped to the minimum and maximum value of the colormap, respectively. If a string:
- ’auto0’: all images have same vmin/vmax, which have the same absolute
value, and come from the minimum or maximum across all images, whichever has the larger absolute value
- ’auto/auto1’: all images have same vmin/vmax, which are the
minimum/maximum values across all images
- ’auto2’: all images have same vmin/vmax, which are the mean (across
all images) minus/ plus 2 std dev (across all images)
- ’auto3’: all images have same vmin/vmax, chosen so as to map the
10th/90th percentile values to the 10th/90th percentile of the display intensity range. For example: vmin is the 10th percentile image value minus 1/8 times the difference between the 90th and 10th percentile
- ’indep0’: each image has an independent vmin/vmax, which have the
same absolute value, which comes from either their minimum or maximum value, whichever has the larger absolute value.
- ’indep1’: each image has an independent vmin/vmax, which are their
minimum/maximum values
- ’indep2’: each image has an independent vmin/vmax, which is their
mean minus/plus 2 std dev
- ’indep3’: each image has an independent vmin/vmax, chosen so that
the 10th/90th percentile values map to the 10th/90th percentile intensities.
zoom (float or None) – ratio of display pixels to image pixels. if >1, must be an integer. If <1, must be 1/d where d is a divisor of the size of the largest image. If None, we try to determine the best zoom.
title (str, list, or None, optional) –
Title for the plot. In addition to the specified title, we add a subtitle giving the plotted range and dimensionality (with zoom).
- if str, will put the same title on every plot.
- if list, all values must be str, and the list must be the same length as img, assigning each title to the corresponding image.
- if None, no title will be printed (and the subtitle will be removed).
col_wrap (int or None, optional) – number of axes to have in each row. If None, will fit all axes in a single row.
ax (matplotlib.pyplot.axis or None, optional) – if None, we make the appropriate figure. otherwise, we resize the axes so that it’s the appropriate number of pixels (done by shrinking the bbox - if the bbox is already too small, this will throw an Exception!, so first define a large enough figure using either make_figure or plt.figure)
cmap (matplotlib colormap, optional) – colormap to use when showing these images
plot_complex ({'rectangular', 'polar', 'logpolar'}) –
specifies handling of complex values.
’rectangular’: plot real and imaginary components as separate images
’polar’: plot amplitude and phase as separate images
’logpolar’: plot log_2 amplitude and phase as separate images
for any other value, we raise a warning and default to rectangular.
batch_idx (int or None, optional) – Which element from the batch dimension to plot. If None, we plot all.
channel_idx (int or None, optional) – Which element from the channel dimension to plot. If None, we plot all. Note if this is an int, then as_rgb=True will fail, because we restrict the channels.
as_rgb (bool, optional) – Whether to consider the channels as encoding RGB(A) values. If True, we attempt to plot the image in color, so your tensor must have 3 (or 4 if you want the alpha channel) elements in the channel dimension, or this will raise an Exception. If False, we plot each channel as a separate grayscale image.
kwargs – Passed to ax.imshow
- Returns:
fig – figure containing the plotted images
- Return type:
PyrFigure
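A small sketch combining imshow with make_synthetic_stimuli (documented above); the title strings simply label the first four stimuli:

import plenoptic as po
from plenoptic.tools.data import make_synthetic_stimuli

stim = make_synthetic_stimuli(size=128, requires_grad=False)   # (11, 1, 128, 128)
# show the first four stimuli along the batch dimension
fig = po.imshow(stim[:4], vrange='indep1', zoom=1,
                title=['impulse', 'step_edge', 'ramp', 'bar'])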
- plenoptic.tools.display.plot_representation(model=None, data=None, ax=None, figsize=(5, 5), ylim=False, batch_idx=0, title='', as_rgb=False)[source]
Helper function for plotting model representation
We are trying to plot
data
onax
, usingmodel.plot_representation
method, if it has it, and otherwise default to a function that makes sense based on the shape ofdata
.All of these arguments are optional, but at least some of them need to be set:
If
model
isNone
, we fall-back to a type of plot based on the shape ofdata
. If it looks image-like, we’ll useplenoptic.imshow
and if it looks vector-like, we’ll useplenoptic.clean_stem_plot
. If it’s a dictionary, we’ll assume each key, value pair gives the title and data to plot on a separate sub-plot.If
data
isNone
, we can only do something ifmodel.plot_representation
has some default behavior whendata=None
; this is probably to plot its ownrepresentation
attribute. Thus, this will raise an Exception if bothmodel
anddata
areNone
, because we have no idea what to plot then.If
ax
isNone
, we create a one-subplot figure usingfigsize
. Ifax
is notNone
, we therefore ignorefigsize
.If
ylim
isNone
, we callrescale_ylim
, which sets the axes’ y-limits to be(-y_max, y_max)
, wherey_max=np.abs(data).max()
. If it’sFalse
, we do nothing.
- Parameters:
model (torch.nn.Module or None, optional) – A differentiable model that tells us how to plot
data
. See above for behavior ifNone
.data (array_like, dict, or None, optional) – The data to plot. See above for behavior if
None
.ax (matplotlib.pyplot.axis or None, optional) – The axis to plot on. See above for behavior if
None
.figsize (tuple, optional) – The size of the figure to create. Ignored if
ax
is notNone
.ylim (tuple, None, or False, optional) – If not None, the y-limits to use for this plot. See above for behavior if
None
. If False, we do nothing.batch_idx (int, optional) – Which index to take from the batch dimension
title (str, optional) – The title to put above this axis. If you want no title, pass the empty string (
''
)as_rgb (bool, optional) – The representation can be image-like with multiple channels, and we have no way to determine whether it should be represented as an RGB image or not, so the user must set this flag to tell us. It will be ignored if the representation doesn’t look image-like or if the model has its own plot_representation_error() method. Else, it will be passed to po.imshow(), see that methods docstring for details.
- Returns:
axes – List of created axes.
- Return type:
list
- plenoptic.tools.display.pyrshow(pyr_coeffs, vrange='indep1', zoom=1, show_residuals=True, cmap=None, plot_complex='rectangular', batch_idx=0, channel_idx=0, **kwargs)[source]
Display steerable pyramid coefficients in orderly fashion.
This function uses
imshow
to show the coefficients of the steerable pyramid, such that each scale shows up on a single row, with each orientation in a given column.Note that unlike imshow, we can only show one batch or channel at a time
- Parameters:
pyr_coeffs (dict) – pyramid coefficients in the standard dictionary format as returned by
SteerablePyramidFreq.forward()
vrange (tuple or str) –
If a 2-tuple, specifies the image values vmin/vmax that are mapped to the minimum and maximum value of the colormap, respectively. If a string:
- ’auto0’: all images have same vmin/vmax, which have the same absolute
value, and come from the minimum or maximum across all images, whichever has the larger absolute value
- ’auto/auto1’: all images have same vmin/vmax, which are the
minimum/maximum values across all images
- ’auto2’: all images have same vmin/vmax, which are the mean (across
all images) minus/ plus 2 std dev (across all images)
- ’auto3’: all images have same vmin/vmax, chosen so as to map the
10th/90th percentile values to the 10th/90th percentile of the display intensity range. For example: vmin is the 10th percentile image value minus 1/8 times the difference between the 90th and 10th percentile
- ’indep0’: each image has an independent vmin/vmax, which have the
same absolute value, which comes from either their minimum or maximum value, whichever has the larger absolute value.
- ’indep1’: each image has an independent vmin/vmax, which are their
minimum/maximum values
- ’indep2’: each image has an independent vmin/vmax, which is their
mean minus/plus 2 std dev
- ’indep3’: each image has an independent vmin/vmax, chosen so that
the 10th/90th percentile values map to the 10th/90th percentile intensities.
zoom (float) – ratio of display pixels to image pixels. if >1, must be an integer. If <1, must be 1/d where d is a divisor of the size of the largest image.
show_residuals (bool) – whether to display the residual bands (lowpass, highpass depending on the pyramid type)
cmap (matplotlib colormap, optional) – colormap to use when showing these images
plot_complex ({'rectangular', 'polar', 'logpolar'}) –
specifies handling of complex values.
’rectangular’: plot real and imaginary components as separate images
’polar’: plot amplitude and phase as separate images
’logpolar’: plot log_2 amplitude and phase as separate images
for any other value, we raise a warning and default to rectangular.
batch_idx (int, optional) – Which element from the batch dimension to plot.
channel_idx (int, optional) – Which element from the channel dimension to plot.
kwargs – Passed on to
pyrtools.pyrshow
- Returns:
fig – the figure displaying the coefficients.
- Return type:
PyrFigure
- plenoptic.tools.display.rescale_ylim(axes, data)[source]
rescale y-limits nicely
We take the axes and set their limits to be
(-y_max, y_max)
, wherey_max=np.abs(data).max()
- Parameters:
axes (list) – A list of matplotlib axes to rescale
data (array_like or dict) – The data to use when rescaling (or a dictionary of those values)
- plenoptic.tools.display.update_plot(axes, data, model=None, batch_idx=0)[source]
Update the information in some axes.
This is used for creating an animation over time. In order to create the animation, we need to know how to update the matplotlib Artists, and this provides a simple way of doing that. It assumes the plot has been created by something like
plot_representation
, which initializes all the artists.We can update stem plots, lines (as returned by
plt.plot
), scatter plots, or images (RGB, RGBA, or grayscale).There are two modes for this:
single axis: axes is a single axis, which may contain multiple artists (all of the same type) to update. data should be a Tensor with multiple channels (one per artist in the same order) or be a dictionary whose keys give the label(s) of the corresponding artist(s) and whose values are Tensors.
multiple axes: axes is a list of axes, each of which contains a single artist to update (artists can be different types). data should be a Tensor with multiple channels (one per axis in the same order) or a dictionary with the same number of keys as axes, which we can iterate through in order, and whose values are Tensors.
In all cases, data Tensors should be 3d (if the plot we’re updating is a line or stem plot) or 4d (if it’s an image or scatter plot).
RGB(A) images are special, since we store that info along the channel dimension, so they only work with single-axis mode (which will only have a single artist, because that’s how imshow works).
If you have multiple axes, each with multiple artists you want to update, that’s too complicated for us, and so you should write a
model.update_plot()
function which handles that.If
model
is set, we try to callmodel.update_plot()
(which must also return artists). If model doesn’t have anupdate_plot
method, then we try to figure out how to update the axes ourselves, based on the shape of the data.- Parameters:
axes (list or matplotlib.pyplot.axis) – The axis or list of axes to update. We assume that these are the axes created by
plot_representation
and so contain stem plots in the correct order.data (torch.Tensor or dict) – The new data to plot.
model (torch.nn.Module or None, optional) – A differentiable model that tells us how to plot
data
. See above for behavior ifNone
.batch_idx (int, optional) – Which index to take from the batch dimension
- Returns:
artists – A list of the artists used to update the information on the plots
- Return type:
list
- plenoptic.tools.display.update_stem(stem_container, ydata)[source]
Update the information in a stem plot
We update the information in a single stem plot to match that given by
ydata
. We update the position of the markers and the lines connecting them to the baseline, but we don’t change the baseline at all and assume that the xdata shouldn’t change at all.- Parameters:
stem_container (matplotlib.container.StemContainer) – Single container for the artists created in a
plt.stem
plot. It can be treated like a namedtuple(markerline, stemlines, baseline)
. In order to get this from an axisax
, tryax.containers[0]
(obviously if you have more than one container in that axis, it may not be the first one).ydata (array_like) – The new y-data to show on the plot. Importantly, must be the same length as the existing y-data.
- Returns:
stem_container – The StemContainer containing the updated artists.
- Return type:
matplotlib.container.StemContainer
plenoptic.tools.external module
tools to deal with data from outside plenoptic
For example, pre-existing synthesized images
- plenoptic.tools.external.plot_MAD_results(original_image, noise_levels=None, results_dir=None, ssim_images_dir=None, zoom=3, vrange='indep1', **kwargs)[source]
plot original MAD results, provided by Zhou Wang
Plot the results of original MAD Competition, as provided in .mat files. The figure created shows the results for one reference image and multiple noise levels. The reference image is plotted on the first row, followed by a separate row for each noise level, which will show the initial (noisy) image and the four synthesized images, with their respective losses for the two metrics (MSE and SSIM).
We also return a DataFrame that contains the losses, noise levels, and original image name for each plotted noise level.
This code can probably be adapted to other uses, but requires that all images are the same size and assumes they’re all 64 x 64 pixels.
- Parameters:
original_image ({samp1, samp2, samp3, samp4, samp5, samp6, samp7, samp8, samp9, samp10}) – which of the sample images to plot
noise_levels (list or None, optional) – which noise levels to plot. if None, will plot all. If a list, elements must be 2**i where i is in [1, 10]
results_dir (None or str, optional) – path to the results directory containing the results.mat files. If None, we call po.data.fetch_data to download (requires optional dependency pooch).
ssim_images_dir (None or str, optional) – path to the directory containing the .tif images used in SSIM paper. If None, we call po.data.fetch_data to download (requires optional dependency pooch).
zoom (int, optional) – amount to zoom each image, passed to pyrtools.imshow
vrange (str, optional) – in addition to the values accepted by pyrtools.imshow, we also accept ‘row0/1/2/3’, which is the same as ‘auto0/1/2/3’, except that we do it on a per-row basis (all images with same noise level)
kwargs – passed to pyrtools.imshow. Note that we call imshow separately on each image and so any argument that relies on imshow having access to all images will probably not work as expected
- Returns:
fig (pyrtools.tools.display.Figure) – figure containing the images
results (dict) – dictionary containing the errors for each noise level. To convert to a well-structured pandas DataFrame, run
pd.DataFrame(results).T
plenoptic.tools.optim module
Tools related to optimization such as more objective functions.
- plenoptic.tools.optim.l2_norm(synth_rep, ref_rep, **kwargs)[source]
l2-norm of the difference between ref_rep and synth_rep
- Parameters:
synth_rep (
Tensor
) – The first tensor to compare, model representation of the synthesized image.ref_rep (
Tensor
) – The second tensor to compare, model representation of the reference image. must be same size assynth_rep
.kwargs – Ignored, only present to absorb extra arguments.
- Returns:
The L2-norm of the difference between
ref_rep
andsynth_rep
.- Return type:
loss
- plenoptic.tools.optim.mse(synth_rep, ref_rep, **kwargs)[source]
return the MSE between synth_rep and ref_rep
For two tensors, \(x\) and \(y\), with \(n\) values each:
\[MSE = \frac{1}{n}\sum_{i=1}^n (x_i - y_i)^2\]
The two images must have a float dtype.
- Parameters:
synth_rep (
Tensor
) – The first tensor to compare, model representation of the synthesized imageref_rep (
Tensor
) – The second tensor to compare, model representation of the reference image. must be same size assynth_rep
,kwargs – Ignored, only present to absorb extra arguments
- Returns:
The mean-squared error between
synth_rep
andref_rep
- Return type:
loss
- plenoptic.tools.optim.penalize_range(synth_img, allowed_range=(0.0, 1.0), **kwargs)[source]
penalize values outside of allowed_range
instead of clamping values to exactly fall in a range, this provides a ‘softer’ way of doing it, by imposing a quadratic penalty on any values outside the allowed_range. All values within the allowed_range have a penalty of 0
- Parameters:
synth_img (
Tensor
) – The tensor to penalize. the synthesized image.allowed_range (
Tuple
[float
,float
]) – 2-tuple of values giving the (min, max) allowed valueskwargs – Ignored, only present to absorb extra arguments
- Returns:
Penalty for values outside range
- Return type:
penalty
- plenoptic.tools.optim.relative_MSE(synth_rep, ref_rep, **kwargs)[source]
Squared l2-norm of the difference between reference representation and synthesized representation relative to the squared l2-norm of the reference representation:
\[\frac{||x - \hat{x}||_2^2}{||x||_2^2}\]
- Parameters:
synth_rep (
Tensor
) – The first tensor to compare, model representation of the synthesized image.ref_rep (
Tensor
) – The second tensor to compare, model representation of the reference image. must be same size assynth_rep
.kwargs – Ignored, only present to absorb extra arguments
- Returns:
Ratio of the squared l2-norm of the difference between
ref_rep
andsynth_rep
to the squared l2-norm ofref_rep
- Return type:
loss
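A quick comparison of these objective functions on a pair of placeholder tensors:

import torch
from plenoptic.tools.optim import l2_norm, mse, penalize_range, relative_MSE

ref = torch.rand(1, 1, 16, 16)
synth = ref + 0.1 * torch.randn_like(ref)

print(mse(synth, ref))            # mean squared difference
print(l2_norm(synth, ref))        # L2 norm of the difference
print(relative_MSE(synth, ref))   # squared L2 error relative to ||ref||^2
print(penalize_range(synth))      # quadratic penalty for values outside [0, 1]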
plenoptic.tools.signal module
- plenoptic.tools.signal.add_noise(img, noise_mse)[source]
Add normally distributed noise to an image
This adds normally-distributed noise to an image so that the resulting noisy version has the specified mean-squared error.
- Parameters:
img (
Tensor
) – The image to make noisy.noise_mse (
Union
[float
,List
[float
]]) – The target MSE value / variance of the noise. More than one value is allowed.
- Returns:
The noisy image. If noise_mse contains only one element, this will be the same size as img. Else, each separate value from noise_mse will be along the batch dimension.
- Return type:
noisy_img
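For example, adding two different noise levels to a placeholder image:

import torch
from plenoptic.tools.signal import add_noise

img = torch.rand(1, 1, 32, 32)
noisy = add_noise(img, noise_mse=[0.01, 0.1])      # one noise level per batch element
per_level_mse = ((noisy - img) ** 2).mean(dim=(1, 2, 3))
print(noisy.shape, per_level_mse)                  # close to the requested 0.01 and 0.1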
- plenoptic.tools.signal.autocorrelation(x)[source]
Compute the autocorrelation of x.
- Parameters:
x (
Tensor
) – N-dimensional tensor. We assume the last two dimensions are height and width and compute the autocorrelation on these dimensions (independently on each other dimension).- Returns:
Autocorrelation of x
- Return type:
ac
Notes
By the Einstein-Wiener-Khinchin theorem: The autocorrelation of a wide sense stationary (WSS) process is the inverse Fourier transform of its energy spectral density (ESD) - which itself is the product of FT(x(t)) and FT(x(-t)). In other words, the auto-correlation is the convolution of the signal x with itself, which corresponds to squaring in the frequency domain. This approach is computationally more efficient than brute force (n log(n) vs n^2).
By Cauchy-Schwarz, the autocorrelation attains its maximum at the center location (i.e., no shift) - that maximum value is the signal’s variance (assuming that the input signal is mean centered).
- plenoptic.tools.signal.center_crop(x, output_size)[source]
Crop out the center of a signal.
If x has an even number of elements on either of those final two dimensions, we round up.
- Parameters:
x (
Tensor
) – N-dimensional tensor, we assume the last two dimensions are height and width.output_size (
int
) – The size of the output. Note that we only support a single number, so both dimensions are cropped identically
- Returns:
Tensor whose last two dimensions have each been cropped to
output_size
- Return type:
cropped
- plenoptic.tools.signal.expand(x, factor)[source]
Expand a signal by a factor.
We do this in the frequency domain: pasting the Fourier contents of
x
in the center of a larger empty tensor, and then taking the inverse FFT.- Parameters:
x (
Tensor
) – The signal for expansion.factor (
float
) – Factor by which to resize image. Must be larger than 1 and factor * x.shape[-2:] must give integer values
- Returns:
The expanded signal
- Return type:
expanded
See also
shrink
The inverse operation
- plenoptic.tools.signal.interpolate1d(x_new, Y, X)[source]
One-dimensional linear interpolation.
Returns the one-dimensional piecewise linear interpolant to a function with given discrete data points (X, Y), evaluated at x_new.
Note: this function is just a wrapper around
np.interp()
.- Parameters:
x_new (
Tensor
) – The x-coordinates at which to evaluate the interpolated values.Y (
Union
[Tensor
,ndarray
]) – The y-coordinates of the data points.X (
Union
[Tensor
,ndarray
]]) – The x-coordinates of the data points, same length as Y.
- Return type:
Interpolated values of shape identical to x_new.
- plenoptic.tools.signal.make_disk(img_size, outer_radius=None, inner_radius=None)[source]
Create a circular mask with softened edges to an image.
All values within
inner_radius
will be 1, and all values frominner_radius
toouter_radius
will decay smoothly to 0.- Parameters:
img_size (
Union
[int
,Tuple
[int
,int
],Size
]) – Size of image in pixels.outer_radius (
Optional
[float
]) – Total radius of disk. Values frominner_radius
toouter_radius
will decay smoothly to zero.inner_radius (
Optional
[float
]) – Radius of inner disk. All elements from the origin toinner_radius
will be set to 1.
- Returns:
Tensor mask with torch.Size(img_size).
- Return type:
mask
- plenoptic.tools.signal.maximum(x, dim=None, keepdim=False)[source]
Compute maximum in torch over any dim or combination of axes in tensor.
- Parameters:
x (
Tensor
) – Input tensordim (
Optional
[List
[int
]]) – Dimensions over which you would like to compute the maximum.keepdim (
bool
) – Keep original dimensions of tensor when returning result
- Returns:
Maximum value of x.
- Return type:
max_x
- plenoptic.tools.signal.minimum(x, dim=None, keepdim=False)[source]
Compute minimum in torch over any axis or combination of axes in tensor.
- Parameters:
x (
Tensor
) – Input tensor.dim (
Optional
[List
[int
]]) – Dimensions over which you would like to compute the minimum.keepdim (
bool
) – Keep original dimensions of tensor when returning result.
- Returns:
Minimum value of x.
- Return type:
min_x
- plenoptic.tools.signal.modulate_phase(x, phase_factor=2.0)[source]
Modulate the phase of a complex signal.
Doubling the phase of a complex signal allows you to, for example, take the correlation between steerable pyramid coefficients at two adjacent spatial scales.
- Parameters:
x (
Tensor
) – Complex tensor whose phase will be modulated.phase_factor (
float
) – Multiplicative factor to change phase by.
- Returns:
Phase-modulated complex tensor.
- Return type:
x_mod
- plenoptic.tools.signal.polar_to_rectangular(amplitude, phase)[source]
Polar to rectangular coordinate transform
- Parameters:
amplitude (
Tensor
) – Tensor containing the amplitude (aka. complex modulus). Must be > 0.phase (
Tensor
) – Tensor containing the phase
- Return type:
Complex tensor.
- plenoptic.tools.signal.raised_cosine(width=1, position=0, values=(0, 1))[source]
Return a lookup table containing a “raised cosine” soft threshold function.
Y = VALUES(1) + (VALUES(2)-VALUES(1)) * cos^2( PI/2 * (X - POSITION + WIDTH)/WIDTH )
This lookup table is suitable for use by interpolate1d
- Parameters:
width (
float
) – The width of the region over which the transition occurs.position (
float
) – The location of the center of the threshold.values (
Tuple
[float
,float
]) – 2-tuple specifying the values to the left and right of the transition.
- Return type:
Tuple[ndarray, ndarray]
- Returns:
X – The x values of this raised cosine.
Y – The y values of this raised cosine.
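This lookup table pairs naturally with interpolate1d (documented above); a small sketch:

import torch
from plenoptic.tools.signal import interpolate1d, raised_cosine

# soft threshold going from 0 to 1 over a transition region of width 1 around x=0
X, Y = raised_cosine(width=1, position=0, values=(0, 1))
x_new = torch.linspace(-1, 1, 9)
y_new = interpolate1d(x_new, Y, X)   # linearly interpolate the lookup table at x_new
print(y_new)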
- plenoptic.tools.signal.rectangular_to_polar(x)[source]
Rectangular to polar coordinate transform
- Parameters:
x (
Tensor
) – Complex tensor.- Return type:
Tuple[Tensor, Tensor]
- Returns:
amplitude – Tensor containing the amplitude (aka. complex modulus).
phase – Tensor containing the phase.
- plenoptic.tools.signal.rescale(x, a=0.0, b=1.0)[source]
Linearly rescale the dynamic range of the input x to [a,b].
- Return type:
Tensor
- plenoptic.tools.signal.shrink(x, factor)[source]
Shrink a signal by a factor.
We do this in the frequency domain: cropping out the center of the Fourier transform of
x
, putting it in a new tensor, and taking the IFFT.- Parameters:
x (
Tensor
) – The signal to shrink.factor (
int
) – Factor by which to resize image. Must be larger than 1 and x.shape[-2:] / factor must give integer values
- Returns:
The shrunk signal
- Return type:
shrunk
See also
expand
The inverse operation
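A sketch showing that shrink inverts expand on a placeholder input:

import torch
from plenoptic.tools.signal import expand, shrink

x = torch.rand(1, 1, 32, 32)
big = expand(x, 2)       # Fourier-domain upsampling -> (1, 1, 64, 64)
small = shrink(big, 2)   # the inverse operation     -> (1, 1, 32, 32)
print(big.shape, small.shape)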
- plenoptic.tools.signal.steer(basis, angle, harmonics=None, steermtx=None, return_weights=False, even_phase=True)[source]
Steer BASIS to the specified ANGLE.
- Parameters:
basis (
Tensor
) – Array whose columns are vectorized rotated copies of a steerable function, or the responses of a set of steerable filters.angle (
Union
[ndarray
,Tensor
,float
]) – Scalar or column vector the size of the basis. specifies the angle(s) (in radians) to steer toharmonics (
Optional
[List
[int
]]) – A list of harmonic numbers indicating the angular harmonic content of the basis. if None (default), N even or odd low frequencies, as for derivative filterssteermtx (
Union
[Tensor
,ndarray
,None
]) – Matrix which maps the filters onto Fourier series components (ordered [cos0 cos1 sin1 cos2 sin2 … sinN]). See steer_to_harmonics_mtx function for more details. If None (default), assumes cosine phase harmonic components, and filter positions at 2pi*n/N.return_weights (
bool
) – Whether to return the weights or not.even_phase (
bool
) – Specifies whether the harmonics are cosine or sine phase aligned about those positions.
- Returns:
res – The resteered basis.
steervect – The weights used to resteer the basis. only returned if
return_weights
is True.
plenoptic.tools.stats module
- plenoptic.tools.stats.kurtosis(x, mean=None, var=None, dim=None, keepdim=False)[source]
sample estimate of x tailedness (presence of outliers)
The kurtosis of a univariate normal distribution is 3.
smaller than 3: platykurtic (e.g., uniform distribution)
greater than 3: leptokurtic (e.g., Laplace distribution)
- Parameters:
x (
Tensor
) – The input tensor.mean (
Union
[float
,Tensor
,None
]) – Reuse a precomputed mean.var (
Union
[float
,Tensor
,None
]) – Reuse a precomputed variance.dim (
Union
[int
,List
[int
],None
]) – The dimension or dimensions to reduce.keepdim (
bool
) – Whether the output tensor has dim retained or not.
- Returns:
The kurtosis tensor.
- Return type:
out
- plenoptic.tools.stats.skew(x, mean=None, var=None, dim=None, keepdim=False)[source]
Sample estimate of x asymmetry about its mean
- Parameters:
x (
Tensor
) – The input tensormean (
Union
[float
,Tensor
,None
]) – Reuse a precomputed meanvar (
Union
[float
,Tensor
,None
]) – Reuse a precomputed variancedim (
Union
[int
,List
[int
],None
]) – The dimension or dimensions to reduce.keepdim (
bool
) – Whether the output tensor has dim retained or not.
- Returns:
The skewness tensor.
- Return type:
out
- plenoptic.tools.stats.variance(x, mean=None, dim=None, keepdim=False)[source]
Calculate sample variance.
Note that this is the uncorrected, or sample, variance, corresponding to
torch.var(*, correction=0)
- Parameters:
x (
Tensor
) – The input tensormean (
Union
[float
,Tensor
,None
]) – Reuse a precomputed meandim (
Union
[int
,List
[int
],None
]) – The dimension or dimensions to reduce.keepdim (
bool
) – Whether the output tensor has dim retained or not.
- Returns:
The variance tensor.
- Return type:
out
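A quick check of these estimators on normally distributed placeholder data:

import torch
from plenoptic.tools.stats import kurtosis, skew, variance

x = torch.randn(1, 1, 256, 256)
print(variance(x, dim=[2, 3]))   # uncorrected sample variance, close to 1
print(skew(x, dim=[2, 3]))       # close to 0 for a symmetric distribution
print(kurtosis(x, dim=[2, 3]))   # close to 3 for a normal distribution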
plenoptic.tools.straightness module
- plenoptic.tools.straightness.deviation_from_line(sequence, normalize=True)[source]
Compute the deviation of sequence from the straight line between its endpoints.
Project each point of the path sequence onto the line defined by the anchor points, and measure the two sides of a right triangle:
- from the projected point to the first anchor point (aka. distance along line)
- from the projected point to the corresponding point on the path sequence (aka. distance from line)
- Parameters:
sequence (
Tensor
) – sequence of signals of shape (T, channel, height, width)normalize (
bool
) – use the distance between the anchor points as a unit of measurement
- Return type:
Tuple
[Tensor
,Tensor
]- Returns:
dist_along_line – sequence of T Euclidean distances along the line
dist_from_line – sequence of T Euclidean distances to the line
- plenoptic.tools.straightness.make_straight_line(start, stop, n_steps)[source]
make a straight line between start and stop with n_steps transitions.
- Parameters:
start (
Tensor
) – Images of shape (1, channel, height, width), the anchor points between which a line will be made.stop (
Tensor
) – Images of shape (1, channel, height, width), the anchor points between which a line will be made.n_steps (
int
) – Number of steps (i.e., transitions) to create between the two anchor points. Must be positive.
- Returns:
Tensor of shape (n_steps+1, channel, height, width)
- Return type:
straight
- plenoptic.tools.straightness.sample_brownian_bridge(start, stop, n_steps, max_norm=1)[source]
Sample a brownian bridge between start and stop made up of n_steps
- Parameters:
start (
Tensor
) – signal of shape (1, channel, height, width), the anchor points between which a random path will be sampled (like pylons on which the bridge will rest)stop (
Tensor
) – signal of shape (1, channel, height, width), the anchor points between which a random path will be sampled (like pylons on which the bridge will rest)n_steps (
int
) – number of steps on the bridgemax_norm (
float
) – controls variability of the bridge by setting how far (in l2 norm) it veers from the straight line interpolation at the midpoint between pylons. each component of the bridge will reach a maximal variability with std = max_norm / sqrt(d), where d is the dimension of the signal. (ie. d = C*H*W). Must be non-negative.
- Returns:
sequence of shape (n_steps+1, channel, height, width) a brownian bridge across the two pylons
- Return type:
bridge
- plenoptic.tools.straightness.translation_sequence(image, n_steps=10)[source]
Make a horizontal translation sequence from image.
- Parameters:
image (Tensor) – Base image of shape (1, channel, height, width).
n_steps (int) – Number of steps in the sequence. The length of the sequence is n_steps + 1. Must be positive.
- Returns:
translation sequence of shape (n_steps+1, channel, height, width)
- Return type:
sequence
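A minimal sketch tying these helpers together (assuming the signatures documented above; the choice of image is arbitrary):

import plenoptic as po
import plenoptic.tools.straightness as straightness

img = po.data.einstein()                                  # shape (1, 1, 256, 256)
seq = straightness.translation_sequence(img, n_steps=10)  # shape (11, 1, 256, 256)

# anchor points: the first and last frames, each of shape (1, 1, 256, 256)
start, stop = seq[:1], seq[-1:]
line = straightness.make_straight_line(start, stop, n_steps=10)
bridge = straightness.sample_brownian_bridge(start, stop, n_steps=10, max_norm=1)

# the straight line should deviate negligibly from itself; the Brownian bridge generally will not
_, dist_line = straightness.deviation_from_line(line)
_, dist_bridge = straightness.deviation_from_line(bridge)
print(dist_line.max().item(), dist_bridge.max().item())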
plenoptic.tools.validate module
Functions to validate synthesis inputs.
- plenoptic.tools.validate.remove_grad(model)[source]
Detach all parameters and buffers of model (in place).
- plenoptic.tools.validate.validate_coarse_to_fine(model, image_shape=None, device='cpu')[source]
Determine whether a model can be used for coarse-to-fine synthesis.
In particular, this function checks the following (with associated errors):
- Whether model has a scales attribute (AttributeError).
- Whether model.forward accepts a scales keyword argument (TypeError).
- Whether the output of model.forward changes shape when the scales keyword argument is set (ValueError).
(A sketch of a model interface satisfying these checks follows the parameter list below.)
- Parameters:
model (Module) – The model to validate.
image_shape (Optional[Tuple[int, int, int, int]]) – Some models (e.g., the steerable pyramid) can only accept inputs of a certain shape. If that's the case for model, use this to specify the expected shape. If None, we use an image of shape (1, 1, 16, 16).
device (Union[str, device]) – Which device to place the test image on.
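To make these checks concrete, here is a minimal, hypothetical model sketch with the interface validate_coarse_to_fine expects (the contents of scales and the pooling used are arbitrary illustrations, not part of plenoptic):

import torch

class ToyCoarseToFine(torch.nn.Module):
    """Hypothetical model satisfying the three checks above."""
    def __init__(self):
        super().__init__()
        # a scales attribute listing the scales that can be computed separately
        self.scales = [0, 1]

    def forward(self, image, scales=None):
        # forward accepts a scales keyword argument, and restricting the
        # scales changes the shape of the output
        scales = self.scales if scales is None else scales
        feats = []
        if 0 in scales:
            feats.append(image.flatten(start_dim=2).mean(-1, keepdim=True))
        if 1 in scales:
            coarse = torch.nn.functional.avg_pool2d(image, 2)
            feats.append(coarse.flatten(start_dim=2).mean(-1, keepdim=True))
        return torch.cat(feats, dim=-1)  # shape (batch, channel, n_scales)

Passing an instance of this class to validate_coarse_to_fine should exercise all three checks without raising.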
- plenoptic.tools.validate.validate_input(input_tensor, no_batch=False, allowed_range=None)[source]
Determine whether input_tensor can be used for synthesis.
In particular, this function:
- Checks whether input_tensor has a float or complex dtype.
- Checks whether input_tensor is 4d.
- If no_batch is True, checks whether input_tensor.shape[0] != 1.
- If allowed_range is not None, checks whether all values of input_tensor lie within the specified range.
If any of the above fail, a ValueError is raised.
- Parameters:
input_tensor (Tensor) – The tensor to validate.
no_batch (bool) – If True, raise a ValueError if the batch dimension of input_tensor is greater than 1.
allowed_range (Optional[Tuple[float, float]]) – If not None, ensure that all values of input_tensor lie within allowed_range.
- plenoptic.tools.validate.validate_metric(metric, image_shape=None, image_dtype=torch.float32, device='cpu')[source]
Determine whether a metric can be used for MADCompetition synthesis.
In particular, this function checks the following (with associated exceptions):
- Whether metric is callable and accepts two 4d tensors as input (TypeError).
- Whether metric returns a scalar when called with two 4d tensors as input (ValueError).
- Whether metric returns a value less than 5e-7 when called with two identical 4d tensors as input (ValueError). (This threshold was chosen because 1-SSIM of two identical images is 5e-8 on GPU.)
- Parameters:
metric (Union[Module, Callable[[Tensor, Tensor], Tensor]]) – The metric to validate.
image_shape (Optional[Tuple[int, int, int, int]]) – Some models (e.g., the steerable pyramid) can only accept inputs of a certain shape. If that's the case for your model, use this to specify the expected shape. If None, we use an image of shape (1, 1, 16, 16).
image_dtype (dtype) – What dtype to validate against.
device (Union[str, device]) – What device to place the test images on.
- plenoptic.tools.validate.validate_model(model, image_shape=None, image_dtype=torch.float32, device='cpu')[source]
Determine whether model can be used for synthesis.
In particular, this function checks the following (with their associated errors raised):
- If model adds a gradient to an input tensor, which implies that some of it is learnable (ValueError).
- If model returns a tensor when given a tensor; failure implies that not all computations are done using torch (ValueError).
- If model strips the gradient from an input with a gradient attached (ValueError).
- If model casts an input tensor to something else and converts it back to a tensor before returning it (ValueError).
- If model changes the precision of the input tensor (TypeError).
- If model returns a 3d or 4d output when given a 4d input (ValueError).
- If model changes the device of the input (RuntimeError).
Finally, we check whether model is in training mode and raise a warning if so. Note that this is different from having learnable parameters; see the pytorch docs (https://pytorch.org/docs/stable/notes/autograd.html#locally-disable-grad-doc).
- Parameters:
model (Module) – The model to validate.
image_shape (Optional[Tuple[int, int, int, int]]) – Some models (e.g., the steerable pyramid) can only accept inputs of a certain shape. If that's the case for model, use this to specify the expected shape. If None, we use an image of shape (1, 1, 16, 16).
image_dtype (dtype) – What dtype to validate against.
device (Union[str, device]) – What device to place the test image on.
See also
remove_grad
Helper function for detaching all parameters (in place).
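As a quick illustration, a minimal sketch of running these validators before synthesis (the model, metric, and image choices are arbitrary and mirror the examples later in this document; exact behavior depends on your plenoptic version):

import plenoptic as po
import plenoptic.tools.validate as validate

img = po.data.einstein()
model = po.simul.OnOff((7, 7))
po.tools.remove_grad(model)  # detach parameters so only the image is optimized
model.eval()                 # avoid the training-mode warning

validate.validate_input(img, no_batch=True, allowed_range=(0, 1))
validate.validate_model(model, image_shape=tuple(img.shape), device='cpu')

# metrics intended for MADCompetition can be checked the same way
validate.validate_metric(po.metric.mse, image_shape=tuple(img.shape))

If any check fails, the corresponding exception described above is raised; otherwise the calls simply return.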
Display and animate functions
plenoptic contains a variety of code for visualizing the outputs and the process of synthesis. This notebook details how to make use of that code, which has largely been written with the following goals:
1. If you follow the model API (and that of Synthesis, if creating a new synthesis method), the display code should plot something reasonably useful automatically.
2. The code is flexible enough to allow customization for more useful visualizations.
3. If the plotting code works, the animation code should as well.
[1]:
import plenoptic as po
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
import torch
import numpy as np
%load_ext autoreload
%autoreload 2
%matplotlib inline
[2]:
plt.rcParams['figure.dpi'] = 72
General
We include two wrappers of display code from pyrtools, adapting them for use with tensors. These are imshow and animshow, which accept tensors of real- or complex-valued images or videos (respectively) and properly convert them to arrays for display purposes. These are not the most flexible functions (for example, imshow requires that real-valued tensors be 4d) but, assuming you follow our API, they should work relatively painlessly. The main reason for using them (over the image-display code from matplotlib) is that we guarantee fidelity to image size: a value in the tensor corresponds to a pixel or an integer number of pixels in the image (if upsampling); if downsampling, we can only downsample by factors of two. This way, you can be sure that any strange appearance of the image is not due to aliasing in the plotting.
For imshow
, we require that real-valued tensors be 4d: (batch, channel, height, width)
. If you’re showing images, they’re likely to be grayscale (in which case there’s only 1 channel) or RGB(A) (in which case there’s 3 or 4, depending on whether it includes the alpha channel). We plot grayscale images without a problem:
[3]:
img = torch.cat([po.data.einstein(), po.data.curie()], axis=0)
print(img.shape)
fig = po.imshow(img)
torch.Size([2, 1, 256, 256])

We need to tell imshow that the image(s) are RGB in order for them to be plotted correctly.
[4]:
rgb = torch.rand(2, 3, 256, 256)
print(rgb.shape)
fig = po.imshow(rgb, as_rgb=True)
torch.Size([2, 3, 256, 256])

This is because we don’t want to assume that a tensor with 3 or 4 channels is always RGB. To pick a somewhat-contrived example, imagine the following steerable pyramid:
[5]:
pyr = po.simul.SteerablePyramidFreq(img.shape[-2:], downsample=False, height=1, order=2)
[6]:
coeffs, _ = pyr.convert_pyr_to_tensor(pyr(img),split_complex=False)
print(coeffs.shape)
torch.Size([2, 5, 256, 256])
The first and last channels are residuals, so if we only wanted to look at the coefficients, we’d do the following:
[7]:
po.imshow(coeffs[:, 1:-1], batch_idx=0)
po.imshow(coeffs[:, 1:-1], batch_idx=1);


We really don’t want to interpret those values as RGB.
Note that in the above imshow
calls, we had to specify the batch_idx
. This function expects a 4d tensor, but if it has more than one channel and more than one batch (and it’s not RGB), we can’t display everything. The user must therefore specify either batch_idx
or channel_idx
.
[8]:
po.imshow(coeffs[:, 1:-1], channel_idx=0);

animshow
works analogously to imshow
, wrapping around the pyrtools
version but expecting a 5d tensor: (batch, channel, time, height, width)
. It returns a matplotlib.animation.FuncAnimation
object, which can be saved as an mp4 or converted to an HTML object for display in a Jupyter notebook.
[9]:
pyr = po.simul.SteerablePyramidFreq(img.shape[-2:], downsample=False, height='auto', order=3, is_complex=True, tight_frame=False)
coeffs, _ = pyr.convert_pyr_to_tensor(pyr(img), split_complex=False)
print(coeffs.shape)
# because coeffs is 4d, we add a dummy dimension for the channel in order to make animshow happy
po.tools.convert_anim_to_html(po.animshow(coeffs.unsqueeze(1), batch_idx=0,vrange='indep1'))
torch.Size([2, 26, 256, 256])
[9]:
Synthesis-specific
Each synthesis method has a variety of display code to visualize the state and progress of synthesis, as well as to ease understanding of the process and look for ways to improve. For example, in metamer synthesis, it can be useful to determine what component of the model has the largest error.
[10]:
img = po.data.einstein()
model = po.simul.OnOff((7, 7))
rep = model(img)
As long as your model returns a 3d or 4d tensor (with the first two dimensions corresponding to batch and channel), our plotting code should work automatically. If it returns a 3d representation, we plot a stem plot; if it's 4d, an image.
[11]:
po.tools.display.plot_representation(data=rep, figsize=(11, 5));

This also gets used in the plotting code built into our synthesis methods.
[12]:
po.tools.remove_grad(model)
met = po.synth.Metamer(img, model)
met.synthesize(max_iter=100, store_progress=True,);
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
warnings.warn(
58%|█████▊ | 58/100 [00:01<00:01, 40.93it/s, loss=6.0124e-06, learning_rate=0.01, gradient_norm=7.5704e-04, pixel_change_norm=2.7111e-01]/mnt/home/wbroderick/plenoptic/src/plenoptic/synthesize/metamer.py:195: UserWarning: Loss has converged, stopping synthesis
warnings.warn("Loss has converged, stopping synthesis")
61%|██████ | 61/100 [00:01<00:00, 39.43it/s, loss=6.0124e-06, learning_rate=0.01, gradient_norm=7.5704e-04, pixel_change_norm=2.7111e-01]
After we’ve run synthesis for a while, we want to investigate how close we are. We can examine the numbers printed out above, but it’s probably useful to plot something. We provide the plot_synthesis_status()
function for doing this. By default, it includes the synthesized image, the loss, and the representation error. That last plot is the same as the one above, except it plots data = base_representation - synthesized_representation.
[13]:
# we have two image plots for representation error, so that bit should be 2x wider
fig = po.synth.metamer.plot_synthesis_status(met, width_ratios={'plot_representation_error': 2.1})
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/display.py:950: UserWarning: ax is not None, so we're ignoring figsize...
warnings.warn("ax is not None, so we're ignoring figsize...")

You can also create this plot at different iterations, in order to better understand what's happening.
[14]:
fig = po.synth.metamer.plot_synthesis_status(met, iteration=10, width_ratios={'plot_representation_error': 2.1})

The appearance of this figure is very customizable. There are several additional plots that can be included, and all plots are optional. The additional plot below shows two histograms comparing the pixel values of the synthesized and base signals.
[15]:
fig = po.synth.metamer.plot_synthesis_status(met, included_plots=['display_metamer', 'plot_loss',
'plot_representation_error', 'plot_pixel_values'],
width_ratios={'plot_representation_error': 2.1})

In addition to being able to customize which plots to include, you can also pre-create the figure (with axes, if you'd like) and pass it in. By default, we try to create an appropriate-looking figure, with appropriately-sized plots, but this allows for more flexibility:
[16]:
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
fig = po.synth.metamer.plot_synthesis_status(met, included_plots=['display_metamer', 'plot_loss',
'plot_pixel_values'],
fig=fig)

For even more flexibility, you can specify which plot should go in which axes by creating an axes_idx dictionary. You can create a key for every plot, or only a subset (in which case the remaining plots get added to the next available axes, as above when axes_idx is unset; see the docstring for key names):
[17]:
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
axes_idx = {'display_metamer': 3, 'plot_pixel_values': 0}
fig = po.synth.metamer.plot_synthesis_status(met, included_plots=['display_metamer', 'plot_loss',
'plot_pixel_values'],
fig=fig, axes_idx=axes_idx)

This enables you to create more complicated figures, with axes containing other plots, arrows and other annotations, etc.
[18]:
fig, axes = plt.subplots(2, 3, figsize=(17, 12))
# to tell plot_synthesis_status to ignore plots, add them to the misc keys
axes_idx = {'display_metamer': 5, 'misc': [0, 4]}
axes[0, 0].text(.5, .5, 'SUPER COOL TEXT', color='r')
axes[1, 0].arrow(0, 0, .25, .25, )
axes[0, 0].plot(np.linspace(0, 1), np.random.rand(50))
fig = po.synth.metamer.plot_synthesis_status(met, included_plots=['display_metamer', 'plot_loss',
'plot_pixel_values'],
fig=fig, axes_idx=axes_idx)

We similarly have an animate function, which animates the above plots over time, and everything said above also applies to it. Note that animate will take a fair amount of time to run and requires ffmpeg on your system for most file formats (see the matplotlib docs for more details).
[19]:
fig, axes = plt.subplots(2, 3, figsize=(17, 12))
# to tell plot_synthesis_status to ignore plots, add them to the misc keys
axes_idx = {'display_metamer': 5, 'misc': [0, 4]}
axes[0, 0].text(.5, .5, 'SUPER COOL TEXT', color='r')
axes[1, 0].arrow(0, 0, .25, .25, )
axes[0, 0].plot(np.linspace(0, 1), np.random.rand(50))
anim = po.synth.metamer.animate(met, included_plots=['display_metamer', 'plot_loss',
'plot_pixel_values'],
fig=fig, axes_idx=axes_idx,)
This anim
object is not viewable by itself: it either needs to be converted to HTML for display in the notebook, or saved as an .mp4 file (by calling anim.save(filename)).
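For example, to save it to disk instead (the filename here is arbitrary, and most formats require ffmpeg, as noted above):

anim.save('metamer_synthesis.mp4')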
[20]:
po.tools.convert_anim_to_html(anim)
[20]:
More complicated model representation plots
While this provides a starting point, it's not always super useful. In the example above, the OnOff model returns the output of several convolutional kernels across the image, and so plotting it as a series of images works reasonably well. The representation of the PortillaSimoncelli model below, however, has several distinct components at multiple spatial scales and orientations. That structure is lost in a single stem plot:
[21]:
img = po.data.reptile_skin()
ps = po.simul.PortillaSimoncelli(img.shape[-2:])
rep = ps(img)
po.tools.display.plot_representation(data=rep);

Trying to guess this advanced structure would be impossible for our generic plotting functions. However, if your model has a plot_representation()
method, we can make use of it:
[22]:
ps.plot_representation(data=rep, ylim=False);

Our display.plot_representation
function can make use of this method if you pass it the model; note how the plot below is identical to the one above. This might not seem very useful, but we make use of this in the different plotting methods used by our synthesis classes explained above.
[23]:
po.tools.display.plot_representation(ps, rep, figsize=(15, 15));

[24]:
met = po.synth.MetamerCTF(img, ps, loss_function=po.tools.optim.l2_norm, coarse_to_fine='together')
met.synthesize(max_iter=400, store_progress=10,
change_scale_criterion=None, ctf_iters_to_check=10);
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:178: UserWarning: model is in training mode, you probably want to call eval() to switch to evaluation mode
warnings.warn(
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/validate.py:211: UserWarning: Validating whether model can work with coarse-to-fine synthesis -- this can take a while!
warnings.warn("Validating whether model can work with coarse-to-fine synthesis -- this can take a while!")
100%|██████████| 400/400 [00:38<00:00, 10.35it/s, loss=2.5739e-01, learning_rate=0.01, gradient_norm=1.2590e+00, pixel_change_norm=2.4271e-01, current_scale=all, current_scale_loss=2.5739e-01]
[25]:
fig, _ = po.synth.metamer.plot_synthesis_status(met)
/mnt/home/wbroderick/plenoptic/src/plenoptic/tools/display.py:950: UserWarning: ax is not None, so we're ignoring figsize...
warnings.warn("ax is not None, so we're ignoring figsize...")

And again, we can animate this over time:
[26]:
po.tools.convert_anim_to_html(po.synth.metamer.animate(met))
[26]:
Extending existing synthesis objects
Once you are familiar with the existing synthesis objects included in plenoptic
, you may wish to change some aspect of their behavior. For example, you may wish to change how po.synth.MADCompetition initializes the MAD image, or alter the objective function of po.synth.Metamer. While you could certainly start from scratch or copy the source code of the object and alter it directly, an easier way is to create a new subclass: an object that inherits from the synthesis object you wish to modify and overwrites some of its existing methods.
For example, you could create a version of po.synth.MADCompetition
that starts with a different natural image (rather than with the image argument plus normally-distributed noise) by creating the following object:
[1]:
import plenoptic as po
from torch import Tensor
import torch
import matplotlib.pyplot as plt
import warnings
from typing import Union, Callable, Tuple, Optional
from typing_extensions import Literal
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
%load_ext autoreload
%autoreload 2
[2]:
class MADCompetitionVariant(po.synth.MADCompetition):
    """Initialize MADCompetition with an image instead!"""
    def __init__(self, image: Tensor,
                 optimized_metric: Union[torch.nn.Module, Callable[[Tensor, Tensor], Tensor]],
                 reference_metric: Union[torch.nn.Module, Callable[[Tensor, Tensor], Tensor]],
                 minmax: Literal['min', 'max'],
                 initial_image: Optional[Tensor] = None,
                 metric_tradeoff_lambda: Optional[float] = None,
                 range_penalty_lambda: float = .1,
                 allowed_range: Tuple[float, float] = (0, 1)):
        if initial_image is None:
            initial_image = torch.rand_like(image)
        super().__init__(image, optimized_metric, reference_metric,
                         minmax, initial_image, metric_tradeoff_lambda,
                         range_penalty_lambda, allowed_range)

    def _initialize(self, initial_image: Tensor):
        # clamp the initial image to the allowed range and track it as the MAD image
        mad_image = initial_image.clamp(*self.allowed_range)
        self._initial_image = mad_image.clone()
        mad_image.requires_grad_()
        self._mad_image = mad_image
        self._reference_metric_target = self.reference_metric(self.image,
                                                               self.mad_image).item()
        self._reference_metric_loss.append(self._reference_metric_target)
        self._optimized_metric_loss.append(self.optimized_metric(self.image,
                                                                  self.mad_image).item())
We can then interact with this new object in the same way as the original MADCompetition
object, the only difference being how it’s initialized:
[3]:
image = po.data.einstein()
curie = po.data.curie()
new_mad = MADCompetitionVariant(image, po.metric.mse, lambda *args: 1-po.metric.ssim(*args),
'min', curie)
old_mad = po.synth.MADCompetition(image, po.metric.mse, lambda *args: 1-po.metric.ssim(*args),
'min', .1)
/home/billbrod/Documents/plenoptic/plenoptic/synthesize/mad_competition.py:130: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
loss_ratio = torch.tensor(self.optimized_metric_loss[-1] / self.reference_metric_loss[-1],
/home/billbrod/Documents/plenoptic/plenoptic/synthesize/mad_competition.py:134: UserWarning: Since metric_tradeoff_lamda was None, automatically set to 0.10000000149011612 to roughly balance metrics.
warnings.warn("Since metric_tradeoff_lamda was None, automatically set"
/home/billbrod/Documents/plenoptic/plenoptic/synthesize/mad_competition.py:134: UserWarning: Since metric_tradeoff_lamda was None, automatically set to 0.009999999776482582 to roughly balance metrics.
warnings.warn("Since metric_tradeoff_lamda was None, automatically set"
We can see below that the two versions have the same image
whose representation they’re trying to match, but very different initial images.
[4]:
po.imshow([old_mad.image, old_mad.initial_image, new_mad.image, new_mad.initial_image],
col_wrap=2);

We call synthesize in the same way and can even make use of the original plot_synthesis_status
function to see what synthesis looks like:
[5]:
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    old_mad.synthesize(store_progress=True)
    po.synth.mad_competition.plot_synthesis_status(old_mad, included_plots=['display_mad_image', 'plot_loss']);

[6]:
with warnings.catch_warnings():
    # we suppress the warning telling us that our image falls outside of the (0, 1) range,
    # which will happen briefly during synthesis.
    warnings.simplefilter('ignore')
    new_mad.synthesize(store_progress=True)
    po.synth.mad_competition.plot_synthesis_status(new_mad, included_plots=['display_mad_image', 'plot_loss']);

For the version initialized with the image of Marie Curie, let's also examine the MAD image shortly after synthesis started, since the final version doesn't look that different:
[7]:
po.synth.mad_competition.display_mad_image(new_mad, iteration=10);

See the documentation for more description of how the synthesis objects are structured, to get ideas for how else to modify them. Some good methods to overwrite include (note that not every object uses each of these): _initialize, _check_convergence, and objective_function (for more serious changes to initialization, it is probably better to start with _initialize). For a more substantial change, you could also overwrite synthesize and _optimizer_step (and possibly _closure) to really change how synthesis works. See po.synth.MetamerCTF for an example of how to do this.
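For instance, here is a minimal, hypothetical sketch of swapping the objective of po.synth.Metamer for an L1 loss by overriding objective_function. The argument names are placeholders, since the exact signature differs across plenoptic versions; check the Metamer source before relying on this:

import torch
import plenoptic as po

class L1Metamer(po.synth.Metamer):
    """Hypothetical Metamer variant using an L1 objective.

    NOTE: the signature of objective_function is assumed here; consult the
    Metamer source for the exact argument names in your version.
    """
    def objective_function(self, synth_rep, target_rep):
        # L1 distance between the synthesized and target model representations
        return torch.abs(synth_rep - target_rep).mean()

An instance of this class would then be constructed and run exactly like po.synth.Metamer above.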
These methods also work with auditory models, such as in Feather et al., 2019, though we haven't yet implemented any examples. If you're interested, please post in Discussions!
Portilla, J., & Simoncelli, E. P. (2000). A parametric texture model based on joint statistics of complex wavelet coefficients. International journal of computer vision, 40(1), 49–70. https://www.cns.nyu.edu/~lcv/texture/. https://www.cns.nyu.edu/pub/eero/portilla99-reprint.pdf
Freeman, J., & Simoncelli, E. P. (2011). Metamers of the ventral stream. Nature Neuroscience, 14(9), 1195–1201. http://www.cns.nyu.edu/pub/eero/freeman10-reprint.pdf
Deza, A., Jonnalagadda, A., & Eckstein, M. P. (2019). Towards metamerism via foveated style transfer. In International Conference on Learning Representations.
Feather, J., Durango, A., Gonzalez, R., & McDermott, J. (2019). Metamers of neural networks reveal divergence from human perceptual systems. In NeurIPS (pp. 10078–10089).
Wallis, T. S., Funke, C. M., Ecker, A. S., Gatys, L. A., Wichmann, F. A., & Bethge, M. (2019). Image content is more important than bouma’s law for scene metamers. eLife. http://dx.doi.org/10.7554/elife.42512
Berardino, A., Laparra, V., Ballé, J., & Simoncelli, E. P. (2017). Eigen-distortions of hierarchical representations. In I. Guyon, U. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Adv. Neural Information Processing Systems (NIPS*17) (pp. 1–10). Curran Associates, Inc. https://www.cns.nyu.edu/~lcv/eigendistortions/ http://www.cns.nyu.edu/pub/lcv/berardino17c-final.pdf
Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. https://ece.uwaterloo.ca/~z70wang/research/mad/ http://www.cns.nyu.edu/pub/lcv/wang08-preprint.pdf
Hénaff, O. J., & Simoncelli, E. P. (2016). Geodesics of learned representations. ICLR. http://www.cns.nyu.edu/pub/lcv/henaff16b-reprint.pdf
Hénaff, O. J., Bai, Y., Charlton, J., Nauhaus, I., Simoncelli, E. P., & Goris, R. L. T. (2021). Primary visual cortex straightens natural video trajectories. Nature Communications, 12(5982), October 2021. https://www.cns.nyu.edu/pub/lcv/henaff20-reprint.pdf
Simoncelli, E. P., Freeman, W. T., Adelson, E. H., & Heeger, D. J. (1992). Shiftable Multi-Scale Transforms. IEEE Trans. Information Theory, 38(2), 587–607. http://dx.doi.org/10.1109/18.119725
Simoncelli, E. P., & Freeman, W. T. (1995). The steerable pyramid: A flexible architecture for multi-scale derivative computation. In Proc 2nd IEEE Int'l Conf on Image Proc (ICIP) (pp. 444–447). Washington, DC: IEEE Sig Proc Society. http://www.cns.nyu.edu/pub/eero/simoncelli95b.pdf
Wang, Z., Bovik, A., Sheikh, H., & Simoncelli, E. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612. https://www.cns.nyu.edu/~lcv/ssim/. http://www.cns.nyu.edu/pub/lcv/wang03-reprint.pdf
Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003). Multiscale structural similarity for image quality assessment. Proc 37th Asilomar Conf on Signals, Systems and Computers, vol. 2, pp. 1398–1402. http://www.cns.nyu.edu/pub/eero/wang03b.pdf
Laparra, V., Berardino, A., Ballé, J., & Simoncelli, E. P. (2017). Perceptually Optimized Image Rendering. Journal of the Optical Society of America A, 34(9), 1511. http://www.cns.nyu.edu/pub/lcv/laparra17a.pdf
Laparra, V., Ballé, J., Berardino, A. and Simoncelli, E.P., 2016. Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16), pp.1-6. http://www.cns.nyu.edu/pub/lcv/laparra16a-reprint.pdf
Ziemba, C.M., and Simoncelli, E.P. (2021). Opposing effects of selectivity and invariance in peripheral vision. Nature Communications, vol.12(4597). https://dx.doi.org/10.1038/s41467-021-24880-5
This package is supported by the Center for Computational Neuroscience (https://www.simonsfoundation.org/flatiron/center-for-computational-neuroscience/), in the Flatiron Institute of the Simons Foundation.
