Scattering Transforms

Relevant Papers

  • Wavelet Scattering Regression of Quantum Chemical Energies. With Stéphane Mallat and Nicolas Poilvert. Multiscale Modeling and Simulation, volume 15, issue 2, 827-863, 2017. pdf, arXiv, MMS, Software.
  • Quantum Energy Regression using Scattering Transforms. With Nicolas Poilvert and Stéphane Mallat. 2015. pdf, arXiv.


Deep learning architectures have reemerged in recent years with the advent of new computational resources and algorithmic improvements. They yield remarkable state-of-the-art results in numerous learning tasks, primarily in computer vision, speech recognition, and natural language processing [1-5]. More recently, such algorithms have begun branching into other areas, such as music [6] and even physics [7]. However, their complexity means that they have remained essentially a black box, yielding little insight relative to their performance achievements. This limits their utility in fields outside of those traditionally tackled by the machine learning community and obstructs new scientific directions. My research aims to open this black box by using tools from harmonic analysis to construct multiscale deep learning architectures amenable to mathematical analysis.

From structured data, such as time series, textures, images, EEG, ECG, MRI, and 3D scans, one can extract relevant features via filter-based analysis. Convolutional networks apply a sequence of filters and nonlinear averaging operations, which are learned for a specific task from training data. A particular task may require that the network encode certain fundamental invariance and stability properties, but the algorithm does not guarantee them: any such properties are an implicit byproduct of training rather than an explicit design choice.

A scattering transform [8] constructs similar architectures based on convolutional filters. Rather than beginning directly with the task, however, the network is designed to encode geometric invariants and stability properties that are known to be present in the data. For example, translation and rotation invariance (either globally or locally) are often desired in image processing tasks [9-12], although there is no theoretical barrier to incorporating other invariants into the transform. Additionally, in many machine learning and data analysis tasks, small deformations of the data do not drastically affect the outcome; scattering networks are guaranteed to be stable to such deformations, unlike their convolutional net counterparts.

More specifically, let x(t) be a structured data point, in which t\in\mathbb{R}^d represents, for example, one-dimensional time, two-dimensional space, or three-dimensional volume (hereafter referred to as “space”). A wavelet \psi is a complex waveform that is well localized in both space and frequency. The frequency support of \widehat{\psi} is essentially contained in a frequency ball centered at a central frequency \omega_0.

The wavelet \psi is dilated at scales 2^j and rotated by r_{\theta}\in\mathrm{SO}(d),

\psi_{j, \theta} (t) = 2^{-dj} \psi (2^{-j} r_{\theta}^{-1} t).

Thus \widehat{\psi}_{j,\theta}(\omega)=\widehat{\psi}(2^{j}r_{\theta}^{-1}\omega), and so \widehat{\psi}_{j, \theta} is essentially supported in a frequency ball centered at 2^{-j}r_{\theta}\omega_0, whose radius is scaled by a factor 2^{-j} relative to the support of \widehat{\psi}. See the figure below for an illustration of how the frequency support of \widehat{\psi}_{j, \theta} varies with j and \theta in two dimensions.

Frequency support of a 2D wavelet \psi_{j,\theta}. Colors indicate different scales 2^j, while rotations r_{\theta} fill out each annulus. The Fourier transform of the low pass filter, \widehat{\phi}_J, is supported in the green circle centered at the origin.
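
The effect of dilation on the frequency support can be checked numerically in the simplest setting, d = 1, where there are no rotations. The following is a minimal sketch and not the software linked above: the Gabor mother wavelet, the parameter values sigma = 2 and omega0 = 2, the grid size, and all function names are assumptions chosen for illustration. It samples \psi_j(t) = 2^{-j}\psi(2^{-j}t) on a periodic grid and locates the peak of its discrete Fourier transform, which should sit near 2^{-j}\omega_0 up to the grid spacing 2\pi/n.

```python
import cmath
import math


def gabor(t, sigma=2.0, omega0=2.0):
    """Assumed mother wavelet: Gaussian envelope times exp(i * omega0 * t).

    A Gabor function is only approximately zero-mean; it is used here for brevity.
    """
    return math.exp(-t * t / (2.0 * sigma**2)) * cmath.exp(1j * omega0 * t)


def dilated_wavelet(j, n):
    """Samples of psi_j(t) = 2^{-j} psi(2^{-j} t) on a periodic grid of n points."""
    scale = 2.0**j
    samples = []
    for k in range(n):
        t = (k + n // 2) % n - n // 2  # centered time, stored in wrap-around order
        samples.append(gabor(t / scale) / scale)
    return samples


def peak_frequency(h):
    """Angular frequency in [0, pi] at which the DFT magnitude of h is largest."""
    n = len(h)

    def dft_magnitude(m):
        return abs(sum(h[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)))

    best_bin = max(range(n // 2 + 1), key=dft_magnitude)
    return 2.0 * math.pi * best_bin / n


if __name__ == "__main__":
    n = 128
    for j in range(3):
        # Each printed frequency should be close to 2^{-j} * omega0.
        print(j, peak_frequency(dilated_wavelet(j, n)))
```

Each increment of j halves the central frequency, which in two dimensions corresponds to moving from one annulus of the figure to the next one closer to the origin.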

Wavelet coefficients of x are computed via convolution x \ast \psi_{j,\theta} for different scales and rotations. Wavelet coefficients are computed up to a maximum scale 2^J. Frequencies below 2^{-J} are captured by a low pass filter \phi_J(t) = 2^{-dJ}\phi(2^{-J}t), where \phi(t) \geq 0 is a positive rotationally symmetric function, such as a Gaussian. Its Fourier transform \widehat{\phi} is essentially supported in the ball |\omega| \leq |\omega_0|, so that \widehat{\phi}_J(\omega) = \widehat{\phi}(2^J\omega) is essentially supported in the ball |\omega| \leq 2^{-J}|\omega_0|. The resulting wavelet transform of x is defined as:

Wx=\{x\ast\phi_J, \enspace x\ast\psi_{j,\theta} : j<J, \, r_{\theta}\in\mathrm{SO}(d)\}

It contains frequency information similar to that of the Fourier transform of x, but the multiscale (multiresolution) wavelet approach localizes this information in space as well as in frequency, which yields desirable properties not available to standard Fourier analysis.
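
The definition of Wx can be sketched in a few lines for a one-dimensional periodic signal, where the rotations r_\theta drop out. This is an illustrative sketch under stated assumptions, not the author's implementation: the Gabor wavelet, the unit-mass Gaussian low pass filter, the parameter values, and the function names are all hypothetical choices, and convolutions are computed directly as circular sums.

```python
import cmath
import math


def centered(k, n):
    """Centered time index for grid position k (wrap-around order)."""
    return (k + n // 2) % n - n // 2


def wavelet(j, n, sigma=2.0, omega0=2.0):
    """psi_j(t) = 2^{-j} psi(2^{-j} t) for an assumed Gabor mother wavelet psi."""
    scale = 2.0**j
    samples = []
    for k in range(n):
        t = centered(k, n) / scale
        samples.append(math.exp(-t * t / (2.0 * sigma**2)) * cmath.exp(1j * omega0 * t) / scale)
    return samples


def low_pass(J, n):
    """phi_J(t) = 2^{-J} phi(2^{-J} t) for an assumed unit-mass Gaussian phi."""
    scale = 2.0**J
    samples = []
    for k in range(n):
        t = centered(k, n) / scale
        samples.append(math.exp(-t * t / 2.0) / (scale * math.sqrt(2.0 * math.pi)))
    return samples


def circular_convolve(x, h):
    """(x * h)[k] = sum_m x[m] h[k - m], with indices taken modulo len(x)."""
    n = len(x)
    return [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]


def wavelet_transform(x, J):
    """Wx = {x * phi_J, x * psi_j : j < J} for a 1D signal (no rotations)."""
    n = len(x)
    return {
        "low_pass": circular_convolve(x, low_pass(J, n)),
        "wavelets": [circular_convolve(x, wavelet(j, n)) for j in range(J)],
    }


if __name__ == "__main__":
    # A constant signal has only zero-frequency content, so it should pass through
    # the low pass filter nearly unchanged and be almost annihilated by the wavelets.
    W = wavelet_transform([1.0] * 128, J=3)
    print(W["low_pass"][0], [abs(coeffs[0]) for coeffs in W["wavelets"]])
```

The constant-signal check illustrates the division of labor in Wx: the low pass filter \phi_J retains the lowest frequencies, while each \psi_j responds only to content near its own frequency band.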

The wavelet transform is linear. A nonlinear transform is obtained by taking the complex modulus of the wavelet coefficients:

|W|x=\{x\ast\phi_J, \enspace |x\ast\psi_{j,\theta}| : j<J, \, r_{\theta}\in\mathrm{SO}(d)\}

The complex modulus computes the complex envelope of the wavelet coefficients x \ast \psi_{j, \theta}. The nonlinear modulus therefore smooths the wavelet coefficients, pushing their frequency content into the lower frequencies.
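
The demodulation effect of the modulus can be observed on a toy signal. The sketch below is an assumption-laden illustration: the amplitude-modulated cosine, the Gabor wavelet, the choice j = 1, and all names are hypothetical. It compares the dominant DFT bin of x \ast \psi_1, which is expected to sit at the carrier frequency, with the dominant bin of |x \ast \psi_1|, which is expected to sit at frequency zero.

```python
import cmath
import math


def wavelet(j, n, sigma=2.0, omega0=2.0):
    """psi_j(t) = 2^{-j} psi(2^{-j} t) for an assumed Gabor mother wavelet psi."""
    scale = 2.0**j
    samples = []
    for k in range(n):
        t = ((k + n // 2) % n - n // 2) / scale  # centered, dilated time
        samples.append(math.exp(-t * t / (2.0 * sigma**2)) * cmath.exp(1j * omega0 * t) / scale)
    return samples


def circular_convolve(x, h):
    """(x * h)[k] = sum_m x[m] h[k - m], with indices taken modulo len(x)."""
    n = len(x)
    return [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]


def peak_bin(y):
    """Index of the DFT bin with the largest magnitude."""
    n = len(y)

    def magnitude(m):
        return abs(sum(y[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)))

    return max(range(n), key=magnitude)


def modulated_cosine(n, carrier_bin=20, envelope_bin=3):
    """Cosine carrier with a slowly varying amplitude: (1 + 0.5 cos) * cos."""
    return [
        (1.0 + 0.5 * math.cos(2.0 * math.pi * envelope_bin * k / n))
        * math.cos(2.0 * math.pi * carrier_bin * k / n)
        for k in range(n)
    ]


if __name__ == "__main__":
    n = 128
    x = modulated_cosine(n)
    coefficients = circular_convolve(x, wavelet(1, n))  # oscillates at the carrier
    envelope = [abs(c) for c in coefficients]           # varies slowly
    print(peak_bin(coefficients), peak_bin(envelope))
```

The carrier was placed at bin 20 because its angular frequency, 2\pi \cdot 20/128, is close to the central frequency 2^{-1}\omega_0 = 1 of the assumed wavelet \psi_1.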

Invariants are obtained by averaging over the relevant structures in the wavelet transform. The low pass transform,

S_J[\emptyset]x(t) = x \ast \phi_J(t),

is an averaging operator which is translation invariant up to the scale 2^J. Global translation invariance is achieved by letting J \rightarrow \infty, which in effect computes the global average of x. The wavelet modulus coefficients |x\ast\psi_{j,\theta}| are covariant to translations and rotations, but not invariant. Translation invariant features are obtained by applying the low pass filter on top of the wavelet modulus transform:

S_J[j,\theta]x(t) = |x\ast\psi_{j,\theta}|\ast\phi_J(t).

The collection:

S_Jx=\{x\ast\phi_J, \enspace |x\ast\psi_{j,\theta}|\ast\phi_J : j<J, \, r_{\theta}\in\mathrm{SO}(d)\}

constitutes a set of translation invariant features of the signal x. Translation and rotation invariant features are obtained by replacing \phi(t) with a low pass filter \phi(t,\theta) over both translations t and rotations \theta. In fact, invariance to any finite group or Lie group action can be obtained through appropriately defined wavelet transforms; see [8] for more details. In what follows, we focus on translation invariance in order to simplify the presentation.
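
Local translation invariance can be illustrated with a one-dimensional sketch of these features. It is a hedged example, not the published software: the Gabor wavelet, the Gaussian low pass filter, the spike test signal, the values n = 128, J = 4, and a shift of 2 samples, and every function name are assumptions made for illustration. It compares how much the features |x\ast\psi_j|\ast\phi_J change under a shift that is small relative to 2^J with how much the unaveraged modulus |x\ast\psi_0| changes.

```python
import cmath
import math


def centered(k, n):
    """Centered time index for grid position k (wrap-around order)."""
    return (k + n // 2) % n - n // 2


def wavelet(j, n, sigma=2.0, omega0=2.0):
    """psi_j(t) = 2^{-j} psi(2^{-j} t) for an assumed Gabor mother wavelet psi."""
    scale = 2.0**j
    samples = []
    for k in range(n):
        t = centered(k, n) / scale
        samples.append(math.exp(-t * t / (2.0 * sigma**2)) * cmath.exp(1j * omega0 * t) / scale)
    return samples


def low_pass(J, n):
    """phi_J(t) = 2^{-J} phi(2^{-J} t) for an assumed unit-mass Gaussian phi."""
    scale = 2.0**J
    samples = []
    for k in range(n):
        t = centered(k, n) / scale
        samples.append(math.exp(-t * t / 2.0) / (scale * math.sqrt(2.0 * math.pi)))
    return samples


def circular_convolve(x, h):
    """(x * h)[k] = sum_m x[m] h[k - m], with indices taken modulo len(x)."""
    n = len(x)
    return [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]


def scattering(x, J):
    """S_J x = {x * phi_J, |x * psi_j| * phi_J : j < J} for a 1D signal."""
    n = len(x)
    phi = low_pass(J, n)
    paths = [circular_convolve(x, phi)]
    for j in range(J):
        modulus = [abs(c) for c in circular_convolve(x, wavelet(j, n))]
        paths.append(circular_convolve(modulus, phi))
    return paths


def relative_change(a, b):
    """||a - b|| / ||a|| in the Euclidean norm."""
    difference = math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(a, b)))
    return difference / math.sqrt(sum(abs(u) ** 2 for u in a))


if __name__ == "__main__":
    n, J, shift = 128, 4, 2
    x = [1.0 if k == 40 else 0.0 for k in range(n)]  # a single spike
    x_shifted = x[-shift:] + x[:-shift]              # circular translation by `shift`
    for path, path_shifted in zip(scattering(x, J), scattering(x_shifted, J)):
        print(relative_change(path, path_shifted))
```

In this setup the averaged features move only slightly under the shift, whereas the unaveraged modulus |x\ast\psi_0| changes substantially. Increasing J enlarges the averaging window and reduces the change further, consistent with global invariance in the limit J \rightarrow \infty.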

The low pass transform x\ast\phi_J… [to be continued]


[1]  Yoshua Bengio. Learning deep architectures for AI. Found. Trends Mach. Learn., 2(1):1–127, January 2009.

[2]  Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell., 35(8):1798–1828, August 2013.

[3]  Li Deng and Dong Yu. Deep learning: Methods and applications. Found. Trends Signal Process., 7(3-4):197–387, June 2014.

[4]  Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

[5]  Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, May 2015.

[6]  E. Humphrey, J.P. Bello, and Y. LeCun. Feature learning and deep architectures: New directions for music informatics. Journal of Intelligent Information Systems, 41(3):461–481, 2013.

[7]  Pankaj Mehta and David J. Schwab. An exact mapping between the variational renormalization group and deep learning. arXiv:1410.3831, 2014.

[8] Stéphane Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, October 2012.

[9] Laurent Sifre and Stéphane Mallat. Combined scattering for rotation invariant texture analysis. In Proceedings of the ESANN 2012 conference, 2012.

[10]  Laurent Sifre and Stéphane Mallat. Rotation, scaling and deformation invariant scattering for texture discrimination. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013.

[11]  Laurent Sifre and Stéphane Mallat. Rigid-motion scattering for texture classification. arXiv:1403.1687, 2014.

[12]  Edouard Oyallon and Stéphane Mallat. Deep roto-translation scattering for object classification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. arXiv:1412.8659.