Implicit Neural Representations
with Periodic Activation Functions

NeurIPS 2020 (Oral)


Vincent Sitzmann*, Julien N. P. Martel*, Alexander Bergman,
David B. Lindell, Gordon Wetzstein

Paper · Colab Notebook · Tensorflow Playground · Code · Data

Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or SIRENs, are ideally suited for representing complex natural signals and their derivatives. We analyze SIREN activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how SIRENs can be leveraged to solve challenging boundary value problems, such as particular Eikonal boundary value problems (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine SIRENs with hypernetworks to learn priors over the space of SIREN functions.
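To make the architecture concrete, here is a minimal PyTorch sketch of a sine layer and a SIREN network using the initialization scheme described above; the frequency factor omega_0 = 30 is the paper's default, while the class names, layer widths, and other details are purely illustrative.

import math
import torch
from torch import nn

class SineLayer(nn.Module):
    # A linear layer followed by a sine nonlinearity: y = sin(omega_0 * (W x + b)).
    def __init__(self, in_features, out_features, is_first=False, omega_0=30.0):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            if is_first:
                # First layer: weights uniform in [-1/n, 1/n], so omega_0 spans
                # many periods over inputs normalized to [-1, 1].
                self.linear.weight.uniform_(-1.0 / in_features, 1.0 / in_features)
            else:
                # Hidden layers: uniform in [-sqrt(6/n)/omega_0, sqrt(6/n)/omega_0],
                # keeping activation statistics stable across depth.
                bound = math.sqrt(6.0 / in_features) / omega_0
                self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

class Siren(nn.Module):
    # An MLP of SineLayers with a final linear layer.
    def __init__(self, in_features, hidden_features, hidden_layers, out_features, omega_0=30.0):
        super().__init__()
        layers = [SineLayer(in_features, hidden_features, is_first=True, omega_0=omega_0)]
        for _ in range(hidden_layers):
            layers.append(SineLayer(hidden_features, hidden_features, omega_0=omega_0))
        final = nn.Linear(hidden_features, out_features)
        with torch.no_grad():
            bound = math.sqrt(6.0 / hidden_features) / omega_0
            final.weight.uniform_(-bound, bound)
        layers.append(final)
        self.net = nn.Sequential(*layers)

    def forward(self, coords):
        return self.net(coords)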

Baselines


The following results compare SIREN to a variety of network architectures. TanH, ReLU, Softplus, etc. denote an MLP of equal size with the respective nonlinearity. We also compare to the recently proposed positional encoding, combined with a ReLU nonlinearity, denoted as ReLU P.E. SIREN outperforms all baselines by a significant margin, converges significantly faster, and is the only architecture that accurately represents the gradients of the signal, enabling its use to solve boundary value problems.

Representing images


A Siren that maps 2D pixel coordinates to a color may be used to parameterize images. Here, we supervise Siren directly with ground-truth pixel values. Siren not only fits the image with a 10 dB higher PSNR and in significantly fewer iterations than all baseline architectures, but is also the only MLP that accurately represents the first- and second-order derivatives of the image.
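As a rough illustration of this image-fitting setup, the sketch below (assuming the Siren class sketched above) samples a pixel-coordinate grid in [-1, 1]^2 and supervises the network with an MSE loss on the ground-truth colors; the image format, step count, learning rate, and layer sizes are illustrative assumptions.

import torch

def fit_image(image, steps=500, lr=1e-4):
    # image: (H, W, 3) float tensor with values in [0, 1] (assumed format).
    h, w, _ = image.shape
    ys = torch.linspace(-1.0, 1.0, h)
    xs = torch.linspace(-1.0, 1.0, w)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)
    coords = grid.reshape(-1, 2)                # (H*W, 2) pixel coordinates
    targets = image.reshape(-1, 3)              # (H*W, 3) ground-truth colors

    model = Siren(in_features=2, hidden_features=256, hidden_layers=3, out_features=3)
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        pred = model(coords)
        loss = ((pred - targets) ** 2).mean()   # direct supervision on pixel values
        optim.zero_grad()
        loss.backward()
        optim.step()
    return model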

Representing Audio


A Siren with a single time-coordinate input and a scalar output may be used to parameterize audio signals. Siren is the only network architecture that succeeds in reproducing the audio signal, both for music and human voice.
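In code, the audio case only changes the input and output dimensionality of the same network (one time coordinate in, one amplitude out); the larger first-layer frequency shown below is an illustrative assumption, made because raw waveforms oscillate much faster over a normalized time axis than images do over pixel coordinates. It assumes the SineLayer sketch above.

from torch import nn

# Map a normalized time coordinate t in [-1, 1] to a waveform amplitude.
# The raised first-layer omega_0 is an illustrative choice, not a prescribed value.
audio_siren = nn.Sequential(
    SineLayer(1, 256, is_first=True, omega_0=3000.0),
    SineLayer(256, 256),
    SineLayer(256, 256),
    nn.Linear(256, 1),
)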

Audio comparisons: Ground truth, ReLU MLP, ReLU w/ Pos. Enc., Siren.

Representing Video


A Siren that takes pixel coordinates together with a time coordinate can be used to parameterize a video. Here, Siren is directly supervised with the ground-truth pixel values, and parameterizes the video significantly better than a ReLU MLP.

Solving the Poisson Equation


By supervising only the derivatives of Siren, we can solve Poisson's equation. Siren is again the only architecture that fits the image, gradient, and Laplacian domains accurately and swiftly.
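A minimal sketch of such derivative-only supervision, assuming the Siren sketch above: the network output is differentiated with respect to its input coordinates via autograd, and only those derivatives are compared to target gradients (for example, image gradients precomputed with a filter). The batching and optimizer details are illustrative.

import torch

def gradient(y, x):
    # dy/dx for a scalar field y evaluated at coordinates x, keeping the graph
    # so a loss on the gradient can itself be backpropagated into the weights.
    return torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)[0]

def poisson_step(model, coords, target_grads, optim):
    coords = coords.clone().requires_grad_(True)    # (N, 2) pixel coordinates
    out = model(coords)                             # (N, 1) scalar field
    grads = gradient(out, coords)                   # (N, 2) spatial derivatives
    loss = ((grads - target_grads) ** 2).mean()     # supervise derivatives only
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()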

Representing shapes by solving the Eikonal equation
Interactive 3D SDF Viewer - Use Your Mouse to Navigate the Scenes


We can recover an SDF from a point cloud and surface normals by solving the Eikonal equation, a first-order boundary value problem. SIREN can recover a room-scale scene given only its point cloud and surface normals, accurately reproducing fine detail, in less than an hour of training. In contrast to recent work that combines voxel grids with neural implicit representations, this stores the full scene in the weights of a single, 5-layer neural network, with no 2D or 3D convolutions, and orders of magnitude fewer parameters. Note that these SDFs are not supervised with ground-truth SDF / occupancy values, but rather are the result of solving the above Eikonal boundary value problem. This is a significantly harder task, which requires supervision in the gradient domain (see paper). As a result, architectures whose gradients are not well-behaved perform worse than SIREN.
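For illustration, a sketch of the loss terms involved in this boundary value problem, assuming the Siren and gradient helpers from the sketches above; the term weights and the off-surface penalty constant are placeholder values, not the exact ones used in the paper.

import torch

def sdf_loss(model, on_surface, normals, off_surface):
    # on_surface, off_surface: (N, 3) coordinates; normals: (N, 3) oriented surface normals.
    coords = torch.cat([on_surface, off_surface], dim=0).clone().requires_grad_(True)
    sdf = model(coords)
    grad = gradient(sdf, coords)

    n = on_surface.shape[0]
    sdf_on, sdf_off = sdf[:n], sdf[n:]
    grad_on = grad[:n]

    eikonal = (grad.norm(dim=-1) - 1.0).abs().mean()      # |grad f| = 1 everywhere
    surface = sdf_on.abs().mean()                         # f = 0 on surface points
    normal = (1.0 - torch.nn.functional.cosine_similarity(grad_on, normals, dim=-1)).mean()  # grad f aligned with normals
    off = torch.exp(-100.0 * sdf_off.abs()).mean()        # push |f| away from 0 off the surface

    # Term weights are illustrative; see the paper for the formulation actually used.
    return 3000.0 * surface + 100.0 * normal + 50.0 * eikonal + 100.0 * off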

3D SDF comparisons: Room (Siren, ReLU) and Statue (Siren, ReLU Pos. Enc., ReLU).

Solving the Helmholtz equation


Here, we use Siren to solve the inhomogeneous Helmholtz equation. ReLU- and Tanh-based architectures fail entirely to converge to a solution.
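As a rough sketch of how such a PDE can be posed as a loss, the snippet below (assuming the gradient helper from the Poisson sketch) penalizes the residual of the inhomogeneous Helmholtz equation (∇² + k²) u = -f, treating the real and imaginary parts of the wavefield as two output channels; the sign convention, sampling, and source handling are illustrative assumptions.

import torch

def laplacian(y, x):
    # Sum of second spatial derivatives of a scalar field y with respect to coordinates x.
    grads = gradient(y, x)
    lap = torch.zeros_like(y)
    for i in range(x.shape[-1]):
        second = torch.autograd.grad(grads[..., i].sum(), x, create_graph=True)[0][..., i:i + 1]
        lap = lap + second
    return lap

def helmholtz_residual(model, coords, source, k):
    # model outputs two channels (real and imaginary part of the wavefield u);
    # source: (N, 2) real and imaginary parts of the source term f at the sampled coords.
    coords = coords.clone().requires_grad_(True)
    u = model(coords)
    res = 0.0
    for c in range(2):
        u_c = u[:, c:c + 1]
        lap_c = laplacian(u_c, coords)
        res = res + ((lap_c + (k ** 2) * u_c + source[:, c:c + 1]) ** 2).mean()
    return res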

Solving the wave equation


In the time domain, Siren succeeds in solving the wave equation, while a Tanh-based architecture fails to discover the correct solution.

Related Projects


Check out our related projects on the topic of implicit neural representations!

We identify a key relationship between generalization across implicit neural representations and meta-learning, and propose to leverage gradient-based meta-learning for learning priors over deep signed distance functions. This allows us to reconstruct SDFs an order of magnitude faster than the auto-decoder framework, with no loss in performance!
A continuous, 3D-structure-aware neural scene representation that encodes both geometry and appearance, supervised only in 2D via a neural renderer, and generalizes for 3D reconstruction from a single posed 2D image.
We demonstrate that the features learned by neural implicit scene representations are useful for downstream tasks, such as semantic segmentation, and propose a model that can learn to perform continuous 3D semantic segmentation on a class of objects (such as chairs) given only a single, 2D (!) semantic label map!

Paper


Bibtex


@inproceedings{sitzmann2019siren,
    author    = {Sitzmann, Vincent and Martel, Julien N.P. and Bergman, Alexander W. and Lindell, David B. and Wetzstein, Gordon},
    title     = {Implicit Neural Representations with Periodic Activation Functions},
    booktitle = {Proc. NeurIPS},
    year      = {2020}
}