DeepVoxels: Learning Persistent 3D Feature Embeddings

CVPR 2019 (Oral)

Teaching 3D to 2D generative models.

Vincent Sitzmann, Justus Thies, Felix Heide, Matthias Niessner, Gordon Wetzstein, Michael Zollhöfer


Follow-up work: Scene Representation Networks

Check out Scene Representation Networks, where we replace the voxel grid with a continuous function that naturally generalizes across scenes and smoothly parameterizes scene surfaces!

2D Generative Models don't Understand 3D

Deep generative models today allow us to perform highly realistic image synthesis. While each generated image is of high quality, a major challenge is to generate a series of coherent views of the same scene. This requires the network to have a latent space representation that fundamentally understands the 3D layout of the scene; e.g., how would the same chair look from a different viewpoint?

Unfortunately, this is challenging for existing models that are based on a series of 2D convolution kernels. Instead of parameterizing 3D transformations, they explain the training data in a higher-dimensional feature space, which leads to poor generalization to novel views at test time, as seen in the output of Pix2Pix trained on images of the cube above.

DeepVoxels: A 3D-structured Neural Scene Representation

With DeepVoxels, we introduce a 3D-structured neural scene representation. DeepVoxels encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry. It is based on a Cartesian 3D grid of persistent features that learn to make use of the underlying 3D scene structure, and it combines insights from 3D computer vision with recent advances in learning image-to-image mappings. DeepVoxels is supervised with a 2D re-rendering loss, so it does not require a 3D reconstruction of the scene, and it enforces perspective and multi-view geometry in a principled manner.
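To make the idea of a Cartesian feature grid tied to perspective geometry more concrete, here is a minimal NumPy sketch. It is not the DeepVoxels implementation: the function names (`make_voxel_centers`, `project`), the grid size, the channel count, and the camera parameters are all assumptions chosen for illustration. It only shows how the centers of a voxel grid map to pixel coordinates under a standard pinhole camera, which is the kind of geometric relationship a 3D-structured representation can exploit.

```python
import numpy as np


def make_voxel_centers(side, extent=1.0):
    """World-space centers of a side^3 Cartesian grid spanning [-extent, extent]^3."""
    # Cell centers along one axis, e.g. side=2 and extent=1 give [-0.5, 0.5].
    axis = (np.arange(side) + 0.5) / side * 2.0 * extent - extent
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


def project(points_world, K, R, t):
    """Pinhole projection: world points -> pixel coordinates and camera-space depth."""
    points_cam = points_world @ R.T + t      # world frame -> camera frame
    depth = points_cam[:, 2]                 # distance along the optical axis
    uvw = points_cam @ K.T                   # apply camera intrinsics
    pixels = uvw[:, :2] / uvw[:, 2:3]        # perspective divide
    return pixels, depth


# Hypothetical setup: a tiny 4x4x4 grid with 8 feature channels per voxel.
side, channels = 4, 8
features = np.zeros((side, side, side, channels), dtype=np.float32)
centers = make_voxel_centers(side)

# Assumed intrinsics for a 64x64 image and a camera 4 units away from the grid.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 4.0])

pixels, depth = project(centers, K, R, t)
print(pixels.shape, depth.min(), depth.max())
```

In a learned pipeline, the feature stored at each voxel would be optimized so that images rendered from the grid match the observed views under a 2D re-rendering loss; that learned part is deliberately left out of this sketch.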

Results on Real Captures with Nearest Neighbor Comparison

Submission Video

CVPR 2019 Paper


@inproceedings{sitzmann2019deepvoxels,
    author = {Sitzmann, Vincent and Thies, Justus and Heide, Felix and Nie{\ss}ner, Matthias and Wetzstein, Gordon and Zollh{\"o}fer, Michael},
    title = {DeepVoxels: Learning Persistent 3D Feature Embeddings},
    booktitle = {Proc. Computer Vision and Pattern Recognition (CVPR), IEEE},
    year = {2019}
}