[visionlist] PhD position at Inria: Neural novel view synthesis of dynamic people in monocular videos

Adnane Boukhayma adnane.boukhayma at gmail.com
Fri Apr 3 12:01:28 -04 2020

Neural novel view synthesis of dynamic people in monocular videos

Capturing a scene or a 3D object with only a few images and synthesizing
novel photo-realistic views of it is a long-standing problem, traditionally
referred to as image-based rendering in the computer vision and graphics
communities. It has recently regained attention due to its wide
applications in free viewpoint and virtual reality displays, as well as
image editing and manipulation. The problem remains challenging as it
requires some form of understanding of the images' viewpoints as well as
the 3D scene, fusing the visible regions and inpainting the missing ones.
Image-based rendering methods typically approach this by computing either
(1) an explicit 3D representation of the scene, (2) image correspondences
and warping fields, or (3) a light field representation. Most of these
approaches are slow, costly or prone to failures. Recently, deep learning
methods (e.g. [1,2,3,4,5,6,7,8]) have remedied some of the artifacts of
these earlier approaches, but they still fall short in generalization and quality.
We are further interested in the case where the input is a video, with a
specific focus on humans (e.g. a person in changing poses or facial
expressions). Using motion information for this problem has not been
explored sufficiently, and while it could enhance the performance of view
synthesis methods, it also presents additional challenges such as the need
for non-linear alignments to combine the appearance information.

The goal of this PhD is to study novel view synthesis methods from
monocular videos of dynamic objects using deep learning, with a focus on
the human shape. Generating novel views of a moving person captured in a
video increases the complexity of the standard formulation of the
image-based rendering problem, as it requires building a model that can
understand, factor out and leverage the appearance variations resulting
from the human motion. Another major challenge is to design a model that
can both (1) learn from training data of random people and scenes, and (2)
combine that knowledge efficiently with the information in the few frames
of the person and scene of interest given at test time. This could be
initially approached by studying few-shot learning of generative models, as
was done recently e.g. for head image animation [9] and text-to-speech
generation [10,11].

The PhD student will be tasked with:
- Developing novel deep generative approaches for the problem of novel view
synthesis of a dynamic person in a monocular video.
- Exploring possible solutions based on few-shot learning, transfer
learning, and meta learning.
- Comparing strategies based on direct image translation to those that
render from explicit or implicit 3D representations.
- Achieving generalization to videos in the wild for deep networks trained
primarily with real data captured in controlled environments and synthetic

The PhD student will be co-supervised by Dr. Adnane Boukhayma and Prof.
Franck Multon. The PhD will be conducted at Inria Rennes, in the MimeTIC
research team. Candidates should preferably have an MSc degree in computer
science, applied mathematics, computer vision, computer graphics or machine
learning. Proficiency in coding in Python and C++ is a plus. We are looking
for excellent candidates, preferably with a good background in mathematics
or computer science, passionate about research and innovation, who can work
independently and who are also keen to collaborate with other students and

Keywords: Deep learning, Neural Rendering, Novel view synthesis, Human 3D
representation, Few-shot learning

[1] Neural Volumes: Learning Dynamic Renderable Volumes from Images, ACM
TOG 2019
[2] Local Light Field Fusion: Practical View Synthesis with Prescriptive
Sampling Guidelines, SIGGRAPH 2019
[3] Liquid Warping GAN: A Unified Framework for Human Motion Imitation,
Appearance Transfer and Novel View Synthesis, ICCV 2019
[4] HoloGAN: Unsupervised learning of 3D representations from natural
images, ICCV 2019
[5] Transformable Bottleneck Networks, ICCV 2019
[6] Deep Blending for Free-Viewpoint Image-Based-Rendering, SIGGRAPH Asia
[7] View Independent Generative Adversarial Network for Novel View
Synthesis, ICCV 2019
[8] DeepView: View Synthesis with Learned Gradient Descent, CVPR 2019
[9] Few-Shot Adversarial Learning of Realistic Neural Talking Head Models,
ICCV 2019
[10] Neural Voice Cloning with a Few Samples, NeurIPS 2018
[11] Transfer Learning from Speaker Verification to Multispeaker
Text-To-Speech Synthesis, NeurIPS 2018