<div dir="ltr"><div><div><div style="font-family:arial,helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><div><div><div><div style="font-family:arial,helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><div><div><div><div><div>Neural novel view synthesis of dynamic people in monocular videos</div></div></div><div><span class="gmail-Object" id="gmail-OBJ_PREFIX_DWT51_com_zimbra_url"><span class="gmail-Object" id="gmail-OBJ_PREFIX_DWT56_com_zimbra_url"><span class="gmail-Object" id="gmail-OBJ_PREFIX_DWT111_com_zimbra_url"><span class="gmail-Object" id="gmail-OBJ_PREFIX_DWT116_com_zimbra_url"><a target="_blank" href="https://jobs.inria.fr/public/classic/fr/offres/2020-02420">https://jobs.inria.fr/public/classic/fr/offres/2020-02420</a></span></span></span></span><br></div><div><br><div>Capturing
a scene or a 3D object with only a few images and synthesizing novel
photo-realistic views of it is a long standing problem, traditionally
referred to as image based rendering in the computer vision and graphics
communities. It has recently regained attention due to its wide
applications in free viewpoint and virtual reality displays, as well as
image editing and manipulation. The problem remains challenging as it
requires some form of understanding of the images' viewpoints as well as
the 3D scene, fusing the visible regions and inpainting the missing
ones. Image based rendering methods typically approach this by computing
either (1) an explicit<b> </b>3D representation of the scene, (2) image correspondences and warping fields<b>,</b>
or (3) a light field representation. Most of these approaches are slow,
costly or prone to failures. Recently, deep learning methods (e.g.
[1,2,3,4,5,6,7,8]) managed to remedy some of the artifacts of the
previous approaches, but they still lack in generalization<b> </b>and quality. We are further interested in the case where the input is a <i>video, </i>with
a specific focus on humans (e.g. a person in changing poses or facial
expressions). Using motion information for this problem has not been
explored sufficiently, and while it could enhance the performance of
view synthesis methods, it also presents additional challenges such as
the need for non-linear alignments to combine the appearance
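To illustrate approach (2) above, here is a minimal sketch of backward warping: a target view is synthesized by resampling a source image through a dense correspondence (flow) field. The flow field is assumed to be given, e.g. predicted by a network; all names are illustrative, not taken from any specific method.

    # Minimal sketch of approach (2): synthesizing a target view by backward
    # warping a source image through a dense correspondence (flow) field.
    # The flow field is an assumed input, e.g. predicted by a network.
    import numpy as np
    from scipy.ndimage import map_coordinates

    def backward_warp(source, flow):
        """source: (H, W) image; flow: (H, W, 2) offsets from each target
        pixel (y, x) to its corresponding location in the source image."""
        h, w = source.shape
        ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
        coords = np.stack([ys + flow[..., 0], xs + flow[..., 1]])
        # Bilinear resampling of the source at the corresponding locations.
        return map_coordinates(source, coords, order=1, mode='nearest')

    # Toy usage: a constant flow of (1, 1) simply shifts the image by one pixel.
    image = np.random.rand(64, 64)
    flow = np.ones((64, 64, 2))
    novel_view = backward_warp(image, flow)

In a learned pipeline the flow would come from a correspondence network, and occluded regions would still need to be inpainted, which is one of the failure modes mentioned above.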
The goal of this PhD is to study novel view synthesis methods from monocular videos of dynamic objects using deep learning, with a focus on the human shape. Generating novel views of a moving person captured in a video increases the complexity of the standard formulation of the image-based rendering problem, as it requires building a model that can understand, factor out, and leverage the appearance variations resulting from the human motion. Another major challenge is to design a model that can both (1) learn from training data of random people and scenes, and (2) combine that knowledge efficiently with the information in the few frames of the person and scene of interest given at test time. This could initially be approached by studying few-shot learning of generative models, as was done recently, e.g., for head image animation [9] and text-to-speech generation [10,11].
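To make the few-shot direction concrete, below is a minimal sketch assuming a generator pretrained on many identities that is fine-tuned on the K frames available at test time. `generator`, `frames`, and `codes` are illustrative placeholders, not components of any specific published model; [9], for instance, additionally uses a meta-learned initialization and adversarial and perceptual losses on top of a reconstruction objective.

    # Minimal sketch of few-shot adaptation: a generator pretrained on many
    # identities is fine-tuned on the K frames available at test time.
    # `generator`, `frames` and `codes` are illustrative placeholders.
    import torch
    import torch.nn.functional as F

    def few_shot_adapt(generator, frames, codes, steps=200, lr=1e-4):
        """frames: (K, 3, H, W) observed images of the target person;
        codes: (K, D) conditioning vectors (e.g. pose or landmarks)."""
        opt = torch.optim.Adam(generator.parameters(), lr=lr)
        for _ in range(steps):
            pred = generator(codes)          # re-render the K observed views
            loss = F.l1_loss(pred, frames)   # match the available frames
            opt.zero_grad()
            loss.backward()
            opt.step()
        return generator

The research question is precisely what goes beyond this naive baseline: which parameters to adapt, what initialization to meta-learn, and how to keep the adapted model from overfitting to the few observed poses.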
The PhD student will be tasked with:
- Developing novel deep generative approaches for the problem of novel view synthesis of a dynamic person in a monocular video.
- Exploring possible solutions based on few-shot learning, transfer learning, and meta-learning.
- Comparing strategies based on direct image translation to those that render from explicit or implicit 3D representations (see the sketch after this list).
- Achieving generalization to in-the-wild videos for deep networks trained primarily on real data captured in controlled environments and on synthetic data.
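For reference, a minimal sketch of rendering from an implicit 3D representation: an MLP maps a 3D point to a color and a density, and a pixel is obtained by integrating samples along its camera ray, in the spirit of the volume rendering used by [1] and follow-up work. All names here are illustrative.

    # Minimal sketch of rendering a pixel from an implicit 3D representation
    # by integrating point samples along a camera ray (volume rendering).
    import torch

    def render_ray(field, origin, direction, near=0.5, far=4.0, n_samples=64):
        """field: callable mapping (N, 3) points to (N, 4) = (r, g, b, density)."""
        t = torch.linspace(near, far, n_samples)
        pts = origin + t[:, None] * direction              # samples along the ray
        rgb, sigma = field(pts).split([3, 1], dim=-1)
        sigma = sigma.squeeze(-1)
        delta = (far - near) / n_samples
        alpha = 1 - torch.exp(-torch.relu(sigma) * delta)  # per-sample opacity
        # Transmittance: probability the ray reaches each sample unoccluded.
        trans = torch.cumprod(1 - alpha + 1e-10, dim=0)
        trans = torch.cat([torch.ones(1), trans[:-1]])
        weights = trans * alpha
        return (weights[:, None] * torch.sigmoid(rgb)).sum(dim=0)  # pixel color

    # Toy usage with a random MLP standing in for a trained field.
    mlp = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 4))
    color = render_ray(mlp, torch.zeros(3), torch.tensor([0.0, 0.0, 1.0]))

Direct image translation avoids this explicit geometric machinery but tends to be less view-consistent; comparing the two families is part of the thesis work.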
The PhD student will be co-supervised by Dr. Adnane Boukhayma and Prof. Franck Multon. The PhD will be conducted at Inria Rennes, in the MimeTIC research team. Candidates should preferably have an MSc degree in computer science, applied mathematics, computer vision, computer graphics, or machine learning. Proficiency in coding in Python and C++ is a plus. We are looking for excellent candidates, preferably with a strong background in mathematics or computer science, who are passionate about research and innovation, can work independently, and are also keen to collaborate with other students and researchers.

Keywords: Deep learning, Neural rendering, Novel view synthesis, Human 3D representation, Few-shot learning

[1] Neural Volumes: Learning Dynamic Renderable Volumes from Images, ACM TOG 2019
[2] Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines, SIGGRAPH 2019
[3] Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis, ICCV 2019
[4] HoloGAN: Unsupervised Learning of 3D Representations from Natural Images, ICCV 2019
[5] Transformable Bottleneck Networks, ICCV 2019
[6] Deep Blending for Free-Viewpoint Image-Based Rendering, SIGGRAPH Asia 2018
[7] View Independent Generative Adversarial Network for Novel View Synthesis, ICCV 2019
[8] DeepView: View Synthesis with Learned Gradient Descent, CVPR 2019
[9] Few-Shot Adversarial Learning of Realistic Neural Talking Head Models, ICCV 2019
[10] Neural Voice Cloning with a Few Samples, NeurIPS 2018
[11] Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis, NeurIPS 2018