FACE.DJ builds animation-ready 3D faces from photos using state-of-the-art techniques in machine learning, computer vision, and computational geometry.
We require only one frontal face image to create a textured 3D face.
Facial geometry and texture are then extrapolated to a full head model.
This 3D face model can then be animated by capturing the user’s face from a video stream using the face detection, facial landmark tracking, and gaze estimation modules of LUNA SDK. The user’s facial expressions are thus transmitted to, and mimicked by, the user-created avatar in real time.
Eyes and the oral cavity are also modeled as separate objects and can be used to enhance the avatar representation.
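One common way to drive an avatar from tracked landmarks is to solve for blendshape weights that best reproduce the observed landmark positions. The sketch below illustrates this idea only; the function names and the simple clipped least-squares solver are illustrative assumptions, not the LUNA SDK API or FACE.DJ's exact retargeting method.

```python
import numpy as np

def solve_blendshape_weights(neutral, blendshapes, observed, rng=(0.0, 1.0)):
    """Solve for blendshape weights w such that
    neutral + sum_k w_k * (blendshapes[k] - neutral) ~= observed.
    All landmark sets are (N, 2) arrays. Weights are clipped to [0, 1]
    as a simple stand-in for a properly constrained solver."""
    deltas = np.stack([b - neutral for b in blendshapes])   # (K, N, 2)
    A = deltas.reshape(len(blendshapes), -1).T              # (2N, K)
    b = (observed - neutral).ravel()                        # (2N,)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.clip(w, *rng)
```

In practice the solve runs once per video frame, and the resulting weights are fed to the avatar's expression rig.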
Morphable models are essential for many algorithms performing 3D face reconstruction. These algorithms may be classified into two types: optimization-based and regression-based.
Optimization-based algorithms employ a morphable model directly.
Regression-based algorithms often use the results of morphable model fitting as training data.
Morphable models allow faces to be generated in three dimensions. The faces are controlled by relatively few parameters that adjust their shape and expression, and they can be rendered from varying perspectives and under different lighting conditions.
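In the standard 3D morphable model formulation, a face is a mean shape plus a linear combination of shape and expression basis vectors. A minimal sketch of this generation step (the basis matrices and dimensions here are illustrative, not a specific published model):

```python
import numpy as np

def generate_face(mean_shape, shape_basis, expr_basis, shape_params, expr_params):
    """Generate a 3D face as a linear morphable model:
    vertices = mean + S @ alpha + E @ delta, reshaped to (V, 3).
    mean_shape is a flat (3V,) vector; the bases are (3V, K) matrices."""
    flat = mean_shape + shape_basis @ shape_params + expr_basis @ expr_params
    return flat.reshape(-1, 3)
```

A handful of parameters in `shape_params` and `expr_params` thus determines the positions of thousands of vertices.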
Morphable model fitting inverts the process of image rendering: it seeks a combination of parameters whose rendered image resembles the target image as closely as possible.
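Full analysis-by-synthesis fitting optimizes shape, pose, texture, and lighting against the rendered image. A heavily simplified sketch below fits only the shape parameters to observed 2D landmarks under an orthographic projection, with a ridge prior on the parameters; this is an illustrative assumption about the objective, not FACE.DJ's exact fitting pipeline.

```python
import numpy as np

def fit_shape_params(mean_shape, shape_basis, observed_2d, lam=0.1):
    """Simplified morphable-model fitting: recover shape parameters alpha
    minimizing || P(mean + S @ alpha) - observed ||^2 + lam * ||alpha||^2,
    where P is an orthographic projection keeping each vertex's x, y."""
    V = mean_shape.size // 3
    # Orthographic projection: drop the z coordinate of every vertex.
    keep = np.ones(3 * V, dtype=bool)
    keep[2::3] = False
    A = shape_basis[keep]                       # (2V, K)
    b = observed_2d.ravel() - mean_shape[keep]  # (2V,)
    K = A.shape[1]
    # Ridge-regularized least squares (Tikhonov prior on alpha).
    alpha = np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)
    return alpha
```

The regularization term plays the role of the statistical prior a morphable model places on plausible faces: it keeps the recovered parameters close to the mean face when the landmarks underconstrain them.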
We process a face database using a morphable model fitting algorithm, thus acquiring reconstructed 3D models, and train a neural network to predict facial geometry.
We use an adaptation of the technique known as Laplacian surface editing, in combination with computer vision techniques, to improve the alignment and appearance of salient face parts, especially the nose tip.
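Laplacian surface editing preserves local surface detail, encoded as differential (Laplacian) coordinates, while moving selected anchor vertices to new positions. The sketch below uses a uniform graph Laplacian and soft positional constraints; a production system would typically use cotangent weights and a sparse solver, so treat this as a minimal illustration of the idea rather than FACE.DJ's implementation.

```python
import numpy as np

def laplacian_edit(verts, edges, anchors, anchor_pos, w=10.0):
    """Laplacian surface editing with a uniform graph Laplacian:
    minimize || L v' - L v ||^2 + w^2 * sum_a || v'_a - anchor_pos_a ||^2,
    so local detail is preserved while anchors move to their targets."""
    n = len(verts)
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    delta = L @ verts                      # differential coordinates
    # Stack soft positional constraints below the Laplacian system.
    C = np.zeros((len(anchors), n))
    for row, a in enumerate(anchors):
        C[row, a] = w
    A = np.vstack([L, C])
    b = np.vstack([delta, w * np.asarray(anchor_pos)])
    new_verts, *_ = np.linalg.lstsq(A, b, rcond=None)
    return new_verts
```

Dragging, say, the nose-tip vertices to landmark-derived positions then deforms the surrounding surface smoothly instead of creating a local spike.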
To animate an avatar in real time, we use face and eye landmarks. For landmark detection we use extremely fast neural network models optimized for real-time inference on mobile devices.
To better capture subtle details of face and eye movement, we have created a large in-house training set covering a variety of human emotions.
Accurate head pose and gaze estimation is challenging, mainly because of the difficulty of obtaining large labelled training sets with sufficient variation in head pose and environmental conditions.
We estimate head position and pupil position directly from the predicted landmarks. This approach yields robust estimators that work across a wide range of conditions and run in under 10 ms on a mobile device.
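One standard way to recover head pose from predicted 3D landmarks is to rigidly align a reference landmark set to the observed one with the Kabsch algorithm. This is a sketch of that general technique under our own assumptions, not necessarily the estimator described above:

```python
import numpy as np

def estimate_head_pose(model_pts, observed_pts):
    """Estimate the rotation R and translation t that align a reference
    3D landmark set to observed landmarks (Kabsch algorithm):
    observed ~= model @ R.T + t. Both inputs are (N, 3) arrays, N >= 3."""
    mu_m = model_pts.mean(axis=0)
    mu_o = observed_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (observed_pts - mu_o)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_o - R @ mu_m
    return R, t
```

Because it reduces to one SVD of a 3x3 matrix, this kind of alignment is cheap enough for per-frame use on mobile hardware.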
We conduct user studies with the help of a Mechanical Turk-like service. Users see an original photo and two avatars constructed from it. They are then asked to select the avatar they prefer and to leave feedback as comments. This helps us monitor how users perceive our avatars.