Motion capture is the process of recording real human movements so that they can be reproduced in a computer-generated environment. The technique is widely used in game development to animate player characters and other characters, and Deep Learning can now improve it dramatically.
In this article, I want to share a quick overview of the recently published NeurIPS paper
"First Order Motion Model for Image Animation" by A. Siarohin et al., and show how its application to game graphics could 'change the game'.
Back in 2011, L.A. Noire shipped with lifelike facial expressions that seemed years ahead of the rest of the industry. Now, almost a decade later, very few games have come anywhere close to its level of facial realism.
MotionScan technology, used by Rockstar in the 2011 game L.A. Noire to create realistic, actor-driven facial animations. [source]
This is because MotionScan, the facial-scanning technology used in the game, was extremely expensive and produced very large capture files, which made it impractical for most publishers to adopt in their own games.
However, this may change soon thanks to recent advances in Deep Learning-driven motion transfer.
First Order Motion Model for Image Animation
In this research project, the authors introduce a Deep Learning framework that animates a source face image by following the motion of a face in a driving video, much like MotionScan does with dedicated hardware.
They propose a self-supervised training method that needs only a dataset of unlabeled videos of a given object category to learn the key points that define its motion. They then demonstrate how these learned motion representations can be combined with a still image to produce an animated video.
Framework (Model Architecture)
Let's take a look at the architecture of this framework in the picture below. It contains a Motion Module and an Appearance Module. The driving video is the input to the Motion Module, and the source image we want to animate is the input to the Appearance Module.
The Motion Module contains an encoder that learns a latent representation of the key points most relevant to the movement of the object, which in this case is the face.
The displacement of these key points across the frames of the driving video defines a motion field, which is the function we want our model to learn. The authors approximate this function with a first-order Taylor expansion around the key points.
According to the authors, this is the first time a first-order approximation has been used to model motion in this setting. The learned affine transformations around these key points are then combined to produce a dense motion field, which predicts the movement of every pixel in the frame, as opposed to only the key points in the sparse motion field.
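To make the idea concrete, here is a minimal NumPy sketch of how K local first-order (affine) motions around key points can be blended into a dense flow. All names are hypothetical, and the weighting is a simplified stand-in for the soft masks the paper's dense motion network predicts.

```python
import numpy as np

def dense_motion(grid, kp_src, kp_drv, jac_src, jac_drv, weights):
    """Combine K local first-order (affine) motions into a dense flow.

    grid:    (H, W, 2) pixel coordinates z in the driving frame
    kp_src:  (K, 2) key points p_k in the source frame
    kp_drv:  (K, 2) key points in the driving frame
    jac_src: (K, 2, 2) local Jacobians around the source key points
    jac_drv: (K, 2, 2) local Jacobians around the driving key points
    weights: (H, W, K) soft assignment of each pixel to a key point
    """
    H, W, _ = grid.shape
    K = kp_src.shape[0]
    flows = np.zeros((H, W, K, 2))
    for k in range(K):
        # First-order approximation around key point k:
        # T(z) ~ p_src_k + J_src J_drv^{-1} (z - p_drv_k)
        J = jac_src[k] @ np.linalg.inv(jac_drv[k])
        diff = grid - kp_drv[k]               # (H, W, 2)
        flows[:, :, k] = kp_src[k] + diff @ J.T
    # Weighted combination of the local motions gives the dense field
    return (weights[..., None] * flows).sum(axis=2)  # (H, W, 2)
```

A sanity check on this sketch: when source and driving key points coincide and all Jacobians are the identity, the dense flow reduces to the identity mapping, i.e. no pixel moves.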
Next, the Motion Module also generates an occlusion map, highlighting the pixels of the frame that need to be inpainted because of the movement of the head relative to the background.
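The role of the occlusion map can be sketched in a few lines of NumPy (hypothetical names, and a simplification of how the generator actually consumes it): regions the map marks as visible are copied from the warped source features, while regions near zero are suppressed so the generator must hallucinate them.

```python
import numpy as np

def apply_occlusion(warped_features, occlusion_map):
    """Mask warped source features with a predicted occlusion map.

    warped_features: (H, W, C) source features warped by the motion field
    occlusion_map:   (H, W) values in [0, 1]; ~1 means the pixel was
                     visible in the source, ~0 means it was occluded
                     and must be inpainted by the generator.
    """
    return warped_features * occlusion_map[..., None]
```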
The Appearance Module encodes the source image with its own encoder, then combines this encoding with the motion field and the occlusion map to animate the source image; a generator network produces the final frames. During self-supervised training, a randomly chosen frame from the driving video is used as the source image, and the learned motion field is used to animate it.
The real video frames serve as ground truth for the generated motion, which is why the training is self-supervised. At test/inference time, the source image can be replaced with any other image of the same object category; it does not have to come from the driving video.
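The self-supervised setup described above can be summarized in a short sketch. The module and loss names here are hypothetical placeholders, not the authors' actual API; the point is only that both the input and the ground truth come from the same unlabeled video.

```python
def training_step(video_frames, motion_module, appearance_module, loss_fn):
    """One self-supervised step: reconstruct a driving frame from a source frame.

    video_frames: frames sampled from a single unlabeled video
    """
    source = video_frames[0]     # one frame acts as the source image
    driving = video_frames[-1]   # another frame supplies the motion
    motion_field, occlusion = motion_module(source, driving)
    generated = appearance_module(source, motion_field, occlusion)
    # The driving frame itself is the ground truth: no labels needed.
    return loss_fn(generated, driving)
```

At inference time, `source` is simply swapped for any image of the same category (e.g. a game character's face), while the driving video keeps supplying the motion.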
Running the Trained Model on Game Characters
I wanted to test how well this model works on the well-designed virtual faces of game characters. The authors shared their code along with an easy-to-use Google Colab notebook for exploring it. Here's how their trained model performs when tested on different characters from the game Grand Theft Auto.
As you can see, it is extremely easy to create lifelike animations with this AI, and I think it will be adopted by many game artists for creating facial animations. Moreover, to perform motion capture with this technique, all we need is a single camera and an average computer with a GPU; the AI takes care of the rest, making it cheap and feasible for game animators to use at scale. This is why I'm excited about the improvements this AI can bring to the development of future games.
If you liked this blog, share it with your friends and colleagues to make the AI community stronger.
To learn more about nuances of Artificial Intelligence, Python Programming, Deep Learning, Data Science and Machine Learning, visit our blog page - https://insideaiml.com/blog