How motion capture brings virtual idols to life

  • Motion capture refers to the technique of recording and processing the movement of a person or other object.
  • Motion capture technology has been used in major films such as Avatar, The Avengers, and The Lord of the Rings.

OUR TAKE
Motion capture technology is widely used in many fields. In film and television animation and in gaming, it can give users a more natural and intuitive interactive experience, while greatly improving production efficiency, reducing costs, and making animation more vivid and realistic. In an era of diversified leisure and entertainment, motion capture is a unique and effective way to drive the industry forward. It also plays an important role in medical treatment and rehabilitation, sports training and biomechanical research.
— Iydia Ding, BTW reporter

Motion capture refers to the technique of recording and processing the movement of a person or other object. It combines image tracing techniques with new computer technology, making it possible to use live, continuous footage as the basis for animation without having to go through the drawing process. In recent years, motion capture has begun to make a splash in film and television animation and in the gaming industry. The real breakthrough came when “The Lord of the Rings” trilogy created the character of “Gollum” entirely through motion capture, revolutionising the way people viewed the technology.

In addition to applications in film and television animation and games, motion capture technology is also used in virtual reality, human-machine interaction, rehabilitation medicine, ship attitude detection, sports training, biomechanical research and more. As science and technology continue to develop, there will surely be more directions in which it can expand in the future.

“If we want to do 200 minutes of animation with motion capture, we may only need about 2-3 days of recording, and probably less than one month in total, to achieve the same, or an even more realistic, effect as if the animator had made the motion manually.”

Kevin Wang, Lead Motion Technician, Photonics Technology Centre, Tencent Interactive Entertainment

Concepts and applications of motion capture

Basic concepts of motion capture

Motion capture originated from rotoscoping, which was used in Disney’s early 2D animated film “Snow White” and the game “Prince of Persia”. Today’s motion capture refers more to wearable motion capture technologies, such as optical motion capture and inertial motion capture. Motion capture specialists install various sensors on actors. These track and record their movements so that they can be mapped in real time as a virtual “skeleton” on a computer screen.

Wearable motion capture equipment records the motion data of an actor’s body, from which a fine-grained three-dimensional motion trajectory can be built. The technique is widely used in military, entertainment, sports, medical, robotics and other fields, and is an important research method in ergonomics- and biomechanics-related research.

Also read: Meta plans to bring generative AI to metaverse games


Pop Quiz

Which technology does motion capture combine?

A) image tracing techniques and new computer technology

B) Integrated Sense of Communication (ISAC) technology

C) Natural language processing (NLP)

D) Blockchain technology and Quantum computing

The correct answer is at the bottom of the article.


History of motion capture technology

As the concept of the “metaverse” becomes more popular, the long-term value of motion capture to the metaverse is also becoming clear: it sits at the same level as engine, transmission, computation and display technologies, and is an important piece in the “huge puzzle” of the metaverse’s underlying construction.

Motion-capture-like technology first appeared in 1915, when animator Max Fleischer built a projector that displayed the contents of a film on a translucent stage. With this projector, the animator could easily draw the character’s movements as they would appear on the screen.

In 1983, Tom Calvert of Canada’s Simon Fraser University made a major breakthrough with a mechanical motion capture suit, giving people their earliest look at mechanical motion capture. At the same time, MIT introduced an LED-based “graphical marionette” system, the prototype of early optical motion capture systems.

At the end of the 1990s, the filming of “The Lord of the Rings” brought the motion capture process onto the film set for the first time. Andy Serkis, a pioneer among motion capture actors, could interact with the other actors as “Gollum”, which helped shape the character and gave him flesh and blood.

Today, motion capture is almost standard in large game studios. With motion capture, live-action performances and animated characters are synchronised, making game characters appear more realistic and vivid. This is why we can see cinematic-level action performances in games.

“It is on the same level as engine, transmission, computing and display technologies, and is an important piece of the ‘huge puzzle’ in the underlying construction of the metaverse.”

MetaPost Tencent Metaverse Technology Media

Application areas of motion capture

In recent years, motion capture technology has been widely used in the film and game industries. Famous titles such as “Avatar”, “Rise of the Planet of the Apes”, “Assassin’s Creed” and “Detroit: Become Human” all use motion capture data recorded from motion capture actors to drive virtual characters. Because motion capture data is collected entirely from the human body, the reconstructed action restores human posture and movement to the greatest possible extent while remaining natural and smooth, so modern motion capture technology can greatly enhance the expressive power of virtual characters.

The film Avatar, released in 2009, can be considered a leader in the successful combination of motion capture and expression capture technology. Director James Cameron and his team used head-mounted facial capture cameras and built the largest filming and motion capture studio ever.

Special effects film and television production and gaming have never been far apart, and the concept of motion capture was soon brought into the gaming world. The most pioneering company in this field was Sega, which at the time was in a three-way tie with Nintendo and Sony in the console market. In addition, some virtual idol projects, such as “Project SEGA” and “Aikatsu”, also use this technology, combining motion capture with 3D animation to create more possibilities on stage and respond to fan interaction in real time, increasing the sense of immersion. What’s more, some VR games allow players to become members of the virtual world through motion capture technology and communicate with non-player characters (NPCs) in a “real” sense.

In addition, motion capture technology is widely used in the military, entertainment, sports, medicine, robotics, and many other fields, and is an important research method in ergonomics and biomechanics related research.

Also read: Nintendo says it won’t use generative AI to make games

Wearable motion capture vs AI Video motion capture

With the maturity of the technology, applications of motion capture have become more and more extensive: animation, human-computer interaction, remote control of robots, sports training and more all make use of it.

To suit these different scenarios, a variety of technical routes have emerged, the most common being optical motion capture, inertial motion capture and visual (video-based) motion capture.

Technical principle of wearable motion capture

Wearable motion capture mainly refers to optical motion capture and inertial motion capture. Optical motion capture tracks markers on an optical motion capture suit, synchronises the marker data seen by cameras at different viewpoints, and uses 3D reconstruction algorithms to rebuild the motion data of different parts of the human body. Inertial motion capture instead records the readings of inertial sensors on the wearable equipment to obtain motion data. In both cases, the three-dimensional motion trajectory is finally reconstructed in software and converted into skeletal animation, which drives the virtual character.
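To make the optical route’s 3D reconstruction step concrete, here is a minimal sketch that triangulates a single marker from two calibrated camera views using the direct linear transform (DLT). The camera matrices, marker coordinates and function name are purely illustrative; a production system tracks many markers across many cameras and handles noise and occlusion.

```python
import numpy as np

def triangulate_marker(P1, P2, x1, x2):
    """Recover a marker's 3D position from its 2D image coordinates
    in two calibrated views, using the direct linear transform (DLT)."""
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenise
```

With more than two cameras, each extra view simply appends two more rows to the same linear system, which is why optical systems become more robust as cameras are added.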

Technical principle of AI video motion capture

With the development of deep learning, the accuracy of monocular images for tasks such as human key point detection and human posture prediction has been greatly improved. Meanwhile, with the release of parametric human models such as SMPL, it has become possible to predict human skeletal poses and body meshes directly from a single image.

AI video motion capture extracts multiple single-frame images from a video, uses AI algorithms to extract the human skeletal pose from each frame separately, and connects the poses in chronological order to form skeletal animation data, which can then be used to drive virtual characters.
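The frame-by-frame pipeline above can be sketched as follows. Here `estimate_pose` is a placeholder for a real single-image model such as HMR, and the joint count and frame format are assumptions; a simple temporal smoothing pass is included because per-frame predictions typically jitter.

```python
import numpy as np

def video_to_skeletal_animation(frames, estimate_pose):
    """Run a per-frame pose estimator over video frames and stack the
    results in chronological order to form skeletal animation data."""
    poses = [estimate_pose(frame) for frame in frames]
    return np.stack(poses)  # shape: (num_frames, num_joints, 3)

def smooth_animation(animation, window=3):
    """Moving-average smoothing over time to reduce per-frame jitter,
    a common post-processing step for independent per-frame predictions."""
    kernel = np.ones(window) / window
    # Convolve each joint coordinate independently along the time axis.
    return np.apply_along_axis(
        lambda series: np.convolve(series, kernel, mode="same"), 0, animation
    )
```

The key point is that the poses are estimated independently per frame and only joined afterwards, which is both the strength (no special hardware) and the weakness (temporal inconsistency) of the video route.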

Two types of algorithms for AI motion capture

Mainstream AI motion capture algorithms are based on parametric human body models such as SMPL/SMPL-X, and are mainly divided into two categories.

Optimisation-based algorithms: these algorithms predefine optimisation objective functions, usually composed of a reprojection error and prior regularisation terms on human posture. At prediction time, 2D key points such as the joint positions of knees, elbows and shoulders are detected by manual annotation or AI algorithms, and the optimisation algorithm then iteratively finds the set of parametric human body model parameters that minimises the objective function, representing the human skeletal pose in the current picture. This type of algorithm is represented by SMPLify, SMPLify-X, etc.
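A toy version of this fitting loop, assuming an orthographic camera and a simple quadratic pose prior in place of SMPLify’s full body model and learned priors, might look like this. Real systems optimise the parameters of a model such as SMPL rather than raw joint positions.

```python
import numpy as np

def fit_pose(keypoints_2d, init_joints_3d, prior_mean,
             lam=0.1, lr=0.05, steps=500):
    """Iteratively minimise reprojection error plus a quadratic pose prior
    by gradient descent -- a stand-in for SMPLify-style fitting."""
    joints = init_joints_3d.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(joints)
        # Gradient of the reprojection error under an orthographic camera
        # (projection simply drops the z coordinate).
        grad[:, :2] = 2.0 * (joints[:, :2] - keypoints_2d)
        # Gradient of the prior term pulling the pose toward a mean pose.
        grad += 2.0 * lam * (joints - prior_mean)
        joints -= lr * grad
    return joints
```

The weight `lam` trades off matching the detected keypoints against staying close to a plausible pose, which is exactly the tension the article describes: accurate limb fits on one side, distorted solutions on the other.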

Data-driven algorithms: these algorithms require a training dataset containing a large number of pictures and the corresponding skeletal pose data obtained by modern motion capture techniques. In the training phase, a deep neural network is trained to directly regress the ground truth of the training dataset; in the prediction phase, the trained network directly predicts a set of parametric human body model parameters from the picture features. This type of algorithm is represented by HMR, VIBE, PyMAF and so on.
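The data-driven route can be caricatured with a linear regressor standing in for the deep network; real models such as HMR learn a nonlinear map from image features to body model parameters, but the train-then-regress structure is the same. All names and shapes here are illustrative.

```python
import numpy as np

def train_regressor(features, poses):
    """Fit a linear map from image features to parametric pose vectors by
    least squares -- a toy stand-in for training a deep regressor on a
    motion capture dataset."""
    W, *_ = np.linalg.lstsq(features, poses, rcond=None)
    return W

def predict_pose(W, feature):
    """One forward pass: regress pose parameters directly from features,
    with no per-image optimisation loop."""
    return feature @ W
```

Note that prediction is a single pass with no iteration, which is why data-driven methods are fast and robust to initialisation but cannot refine details such as foot placement against the image.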

These two types of algorithms have their own advantages and disadvantages. Optimisation-based algorithms can fit the poses of limb ends and other parts of the body better with higher accuracy, but require more precise 2D keypoints.

In addition, because this optimisation problem has many suboptimal solutions and is highly sensitive to initialisation, it can easily fit distorted or unnatural human poses even with the constraints of the human pose prior. Data-driven algorithms, whose deep neural networks are trained on large amounts of data, are less prone to generating distorted poses, but usually make poorer predictions at the ends of the limbs, such as the feet, which may not match the pose in the image.

In recent years, more and more algorithms combine the two: a data-driven algorithm first predicts a human body pose close to the one in the picture, and this is used as the initialisation of an optimisation-based algorithm. This adjusts the pose to improve accuracy while avoiding distorted or unnatural human body poses; the meta-image solution also adopts this combined approach. However, if the two algorithms are simply combined, the quality of the captured skeletal animation data is still relatively low.
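Under toy assumptions (an orthographic camera and a linear regressor standing in for the deep network), the two-stage combination can be sketched as below; the function names, weight matrix and shapes are all hypothetical.

```python
import numpy as np

def hybrid_fit(feature, W, keypoints_2d, lam=0.1, lr=0.05, steps=300):
    """Stage 1: a pre-trained regressor predicts an initial pose from
    image features. Stage 2: gradient descent refines it against the
    detected 2D keypoints, while a regulariser keeps the result close
    to the (plausible) data-driven initialisation."""
    init = (feature @ W).reshape(-1, 3)  # data-driven initialisation
    joints = init.copy()
    for _ in range(steps):
        grad = np.zeros_like(joints)
        grad[:, :2] = 2.0 * (joints[:, :2] - keypoints_2d)  # reprojection term
        grad += 2.0 * lam * (joints - init)                 # stay near the init
        joints -= lr * grad
    return joints
```

Because the optimisation starts from a pose that is already roughly right, it is far less likely to fall into the distorted local minima that plague optimisation from scratch.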

Also read: Could video games become the world’s favourite entertainment?

Something about face motion capture

In the earliest days, face motion capture was done by having the actor sit in a chair with about 30 or 40 cameras in front of them, and many tiny reflective balls placed on the face. This is how “The Polar Express” went about it, for example.

This was a very time-consuming method. As times changed and the technology grew, head-mounted helmets are now used for facial motion capture. Such a helmet has a tiny camera on the front that records all the expressions on the actor’s face, after which the data can be synchronised with the body data to achieve a very good combined face and body motion capture effect.

Back in 2019, Meta (then Facebook) announced its virtual human avatar system, which featured 3D motion-capture technology to recreate the image of a real person through a VR device, rendering details such as skin colour, texture, hair and micro-expressions. Meta hopes that in the future, people will meet in virtual environments as real as those in reality.

On YouTube, TikTok and other social media platforms, there is no lack of bloggers using facial motion capture technology to control virtual characters and give them rich expressions and movements, with quite good live-streaming results. A number of mobile phone apps can now do fairly accurate facial motion capture through the camera alone. So, in theory, everyone can have multiple avatars and, through this kind of technology, live a life in the virtual world completely different from reality.


The correct answer is A: image tracing techniques and new computer technology.


Iydia Ding

Iydia Ding is an intern reporter at BTW Media covering products. She studies at Shanghai International Studies University. Send tips to i.ding@btw.media.
