I'm having the same problem at the moment. My current solution is below:
Quote from the paper you cited:
Puppeteering: Our model can also be used for virtual puppeteering and facial triggers. We built a small fully connected model that predicts 10 blend shape coefficients for the mouth and 8 blend shape coefficients for each eye. We feed the output of the attention mesh submodels to this blend shape network. In order to handle differences between various human faces, we apply Laplacian mesh editing to morph a canonical mesh into the predicted mesh [3]. This lets us use the blend shape coefficients for different human faces without additional fine-tuning. We demonstrate some results in Figure 5.
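For concreteness, here is a rough sketch of the kind of small fully connected blend shape network they describe (10 mouth + 8 + 8 eye coefficients). The 468-landmark input and the hidden sizes are my own guesses, not values from the paper:

```python
import torch
import torch.nn as nn

NUM_LANDMARKS = 468      # assumed face mesh landmark count (my guess, not from the paper)
NUM_COEFFS = 10 + 8 + 8  # mouth + left eye + right eye coefficients, as stated in the paper

class BlendShapeNet(nn.Module):
    """Small fully connected net: mesh landmarks -> blend shape coefficients."""
    def __init__(self, num_landmarks=NUM_LANDMARKS, num_coeffs=NUM_COEFFS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_landmarks * 3, 256), nn.ReLU(),  # hidden sizes are placeholders
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_coeffs),
            nn.Sigmoid(),  # assuming coefficients live in [0, 1]
        )

    def forward(self, landmarks):
        # landmarks: (batch, num_landmarks, 3) -> (batch, num_coeffs)
        return self.net(landmarks.flatten(start_dim=1))
```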
I think my approach at the moment is pretty much the same as what they've done.
My approach: first sample many random (blend shapes -> face mesh) pairs by driving a rigged 3D model with random blend shape weights and detecting the face mesh on it, then learn an inverse model from those pairs (a simple fully connected network would do).
You therefore end up with a model that predicts blend shapes given a face mesh. A rough sketch of that step is below.
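To make step one concrete, here is a sketch under a big simplifying assumption: I approximate the 3D model as a linear blend shape rig (neutral vertices plus weighted per-shape deltas) so the snippet stays self-contained. In practice you would pose the actual rigged model and run face mesh detection on the render; all names, sizes, and the placeholder rig below are mine, not from the paper.

```python
import numpy as np
import torch
import torch.nn as nn

NUM_LANDMARKS = 468  # assumed face mesh landmark count
NUM_COEFFS = 26      # 10 mouth + 8 per eye, as in the paper

rng = np.random.default_rng(0)
# Placeholder rig: random neutral mesh and per-blend-shape vertex deltas.
neutral = rng.normal(size=(NUM_LANDMARKS, 3)).astype(np.float32)
deltas = rng.normal(scale=0.1, size=(NUM_COEFFS, NUM_LANDMARKS, 3)).astype(np.float32)

# 1) Sample many random (blend shapes -> face mesh) pairs.
weights = rng.uniform(0.0, 1.0, size=(10000, NUM_COEFFS)).astype(np.float32)
meshes = neutral + np.einsum("bk,kij->bij", weights, deltas)  # linear blend shape model

# 2) Learn the inverse model: face mesh -> blend shapes.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(NUM_LANDMARKS * 3, 256), nn.ReLU(),
    nn.Linear(256, NUM_COEFFS), nn.Sigmoid(),  # coefficients assumed to live in [0, 1]
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.from_numpy(meshes), torch.from_numpy(weights)
for epoch in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
print("final training loss:", loss.item())
```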
The catch, which is also mentioned in the blurb above, is that you want to handle face mesh inputs from different people. In the blurb it seems that they sample the 3D model but transform the sampled mesh into the canonical face mesh, and hence end up with a canonical inverse model. At inference time you transform a given mesh into the canonical face mesh as well.
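As a much simpler stand-in for the Laplacian mesh editing they use, you could start with a plain similarity (Procrustes) alignment onto the canonical mesh before feeding a mesh to the inverse model. This is just a simplification on my end, not the paper's method:

```python
import numpy as np

def align_to_canonical(mesh, canonical):
    """Similarity-align `mesh` (N, 3) onto `canonical` (N, 3), assuming matching landmark order."""
    mu_m, mu_c = mesh.mean(axis=0), canonical.mean(axis=0)
    m, c = mesh - mu_m, canonical - mu_c
    # Optimal rotation via the Kabsch/Umeyama method (SVD of the cross-covariance).
    u, s, vt = np.linalg.svd(m.T @ c)
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    scale = (s * np.array([1.0, 1.0, d])).sum() / (m ** 2).sum()
    return scale * (m @ rot) + mu_c
```

The same alignment would then be applied both when generating the training pairs and to any new mesh at inference time, so the inverse model only ever sees meshes in the canonical frame.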
Another solution might be to transform each person's face mesh directly into the 3D model's mesh instead.
I haven't done the canonical mesh part yet, but step one should work.
Best regards,
C