
I have a dataset made up of images of faces, with the corresponding landmarks that make up the mouth. These landmarks are sets of 2D points (x, y pixel positions). Each image-landmark pair is tagged as either a smile or neutral.

What I would like to do is train a deep learning model to return a smile intensity for a new image-landmark pair.

What should I be searching for to help me with the next step? Is it a CNN that I need? In my limited understanding, the usual training input is just an image, whereas I would also be passing the landmark sets to train with. Or would an SVM approach be more accurate?

I am looking for the highest accuracy possible.

What is the approach that I need called?

I am happy to use PyTorch, Dlib or any framework, I am just a little stuck on the search terms to help me move forward.

Thank you.

anti

1 Answer


It's hard to tell without looking into the dataset and experimenting. But hopefully, the following research materials will guide you in the right direction.

Now, I'm assuming you don't have any label for actual smile intensity.

In that scenario, you can use an existing smile detection method directly: take the output of the final sigmoid activation as a confidence score for smiling and treat it as a proxy for intensity. The higher the confidence, the higher the assumed intensity.
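As a rough illustration of that idea, the sketch below maps a classifier's raw output (logit) to a score in (0, 1). The logit values are made up for the example and do not come from any trained model. Note that a confidence score is only a proxy: it says how sure the classifier is that the face is smiling, which tends to correlate with intensity but is not a calibrated measurement of it.

```python
import math


def smile_intensity(logit: float) -> float:
    """Squash a smile classifier's raw output into (0, 1) with a sigmoid."""
    # Numerically stable form: never exponentiates a large positive number.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Hypothetical logits for three faces (assumed values, for illustration only).
logits = {"neutral": -2.0, "slight_smile": 0.5, "broad_smile": 3.0}
scores = {name: smile_intensity(z) for name, z in logits.items()}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

In a PyTorch model the same step would typically be `torch.sigmoid` applied to the network's single output.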

Next, you can treat the facial landmark points as a separate feature stream (for example, by passing them through an LSTM block) and concatenate the result with the CNN features, at either an early or a late stage, to improve the performance of your model.
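One possible shape for such a two-branch model in PyTorch is sketched below, using late fusion. Everything specific here is an assumption made for the example: the class name `SmileFusionNet`, grayscale 64x64 inputs, 20 mouth landmarks per face, and the layer sizes. It is meant to show how the two inputs can be combined, not to be a tuned architecture.

```python
import torch
import torch.nn as nn


class SmileFusionNet(nn.Module):
    """Hypothetical two-branch model: CNN on the image, LSTM on the landmarks."""

    def __init__(self, lstm_hidden: int = 32):
        super().__init__()
        # Image branch: small CNN that ends in a 32-dimensional feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Landmark branch: treat the (x, y) points as a sequence of 2-D inputs.
        self.lstm = nn.LSTM(input_size=2, hidden_size=lstm_hidden, batch_first=True)
        # Late fusion: concatenate both feature vectors, then one output logit.
        self.head = nn.Linear(32 + lstm_hidden, 1)

    def forward(self, image: torch.Tensor, landmarks: torch.Tensor) -> torch.Tensor:
        img_feat = self.cnn(image)            # (batch, 32)
        _, (h_n, _) = self.lstm(landmarks)    # h_n: (num_layers, batch, hidden)
        lm_feat = h_n[-1]                     # (batch, hidden)
        fused = torch.cat([img_feat, lm_feat], dim=1)
        return self.head(fused).squeeze(1)    # raw logits, shape (batch,)


model = SmileFusionNet()
images = torch.randn(4, 1, 64, 64)   # placeholder grayscale face crops
landmarks = torch.rand(4, 20, 2)     # placeholder landmarks, scaled to [0, 1]
logits = model(images, landmarks)
intensity = torch.sigmoid(logits)    # confidence used as an intensity proxy
```

With only smile/neutral tags, a model like this would usually be trained on the logits with `nn.BCEWithLogitsLoss`, and the sigmoid applied only at inference time. Scaling the landmark coordinates by the image size, as assumed above, keeps the two branches on comparable numeric ranges.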

If you do have labels for smile intensity, you can treat this as a regression problem: the CNN has a single output that regresses the smile intensity (in this case, the intensity normalized to [0, 1], with a sigmoid on the output).
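A minimal sketch of the regression variant, assuming intensity labels exist on a hypothetical 0 to 10 scale (the scale, the random placeholder data, and the tiny network are all assumptions for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny CNN with a single sigmoid output, so predictions fall in [0, 1].
regressor = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

images = torch.randn(8, 1, 64, 64)                # placeholder face crops
raw_labels = torch.randint(0, 11, (8,)).float()   # assumed 0-10 intensity labels
targets = raw_labels / 10.0                       # normalize to [0, 1]

optimizer = torch.optim.Adam(regressor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# A few optimization steps on the placeholder batch.
for _ in range(5):
    optimizer.zero_grad()
    preds = regressor(images).squeeze(1)          # shape (batch,)
    loss = loss_fn(preds, targets)
    loss.backward()
    optimizer.step()

predicted_intensity = preds.detach() * 10.0       # back to the original scale
```

Mean squared error is used here because it is the most common default for regression; other losses such as `nn.L1Loss` would plug in the same way.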

Zabir Al Nazi
  • Thank you! You are correct, I don't have intensity labels. So I can pass the images AND the landmarks to a CNN? Or just the landmarks? – anti May 01 '20 at 19:55
  • First, try with the images only and follow the projects I mentioned; you'll find some code there, for example: https://github.com/meng1994412/Smile_Detection. Use the sigmoid output for intensity. Once you want to improve on that, you can design a more complex model that includes the landmarks, but designing such a model requires an understanding of the different layer types. – Zabir Al Nazi May 01 '20 at 20:00
  • Here are some more projects: https://github.com/topics/smile-detection – Zabir Al Nazi May 01 '20 at 20:00