CNN vs SVM for smile intensity detection training?

Question

I have a dataset made up of images of faces, with the corresponding landmarks that make up the mouth. These landmarks are sets of 2D points (x, y pixel positions). Each image-landmark pair is labeled as either smiling or neutral.

What I would like to do is train a deep learning model that returns a smile intensity for a new image-landmark pair.

What should I be searching for to help me with the next step? Is it a CNN that I need? In my limited understanding, the usual training input is just an image, whereas I would also be passing in the landmark sets for training. Or would an SVM approach be more accurate?

I am looking for the highest accuracy possible.

What is the approach that I need called?

I am happy to use PyTorch, Dlib or any framework, I am just a little stuck on the search terms to help me move forward.

Thank you.

score 1 · Answer 1 · answered May 01 '20 at 19:48

It's hard to say which approach will be more accurate without looking at the dataset and experimenting. Hopefully, though, the following research materials will point you in the right direction.

Machine learning-based approach: https://www.researchgate.net/publication/266672947_Estimating_smile_intensity_A_better_way
Deep learning (CNN): https://arxiv.org/pdf/1602.00172.pdf
A list of awesome papers for smile and smile intensity detection: https://github.com/EvelynFan/AWESOME-FER/blob/master/README.md
SmileNet project: https://sites.google.com/view/sensingfeeling/

I'm assuming you don't have labels for the actual smile intensity.

In that case, you can use an existing smile detection method directly and treat the output of its last activation (a sigmoid) as a confidence score for smiling: the higher the confidence, the higher the intensity.
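As a rough illustration of reading the sigmoid output as an intensity, here is a minimal pure-Python sketch that swaps the CNN for logistic regression (a single sigmoid unit) on two hand-made mouth features. It is trained only on the binary smile/neutral labels, yet its output varies continuously between 0 and 1. The feature names (`mouth_width`, `corner_lift`) and their values are hypothetical, already-standardized toy numbers; in the CNN approach, the network's final sigmoid activation plays the same role.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    """Sigmoid confidence for the 'smile' class, read as a smile intensity in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Logistic regression fitted by batch gradient descent on binary cross-entropy.

    labels: 1 for smile, 0 for neutral. No intensity labels are needed.
    """
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for x, y in zip(samples, labels):
            err = score(w, b, x) - y  # gradient of BCE with respect to the logit
            for i in range(n):
                gw[i] += err * x[i]
            gb += err
        m = len(samples)
        w = [wi - lr * gi / m for wi, gi in zip(w, gw)]
        b -= lr * gb / m
    return w, b

# Hypothetical, already-standardized features per face: [mouth_width, corner_lift]
neutral = [[-1.0, -0.8], [-0.6, -1.1], [-0.9, -0.5]]
smiling = [[0.8, 1.0], [1.1, 0.7], [0.6, 0.9]]
X = neutral + smiling
y = [0] * len(neutral) + [1] * len(smiling)

w, b = train_logistic(X, y)
slight = score(w, b, [0.2, 0.1])
broad = score(w, b, [1.5, 1.6])
print(f"slight smile: {slight:.2f}  broad smile: {broad:.2f}")
```

Keep in mind that this number is a classifier's confidence, so it is only a proxy for intensity: it tends to rank stronger smiles higher, but it is not calibrated against any intensity scale.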

You can then use the facial landmark points as separate features (for example, by passing them through an LSTM block) and concatenate them with the CNN features at an early or later stage to improve your model's performance.
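The suggestion above uses an LSTM for the landmarks. To keep things short, the following pure-Python sketch replaces that block with plain coordinate normalization and flattening, and shows only the concatenation (late fusion) step. It assumes dlib's 68-point layout, where the outer lip is points 48-59 and the mouth corners are points 48 and 54 (indices 0 and 6 after slicing out the outer lip). The `cnn_embedding` values and the synthetic elliptical mouth are hypothetical stand-ins.

```python
import math

def normalize_landmarks(points, left_corner=0, right_corner=6):
    """Centre mouth landmarks on their centroid, divide by mouth width, and flatten.

    This removes the face's position and scale in the image, so the features describe
    mouth shape only. Default corner indices assume dlib's 68-point outer-lip points
    48-59 were sliced out, making the corners (48 and 54) indices 0 and 6.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    (lx, ly), (rx, ry) = points[left_corner], points[right_corner]
    width = math.hypot(rx - lx, ry - ly)
    if width == 0:
        raise ValueError("mouth corners coincide; cannot normalise scale")
    flat = []
    for x, y in points:
        flat.extend([(x - cx) / width, (y - cy) / width])
    return flat

def fuse(cnn_embedding, landmark_features):
    """Late fusion: concatenate the CNN image embedding with the landmark features.

    In a real model this fused vector would feed the final dense layer(s) and the
    sigmoid output, and the whole network would be trained together.
    """
    return list(cnn_embedding) + list(landmark_features)

# Hypothetical outer-lip landmarks: 12 points on an ellipse centred at (200, 300),
# starting at the left corner (index 0) and reaching the right corner at index 6.
mouth = [(200 + 40 * math.cos(math.pi - k * math.pi / 6),
          300 + 10 * math.sin(math.pi - k * math.pi / 6)) for k in range(12)]

landmark_features = normalize_landmarks(mouth)  # 12 points -> 24 numbers
cnn_embedding = [0.12, -0.40, 0.73]             # stand-in for a CNN penultimate-layer output
fused = fuse(cnn_embedding, landmark_features)
print(len(landmark_features), len(fused))       # prints: 24 27
```

Normalizing the landmarks first matters whichever block processes them afterwards: raw pixel coordinates mostly encode where the face sits in the image and how large it is, not how much the mouth is smiling.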

If you do have smile intensity labels, you can simply treat this as a regression problem: the CNN has a single output that regresses the smile intensity (in this case, the intensity normalized to [0, 1] and produced by a sigmoid).
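For the labeled case, here is a minimal pure-Python sketch of the idea: raw intensity ratings are rescaled to [0, 1] so they match the range of a sigmoid, and a single-output model `sigmoid(w.x + b)` is fitted with mean squared error. A linear model on one hypothetical standardized feature stands in for the CNN, and the 0-5 rating scale and the data are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def normalize_labels(raw, lo, hi):
    """Rescale raw intensity ratings on a [lo, hi] scale to [0, 1] to match a sigmoid output."""
    return [(r - lo) / (hi - lo) for r in raw]

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_sigmoid_regressor(samples, targets, lr=1.0, epochs=3000):
    """Single-output regressor sigmoid(w.x + b), trained by gradient descent on MSE."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for x, t in zip(samples, targets):
            p = predict(w, b, x)
            # Chain rule: d(p - t)^2 / dz = 2 (p - t) * p (1 - p)
            g = 2.0 * (p - t) * p * (1.0 - p)
            for i in range(n):
                gw[i] += g * x[i]
            gb += g
        m = len(samples)
        w = [wi - lr * gi / m for wi, gi in zip(w, gw)]
        b -= lr * gb / m
    return w, b

# Hypothetical data: one standardized mouth feature and intensity ratings on a 0-5 scale.
X = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
raw_ratings = [0.0, 1.0, 2.5, 4.0, 5.0]
targets = normalize_labels(raw_ratings, lo=0.0, hi=5.0)  # [0.0, 0.2, 0.5, 0.8, 1.0]

w, b = train_sigmoid_regressor(X, targets)
for x, t in zip(X, targets):
    print(f"feature={x[0]:+.1f}  target={t:.2f}  predicted={predict(w, b, x):.2f}")
```

In a deep learning framework, the equivalent is a network whose last layer has one unit followed by a sigmoid, trained with a regression loss against the normalized labels.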

Thank you! You are correct, I don't have intensity labels. So I can pass the images AND the landmarks to a CNN? Or just the landmarks? — anti, May 01 '20 at 19:55
First, try with the images only. Follow the projects I mentioned; you'll find some example code here: https://github.com/meng1994412/Smile_Detection. Use the sigmoid output as the intensity. Once you want to improve on that, you can design a more complex model that also uses the landmarks, although designing such a model requires understanding the different layer types. — Zabir Al Nazi, May 01 '20 at 20:00
Here are some more projects: https://github.com/topics/smile-detection — Zabir Al Nazi, May 01 '20 at 20:00
