I have a dataset of face images, each with the corresponding landmarks that outline the mouth. The landmarks are sets of 2D points (x, y pixel positions). Each image-landmark pair is labelled as either smile or neutral.
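To make the data format concrete, here is a rough sketch of what one sample looks like. The class name, field names, file name, and coordinate values are all made up for illustration, and a real sample would have more landmark points than the three shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MouthSample:
    """One image-landmark pair (all names here are hypothetical)."""
    image_path: str                       # path to the face image
    landmarks: List[Tuple[float, float]]  # mouth landmarks as (x, y) pixel positions
    label: int                            # 1 = smile, 0 = neutral

    def flat_landmarks(self) -> List[float]:
        """Flatten [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
        return [coord for point in self.landmarks for coord in point]


# Illustrative values only; not taken from the real dataset.
sample = MouthSample(
    image_path="face_0001.png",
    landmarks=[(120.0, 210.0), (135.0, 205.0), (150.0, 211.0)],
    label=1,
)
print(sample.flat_landmarks())
```

The flattened list is only one possible way to hand the landmarks to a model as a fixed-length vector; I am not sure it is the right one.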
What I would like to do is train a deep learning model that returns a smile intensity for a new image-landmark pair.
What should I be searching for to help me with the next step? Is it a CNN that I need? In my limited understanding, the usual training input is just an image, whereas I would be passing the landmark sets to train with. Or would an SVM approach be more accurate?
I am looking for the highest accuracy possible.
What is the approach that I need called?
I am happy to use PyTorch, Dlib, or any other framework; I am just a little stuck on the search terms that would help me move forward.
Thank you.