I have a video of a patient doing cognitive tasks. The goal is to take each frame of the video, detect the face, landmark the mouth, and then calculate the area bounded by the mouth landmarks. I used dlib and its Python API to do this.
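For reference, here is roughly what my per-frame pipeline looks like (function names and paths are my own; the mouth points are indices 48–67 of dlib's standard 68-point model, with 48–59 being the outer lip):

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon given (x, y) vertices in order."""
    n = len(points)
    area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        area += x0 * y1 - x1 * y0
    return abs(area) / 2.0

def mouth_area_per_frame(video_path, predictor_path):
    """Returns one area per frame; None where no face was detected.
    predictor_path points at dlib's shape_predictor_68_face_landmarks.dat."""
    import cv2
    import dlib
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(predictor_path)
    cap = cv2.VideoCapture(video_path)
    areas = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector(gray, 1)  # upsample once to help with smaller faces
        if not faces:
            areas.append(None)     # no detection -> no landmarks this frame
            continue
        shape = predictor(gray, faces[0])
        # Outer-lip contour is points 48-59 in the 68-point annotation.
        mouth = [(shape.part(i).x, shape.part(i).y) for i in range(48, 60)]
        areas.append(polygon_area(mouth))
    cap.release()
    return areas
```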
I ran into two problems. First, the patient is in a hospital bed and the camera is angled slightly upward, looking up at the chin rather than straight at the face. The face isn't detected in a fair number of frames, and when no face is detected the algorithm skips landmarking, so there are no mouth perimeter points for those frames. Is there a way I can improve the face detection (maybe by training a detector specifically on a few frames of this patient)?
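On the first problem, dlib does seem to support training a small HOG detector from a handful of labeled frames via `train_simple_object_detector`, which is the route I'm considering. Below is only a sketch from reading the docs (the option values are guesses to be tuned, and the boxes would come from a labeling tool such as dlib's imglab). A cheaper fallback I'm also considering, shown in the second function, is reusing the previous frame's face box, slightly expanded, whenever detection fails, and handing that rectangle straight to the landmarker:

```python
def train_patient_detector(images, boxes, out_path="patient_face.svm"):
    """Sketch: images is a list of numpy arrays, boxes a per-image list of
    dlib.rectangle hand-drawn around the patient's face."""
    import dlib
    options = dlib.simple_object_detector_training_options()
    options.add_left_right_image_flips = True  # mirror frames to double the data
    options.C = 5                              # SVM regularization; needs tuning
    detector = dlib.train_simple_object_detector(images, boxes, options)
    detector.save(out_path)
    return detector

def expand_box(left, top, right, bottom, scale, frame_w, frame_h):
    """Fallback for frames with no detection: grow the last known face box
    by `scale` about its center, clamped to the frame bounds."""
    cx, cy = (left + right) / 2.0, (top + bottom) / 2.0
    half_w = (right - left) * scale / 2.0
    half_h = (bottom - top) * scale / 2.0
    return (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(frame_w, int(cx + half_w)), min(frame_h, int(cy + half_h)))
```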
The second problem is that the mouth landmarks can vary significantly from frame to frame. I was hoping that at the end of this I could show a slow-motion clip of a word being spoken, with the mouth perimeter smoothly increasing and decreasing as the mouth opens and closes. But the result is quite noisy, with a good bit of frame-to-frame variation.
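For this second problem, the first thing I plan to try is cleaning up the per-frame area series itself: linearly interpolate across the frames with no detection, then run a centered moving median, which should be robust to the single-frame landmark jumps I'm seeing. A sketch (the window size is a guess; at 30 fps a 5-frame window is about 170 ms):

```python
from statistics import median

def fill_gaps(values):
    """Linearly interpolate runs of None between detected frames;
    runs touching either end are held at the nearest detected value."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is None:
            j = i
            while j < n and out[j] is None:
                j += 1
            if i == 0 and j < n:          # gap at the start
                for k in range(i, j):
                    out[k] = out[j]
            elif j == n and i > 0:        # gap at the end
                for k in range(i, j):
                    out[k] = out[i - 1]
            elif i > 0:                   # interior gap: interpolate
                a, b = out[i - 1], out[j]
                for k in range(i, j):
                    t = (k - i + 1) / (j - i + 1)
                    out[k] = a + t * (b - a)
            i = j
        else:
            i += 1
    return out

def moving_median(values, window=5):
    """Centered moving median; window shrinks near the edges."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]
```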
I'm definitely method/platform agnostic. If anyone knows of a better, more accurate, or more robust way to do this, maybe with Matlab or OpenCV, I am open to chasing that lead. Any guidance would be helpful.
Thanks, everyone.