I need to implement face detection on mp4 videos, such that the output of my program is a "timeline" indicating the number of faces.
I have looked at the google-vision examples, specifically the photo demo and the face tracker.
Following the photo demo example, I've created a wrapper which loops over the frames and: 1) extracts thumbnails from the video using MediaMetadataRetriever.getFrameAtTime(), 2) creates frames using Frame.Builder().setBitmap(bitmap).build(), and 3) detects the number of faces with FaceDetector.detect(frame).size(). This method works, but it is painfully slow (e.g., 1 second per frame).
I also looked at the face tracker example, which looks a lot more like what I need (and is the suggested approach for videos and camera). The problem here is that the example is tightly coupled to the camera.
I also read a similar thread where the accepted answer looks like my first attempt, but mentions MediaCodec. I've read about it but can't find a way to apply it to my problem (even with the examples from bigflake).
As I understand it, my options are a) improve the frame extraction step (e.g., using MediaCodec?), or b) mimic the CameraSource functionality but using the mp4 file instead of the actual camera.
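To make the desired output concrete, here is a minimal sketch of the "timeline" I have in mind, written in plain Java so it is independent of how the frames are obtained. The class name FaceTimeline, the Segment type, and the countFacesAt callback are all hypothetical names made up for this illustration; on Android the callback would wrap the getFrameAtTime() and detect() calls described above. Sampling every stepUs microseconds instead of every frame is an assumption on my part, not something taken from the examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongToIntFunction;

// Hypothetical helper: collapses per-sample face counts into a timeline of runs.
final class FaceTimeline {

    /** One run of consecutive samples that share the same face count. */
    static final class Segment {
        final long startUs; // inclusive, microseconds
        final long endUs;   // exclusive, microseconds
        final int faces;

        Segment(long startUs, long endUs, int faces) {
            this.startUs = startUs;
            this.endUs = endUs;
            this.faces = faces;
        }

        @Override
        public String toString() {
            return startUs + "-" + endUs + " us: " + faces + " face(s)";
        }
    }

    /**
     * Samples the video every stepUs microseconds and merges equal neighbours.
     * countFacesAt stands in for the slow part (frame extraction + detection).
     */
    static List<Segment> build(long durationUs, long stepUs, LongToIntFunction countFacesAt) {
        if (stepUs <= 0) {
            throw new IllegalArgumentException("stepUs must be positive");
        }
        List<Segment> timeline = new ArrayList<>();
        long runStart = 0;
        int runFaces = -1; // -1 means no run has been opened yet
        for (long t = 0; t < durationUs; t += stepUs) {
            int faces = countFacesAt.applyAsInt(t);
            if (faces != runFaces) {
                if (runFaces >= 0) {
                    timeline.add(new Segment(runStart, t, runFaces));
                }
                runStart = t;
                runFaces = faces;
            }
        }
        if (runFaces >= 0) {
            timeline.add(new Segment(runStart, durationUs, runFaces));
        }
        return timeline;
    }

    public static void main(String[] args) {
        // Fake detector for demonstration: one face between 1 s and 3 s.
        LongToIntFunction fake = t -> (t >= 1_000_000L && t < 3_000_000L) ? 1 : 0;
        for (Segment s : build(4_000_000L, 500_000L, fake)) {
            System.out.println(s);
        }
    }
}
```

With a structure like this, whichever of the two options turns out to be faster only has to supply the callback; the timeline logic stays the same.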