I am trying to detect faces in a local video and blur or pixelate them while the video is playing. So far I grab each video frame, process it with Vision or ML Kit (I've tried both), and pixelate the face. The problem is that this processing takes so long that the video doesn't play at all. My other idea was to process all the frames first, export the result as a new video, and then play it, but that takes around 3-4 minutes, which is too long to wait.
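For context on the Vision step: a face observation's `boundingBox` is normalized to the range 0 to 1 with its origin at the lower-left corner, so it has to be flipped and scaled before you can pixelate the matching pixels in a top-left-origin buffer. Below is a minimal plain-Swift sketch of that conversion. `NormalizedRect`, `PixelRect`, and `pixelRect(for:imageWidth:imageHeight:)` are hypothetical stand-ins, not Vision API. With Vision itself, `VNImageRectForNormalizedRect` handles the scaling step.

```swift
// Hypothetical stand-in for Vision's normalized bounding box
// (values in [0, 1], origin at the lower-left corner).
struct NormalizedRect {
    var x: Double, y: Double, width: Double, height: Double
}

// Hypothetical pixel rectangle with a top-left origin, as in raw image buffers.
struct PixelRect: Equatable {
    var x: Int, y: Int, width: Int, height: Int
}

/// Converts a lower-left-origin normalized rect into a top-left-origin
/// pixel rect, clamped to the image bounds.
func pixelRect(for box: NormalizedRect, imageWidth: Int, imageHeight: Int) -> PixelRect {
    let w = Double(imageWidth), h = Double(imageHeight)
    // Clamp a coordinate into [0, limit].
    func clamp(_ v: Int, _ limit: Int) -> Int { max(0, min(v, limit)) }
    // Flip the y axis: Vision measures from the bottom edge.
    let minX = clamp(Int((box.x * w).rounded(.down)), imageWidth)
    let minY = clamp(Int(((1.0 - box.y - box.height) * h).rounded(.down)), imageHeight)
    let maxX = clamp(Int(((box.x + box.width) * w).rounded(.up)), imageWidth)
    let maxY = clamp(Int(((1.0 - box.y) * h).rounded(.up)), imageHeight)
    return PixelRect(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}
```

For example, a box at `(0.25, 0.5)` with size `0.5 x 0.25` in a 100x200 image maps to `PixelRect(x: 25, y: 50, width: 50, height: 50)`.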
This is more or less how I am getting the frames: Link
And here is how I pixelate the image: Link
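The linked pixelation code isn't reproduced here. As a hypothetical illustration of what block pixelation does, the sketch below replaces each `blockSize` x `blockSize` tile inside a region of an RGBA8 buffer (4 bytes per pixel, row-major, no row padding assumed) with that tile's average colour. The function `pixelate(_:width:height:regionX:regionY:regionWidth:regionHeight:blockSize:)` and its buffer layout are assumptions, not the author's method or any framework's API.

```swift
/// Hypothetical helper: pixelates a rectangular region of an RGBA8 buffer
/// (row-major, 4 bytes per pixel, no row padding) by averaging each tile.
func pixelate(_ pixels: inout [UInt8], width: Int, height: Int,
              regionX: Int, regionY: Int, regionWidth: Int, regionHeight: Int,
              blockSize: Int) {
    precondition(blockSize > 0 && pixels.count == width * height * 4)
    // Clip the region to the image bounds.
    let x0 = max(0, regionX), y0 = max(0, regionY)
    let x1 = min(width, regionX + regionWidth), y1 = min(height, regionY + regionHeight)
    guard x0 < x1, y0 < y1 else { return }

    for by in stride(from: y0, to: y1, by: blockSize) {
        let byEnd = min(by + blockSize, y1)
        for bx in stride(from: x0, to: x1, by: blockSize) {
            let bxEnd = min(bx + blockSize, x1)
            // Sum each channel (R, G, B, A) over the tile.
            var sums = [Int](repeating: 0, count: 4)
            for y in by..<byEnd {
                for x in bx..<bxEnd {
                    let i = (y * width + x) * 4
                    for ch in 0..<4 { sums[ch] += Int(pixels[i + ch]) }
                }
            }
            // Overwrite every pixel in the tile with the integer average.
            let count = (byEnd - by) * (bxEnd - bx)
            for y in by..<byEnd {
                for x in bx..<bxEnd {
                    let i = (y * width + x) * 4
                    for ch in 0..<4 { pixels[i + ch] = UInt8(sums[ch] / count) }
                }
            }
        }
    }
}
```

A per-pixel CPU loop like this is only meant to show the idea; Core Image's `CIPixellate` filter performs the same kind of effect on the GPU and would likely be much faster per frame.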
What can I do to detect and blur faces in a local video without it taking so long?