This is more like something that I would like to discuss with the community rather than something that I am seeking for an absolute answer.
I am trying to implement the GMM based background subtraction algorithm from scratch. Apparently, OpenCV already had it well-implemented (the MOG2). I am still trying to implement it from scratch as I would like to test some of the parameters that the OpenCV does not provide access to. However, my implementation was super slow when running on 4k images and took a huge amount of memory while OpenCV can achieve about 5-10 images per second or even faster and does not take much memory. I am NOT surprised that the OpenCV was much faster than mine but still curious about how it was achieved.
So here are my thoughts:
The GMM approach is to build a mixture of Gaussians to describe the background/foreground for each pixel. That been said, each pixel will have 3-5 associated 3-dimensional Gaussian components. We can simplify the computation by using a shared variance for different channels instead of the covariance. Then we should have at least 3 means, 1 variance, and 1 weight parameters for each Gaussian component. If we assume each pixel would maintain 3 components. This would be roughly 4000*2000*3*(3+1+1) parameters when reading an image.
The computation for updating the GMM, although it is not very complex for a single pixel, the total amount of time for computing the whole 4000*2000 pixels should still be very expensive.
I don't think the OpenCV MOG2 was accelerated by CUDA as I tested on my mac without a graphic card. The speed was still fast.
So my question is:
- Does the OpenCV compress the image before feeding it into the model and decompress the results at return?
- Is it possible to achieve near real-time processing for 4k images (without image compression) with parallelization on CPU?
- My implementation used 4000*2000 double linked lists for maintaining the Gaussian Components for the 4k images. I was expecting that it should save me some memory, but the memory still exploded when I tested it on the 4k image.
Plus:
- I did test the OpenCV MOG2 on the resized image ((3840, 2160) down to (384, 216)) and the detection seems acceptable.
This might be a weird question... But I would appreciate any opinions on it.