Gaussian Mixture Model for Background Subtraction

Question

This is more something I would like to discuss with the community than something I expect a definitive answer to.

I am trying to implement the GMM-based background subtraction algorithm from scratch. OpenCV already has a good implementation (MOG2), but I still want to write my own because I would like to test some parameters that OpenCV does not expose. However, my implementation is extremely slow on 4K images and uses a huge amount of memory, whereas OpenCV processes about 5-10 images per second or more without using much memory. I am NOT surprised that OpenCV is much faster than my code, but I am curious how it achieves this.

So here are my thoughts:

The GMM approach builds a mixture of Gaussians to describe the background/foreground of each pixel. That is, each pixel has 3-5 associated 3-dimensional Gaussian components. We can simplify the computation by using a single variance shared across the channels instead of a full covariance matrix. Each Gaussian component then needs at least 3 means, 1 variance, and 1 weight. If each pixel maintains 3 components, that is roughly 4000*2000*3*(3+1+1) = 120,000,000 parameters for a single image.
Updating the GMM is not very complex for a single pixel, but the total time to update all 4000*2000 pixels should still be very high.
I don't think OpenCV's MOG2 is accelerated with CUDA, because I tested it on my Mac without a dedicated graphics card and it was still fast.
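To make the per-pixel cost concrete, here is a minimal pure-Python sketch of one update step for a single pixel, loosely following the Stauffer-Grimson scheme with the shared (isotropic) variance described above. It is not OpenCV's MOG2 code (OpenCV documents MOG2 as based on Zivkovic's adaptive variant), and the constants `ALPHA`, `MATCH_SIGMAS`, `INIT_VAR`, `INIT_WEIGHT`, `MIN_VAR`, `BG_THRESHOLD`, the `Component` class and `update_pixel` are hypothetical names with illustrative values.

```python
import math
from dataclasses import dataclass

# Hypothetical constants, chosen only for illustration.
ALPHA = 0.01         # learning rate
MATCH_SIGMAS = 2.5   # match if the Mahalanobis distance is below 2.5
INIT_VAR = 225.0     # variance of a newly created component (sigma = 15)
INIT_WEIGHT = 0.05   # weight of a newly created component (before normalization)
MIN_VAR = 4.0        # floor so the variance cannot collapse to zero
BG_THRESHOLD = 0.7   # cumulative weight explained by background components


@dataclass
class Component:
    mean: list       # per-channel means, e.g. [B, G, R]
    var: float       # one variance shared by all channels
    weight: float


def update_pixel(components, x, k_max=3):
    """Update one pixel's mixture (kept sorted, most background-like first)
    with sample x. Returns True if x is classified as background."""
    matched, matched_d2 = None, 0.0
    for c in components:
        d2 = sum((xc - mc) ** 2 for xc, mc in zip(x, c.mean))
        if d2 < MATCH_SIGMAS ** 2 * c.var:   # squared Mahalanobis distance test
            matched, matched_d2 = c, d2
            break

    # Decay all weights; only the matched component gets the +ALPHA reward.
    for c in components:
        c.weight = (1 - ALPHA) * c.weight + (ALPHA if c is matched else 0.0)

    if matched is not None:
        rho = ALPHA  # simplification of Stauffer-Grimson's alpha * N(x | mean, var)
        # Variance first (uses the old mean), averaged over channels so it
        # estimates a single channel's variance.
        matched.var = max(MIN_VAR, (1 - rho) * matched.var + rho * matched_d2 / len(x))
        matched.mean = [(1 - rho) * m + rho * xc for m, xc in zip(matched.mean, x)]
    else:
        new = Component([float(v) for v in x], INIT_VAR, INIT_WEIGHT)
        if len(components) < k_max:
            components.append(new)
        else:
            components[-1] = new  # replace the least probable component

    total = sum(c.weight for c in components)
    for c in components:
        c.weight /= total
    # Most background-like first: high weight, low variance.
    components.sort(key=lambda c: c.weight / math.sqrt(c.var), reverse=True)

    if matched is None:
        return False
    # Background = leading components until cumulative weight exceeds BG_THRESHOLD.
    cum = 0.0
    for c in components:
        if c is matched:
            return True
        cum += c.weight
        if cum > BG_THRESHOLD:
            return False
    return False
```

Each call does only a handful of arithmetic operations per component; the cost comes from repeating it for roughly 8 million pixels per frame, which is why fast implementations run this loop in compiled code over compact memory rather than in an interpreter over heap-allocated objects.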

So my question is:

Does OpenCV compress the image before feeding it into the model and decompress the result when returning it?
Is it possible to achieve near-real-time processing of 4K images (without image compression) through parallelization on the CPU?
My implementation uses 4000*2000 doubly linked lists to maintain the Gaussian components of a 4K image. I expected this to save some memory, but memory usage still exploded when I tested it on a 4K image.
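On the CPU-parallelization question: each pixel's mixture is updated independently of its neighbors, so a frame splits naturally into row bands that separate workers can update without any locking. The sketch below (hypothetical names `row_bands`, `update_frame_parallel`, `update_fn`) shows only that partitioning. In CPython, threads will not speed up pure-Python loops because of the GIL; the same band split is what one would use with native threads in C/C++ or with OpenMP.

```python
from concurrent.futures import ThreadPoolExecutor


def row_bands(height, n_workers):
    """Split rows [0, height) into n_workers contiguous, nearly equal bands."""
    base, extra = divmod(height, n_workers)
    bands, start = [], 0
    for i in range(n_workers):
        stop = start + base + (1 if i < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def update_frame_parallel(model, frame, update_fn, n_workers=4):
    """Apply update_fn(model_value, pixel) to every pixel in place.
    Each worker owns a disjoint band of rows, so no locking is needed."""
    def work(band):
        start, stop = band
        for r in range(start, stop):
            model_row, frame_row = model[r], frame[r]
            for c in range(len(frame_row)):
                model_row[c] = update_fn(model_row[c], frame_row[c])

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() waits for every band and re-raises any worker exception.
        list(pool.map(work, row_bands(len(frame), n_workers)))
```

For example, with a trivial running average standing in for the GMM step, `update_frame_parallel(model, frame, lambda m, x: m + 0.5 * (x - m), n_workers=3)` moves every value of a zero-initialized model halfway toward the corresponding frame pixel.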

Plus:

I did test OpenCV's MOG2 on a resized image ((3840, 2160) down to (384, 216)), and the detection results seemed acceptable.
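Resizing from (3840, 2160) to (384, 216) shrinks each axis by a factor of 10, so the model has 100 times fewer pixels to store and update, which by itself cuts the per-frame work and model memory by roughly that factor. A minimal area-average downscale, written with a hypothetical `downscale_box` helper for a grayscale image stored as a list of rows, could look like this; with OpenCV one would normally call `cv2.resize` with `interpolation=cv2.INTER_AREA` instead.

```python
def downscale_box(img, f):
    """Average non-overlapping f x f blocks of a 2-D grayscale image (list of rows).
    Trailing rows/columns that do not fill a whole block are dropped."""
    h, w = len(img) // f, len(img[0]) // f
    return [
        [sum(img[r * f + i][c * f + j] for i in range(f) for j in range(f)) / (f * f)
         for c in range(w)]
        for r in range(h)
    ]


full, small = 3840 * 2160, 384 * 216
print(full, small, full // small)  # 8294400 82944 100
```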

This might be a weird question, but I would appreciate any opinions on it.

*Opinions* are not the strong suit of this site. In fact, there's a close reason for questions seeking opinions. OpenCV is open-source, did you check the source code? You might learn a lot from reading that. Finally, linked lists are terrible for memory and access times. Use fixed-size arrays if that's possible. In this case, it is. Each doubly linked list item is individually allocated, and contains two pointers that are pure overhead. — Cris Luengo, Oct 10 '19 at 21:40
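Along the lines of that suggestion, the sketch below replaces the per-pixel linked lists with one fixed-size, contiguous buffer: every pixel gets exactly K component slots of FIELDS values each, located by index arithmetic instead of pointer chasing. The names (`alloc_model`, `offset`, `model_bytes`, `MEAN0`, `VAR`, `WEIGHT`) are hypothetical. With the question's numbers it reproduces the 120,000,000-value estimate: about 480 MB as 32-bit floats or 960 MB as doubles. A doubly linked list adds two pointers (16 bytes on a 64-bit system) and a separate heap allocation to every component on top of that.

```python
from array import array

K = 3                          # components per pixel (as assumed in the question)
FIELDS = 5                     # 3 channel means + 1 shared variance + 1 weight
MEAN0, VAR, WEIGHT = 0, 3, 4   # field positions within one component


def alloc_model(height, width, k=K, typecode="f"):
    """One contiguous, zero-filled buffer for the whole frame ('f' = 32-bit float)."""
    return array(typecode, [0.0]) * (height * width * k * FIELDS)


def offset(pixel, comp, field, k=K):
    """Flat index of `field` for component `comp` at flat pixel index `pixel`
    (pixel = row * width + col); layout is pixel-major, then component, then field."""
    return (pixel * k + comp) * FIELDS + field


def model_bytes(height, width, k=K, itemsize=4):
    """Memory for the model parameters alone, excluding any per-node overhead."""
    return height * width * k * FIELDS * itemsize


print(model_bytes(2000, 4000), model_bytes(2000, 4000, itemsize=8))
# 480000000 960000000
```

The same offset arithmetic would carry over directly to a plain `float` buffer in C or C++, where a loop over pixels walks memory sequentially and stays cache-friendly.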
