
I have a large number of images and want to calculate the variance (of each channel) across all of them. I'm having trouble finding an efficient algorithm / setup for that.

I read about Welford's online algorithm, but it is way too slow because it does not vectorize across a single image or a batch of images. So I'm wondering how to speed it up by using vectorization or by making use of built-in variance functions.

Daraan
  • IMHO you are unlikely to do significantly better than using **OpenCV** `cv2.meanStdDev()` https://docs.opencv.org/4.x/d2/de8/group__core__array.html#ga846c858f4004d59493d7c6a4354b301d And then couple that with multi-processing. – Mark Setchell Feb 26 '23 at 11:08
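For reference, a minimal sketch of the per-image statistics the comment suggests (assuming an image loaded with cv2.imread; the filename is hypothetical, and the per-image results would still have to be combined across images, e.g. with the answer below):

import cv2

img = cv2.imread("image.png")      # hypothetical file; loads as an (H, W, 3) BGR array
mean, std = cv2.meanStdDev(img)    # per-channel mean and standard deviation (3x1 arrays)
var = std**2                       # per-channel variance of this single image
print(mean.ravel(), var.ravel())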

1 Answer


These are the two functions needed to update/combine the means and variances of two batches. Both functions work with vectors (the 3 color channels), and the per-batch mean and variance can be obtained from built-in methods like batch.var().

Equations taken from: https://notmatthancock.github.io/2017/03/23/simple-batch-stat-updates.html

   
# m   number of samples (or pixels) over all previous batches
# n   number of samples in the new incoming batch
# mu1 previous mean
# mu2 mean of the current batch
# v1  previous variance
# v2  variance of the current batch

def combine_means(mu1, mu2, m, n):
    """
    Updates old mean mu1 from m samples with mean mu2 of n samples.
    Returns the mean of the m+n samples.
    """
    return (m / (m + n)) * mu1 + (n / (m + n)) * mu2


def combine_vars(v1, v2, mu1, mu2, m, n):
    """
    Updates old variance v1 from m samples with variance v2 of n samples.
    Returns the variance of the m+n samples.
    """
    return (m / (m + n)) * v1 + (n / (m + n)) * v2 + (m * n / (m + n)**2) * (mu1 - mu2)**2
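As a usage sketch (assuming the images arrive as NumPy arrays of shape (batch, H, W, 3), that the population variance with ddof=0 is what you want, and that iter_batches is a hypothetical stand-in for your own loading code), the two functions can be folded over the batches like this:

import numpy as np

def iter_batches():
    # Hypothetical loader: replace with however your images are actually read.
    for _ in range(10):
        yield np.random.rand(32, 64, 64, 3)

mean, var, count = np.zeros(3), np.zeros(3), 0
for batch in iter_batches():
    pixels = batch.reshape(-1, 3)      # flatten to (n_pixels, 3)
    n = pixels.shape[0]
    mu2 = pixels.mean(axis=0)          # per-channel mean of this batch
    v2 = pixels.var(axis=0)            # per-channel (population) variance of this batch
    if count == 0:
        mean, var, count = mu2, v2, n
    else:
        # Combine the variances first, while `mean` still holds the old running mean.
        var = combine_vars(var, v2, mean, mu2, count, n)
        mean = combine_means(mean, mu2, count, n)
        count += n

print("per-channel mean:", mean, "per-channel variance:", var)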
    
Daraan