
I have a large number of images and want to calculate the variance (of each channel) across all of them. I'm having trouble finding an efficient algorithm / setup for that.

I read about Welford's online algorithm, but it is way too slow because it does not vectorize across a single image or a batch of images. So I'm wondering how to speed it up by using vectorization or by making use of built-in variance functions.

Daraan
  • IMHO you are unlikely to do significantly better than using **OpenCV** `cv2.meanStdDev()` https://docs.opencv.org/4.x/d2/de8/group__core__array.html#ga846c858f4004d59493d7c6a4354b301d And then couple that with multi-processing. – Mark Setchell Feb 26 '23 at 11:08
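For reference, a minimal sketch of the per-image statistics the comment suggests (assuming an image loaded with cv2.imread; the filename is hypothetical, and the per-image results would still have to be combined across images, e.g. with the answer below):

import cv2

img = cv2.imread("image.png")      # hypothetical file; loads as an (H, W, 3) BGR array
mean, std = cv2.meanStdDev(img)    # per-channel mean and standard deviation (3x1 arrays)
var = std**2                       # per-channel variance of this single image
print(mean.ravel(), var.ravel())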

1 Answer


These are the two functions needed to update/combine the means and variances of two batches. Both functions work with vectors (the 3 color channels), and the per-batch mean and variance can be obtained from built-in methods like batch.var().

Equations taken from: https://notmatthancock.github.io/2017/03/23/simple-batch-stat-updates.html

   
# m   number of samples (or pixels) over all previous batches
# n   number of samples in the new incoming batch
# mu1 previous mean
# mu2 mean of the current batch
# v1  previous variance
# v2  variance of the current batch

def combine_means(mu1, mu2, m, n):
    """
    Updates old mean mu1 from m samples with mean mu2 of n samples.
    Returns the mean of the m+n samples.
    """
    return (m / (m + n)) * mu1 + (n / (m + n)) * mu2


def combine_vars(v1, v2, mu1, mu2, m, n):
    """
    Updates old variance v1 from m samples with variance v2 of n samples.
    Returns the variance of the m+n samples.
    """
    return (m / (m + n)) * v1 + (n / (m + n)) * v2 + (m * n / (m + n)**2) * (mu1 - mu2)**2
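As a usage sketch (assuming the images arrive as NumPy arrays of shape (batch, H, W, 3), that the population variance with ddof=0 is what you want, and that iter_batches is a hypothetical stand-in for your own loading code), the two functions can be folded over the batches like this:

import numpy as np

def iter_batches():
    # Hypothetical loader: replace with however your images are actually read.
    for _ in range(10):
        yield np.random.rand(32, 64, 64, 3)

mean, var, count = np.zeros(3), np.zeros(3), 0
for batch in iter_batches():
    pixels = batch.reshape(-1, 3)      # flatten to (n_pixels, 3)
    n = pixels.shape[0]
    mu2 = pixels.mean(axis=0)          # per-channel mean of this batch
    v2 = pixels.var(axis=0)            # per-channel (population) variance of this batch
    if count == 0:
        mean, var, count = mu2, v2, n
    else:
        # Combine the variances first, while `mean` still holds the old running mean.
        var = combine_vars(var, v2, mean, mu2, count, n)
        mean = combine_means(mean, mu2, count, n)
        count += n

print("per-channel mean:", mean, "per-channel variance:", var)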
    
Daraan