
My Python code is as follows... it takes forever. There must be some NumPy tricks I can use? The picture I am analyzing is tiny and in grayscale...

import numpy as np

def gaussian_probability(x, mean, standard_dev):
    # Gaussian PDF evaluated at x
    termA = 1.0 / (standard_dev * np.sqrt(2.0 * np.pi))
    termB = np.exp(-((x - mean) ** 2.0) / (2.0 * (standard_dev ** 2.0)))
    return termA * termB

def sum_of_gaussians(self, x):
    return sum([self.mixing_coefficients[i] *
                gaussian_probability(x, self.means[i], self.variances[i] ** 0.5)
                for i in range(self.num_components)])

def expectation(self):
    rows, cols = self.image_matrix.shape
    responsibilities = []
    for i in range(self.num_components):
        gamma_k = np.zeros([rows, cols])
        for j in range(rows):
            for k in range(cols):
                p = (self.mixing_coefficients[i] *
                     gaussian_probability(self.image_matrix[j, k],
                                          self.means[i],
                                          self.variances[i] ** 0.5))
                gamma_k[j, k] = p / self.sum_of_gaussians(self.image_matrix[j, k])
        responsibilities.append(gamma_k)
    return responsibilities

I included only the expectation step because, while the maximization step also loops through every element of the responsibility array of matrices, it seems to go relatively quickly (so maybe the bottleneck is all of the gaussian_probability calculations?).


1 Answer


You can greatly speed up your calculation by doing two things:

  • don't compute the normalization within each loop! As currently written, for an NxN image with M components, every pass through the inner loop calls sum_of_gaussians, which itself evaluates M gaussians, so the per-pixel normalizer is recomputed M times and the whole step costs O[N^2 M^2] gaussian evaluations! Instead you should compute all the unnormalized elements once and then divide by their per-pixel sum, which is O[N^2 M] (see the sketch after this list).

  • use numpy vectorization rather than explicit loops. This can be done very straightforwardly the way you've set up the code.
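To see the first point in isolation, here is a minimal sketch that keeps your explicit loops but hoists the normalization out of them (the method name is hypothetical, and it assumes the same attributes as your class):

def expectation_normalize_once(self):
    # Hypothetical method name. Same nested loops as before, but each
    # unnormalized responsibility is computed exactly once; the per-pixel
    # normalizer is then formed by summing over components instead of
    # calling sum_of_gaussians M times per pixel.
    rows, cols = self.image_matrix.shape
    unnormalized = [np.zeros([rows, cols]) for _ in range(self.num_components)]
    for i in range(self.num_components):
        for j in range(rows):
            for k in range(cols):
                unnormalized[i][j, k] = (self.mixing_coefficients[i] *
                                         gaussian_probability(self.image_matrix[j, k],
                                                              self.means[i],
                                                              self.variances[i] ** 0.5))
    total = sum(unnormalized)  # elementwise sum over components, computed once
    return [g / total for g in unnormalized]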

Combining that with NumPy vectorization, your expectation function should look something like this:

def expectation(self):
    # Broadcast the (M,) parameter arrays against the (rows, cols) image
    # to get an (M, rows, cols) stack of unnormalized responsibilities.
    responsibilities = (self.mixing_coefficients[:, None, None] *
                        gaussian_probability(self.image_matrix,
                                             self.means[:, None, None],
                                             self.variances[:, None, None] ** 0.5))
    # Normalize each pixel by its sum over components (axis 0).
    return responsibilities / responsibilities.sum(0)
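
If the `[:, None, None]` indexing is unfamiliar: it appends two length-1 axes so the (M,) parameter arrays broadcast against the (rows, cols) image, producing one (rows, cols) plane per component. A quick illustrative check (the shapes and values below are made up for demonstration):

import numpy as np

image = np.random.rand(20, 20)     # shape (20, 20)
means = np.array([0.2, 0.5, 0.8])  # shape (3,)

print(means[:, None, None].shape)            # (3, 1, 1)
print((image - means[:, None, None]).shape)  # (3, 20, 20)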

You didn't provide a complete example, so I had to improvise a bit to check and benchmark this, but here's a quick take:

import numpy as np

def gaussian_probability(x,mean,standard_dev):
    termA = 1.0 / (standard_dev*np.sqrt(2.0*np.pi))
    termB = np.exp(-((x - mean)**2.0)/(2.0*(standard_dev**2.0)))
    return termA * termB

class EM(object):
    def __init__(self, N=5):
        self.image_matrix = np.random.rand(20, 20)
        self.num_components = N
        self.mixing_coefficients = 1 + np.random.rand(N)
        self.means = 10 * np.random.rand(N)
        self.variances = np.ones(N)

    def sum_of_gaussians(self, x):
        return sum([self.mixing_coefficients[i] * 
                    gaussian_probability(x, self.means[i], self.variances[i]**0.5) 
                    for i in range(self.num_components)])

    def expectation(self):
        dim = self.image_matrix.shape
        rows, cols = dim[0], dim[1]
        responsibilities = []
        for i in range(self.num_components):
            gamma_k = np.zeros([rows, cols])
            for j in range(rows):
                for k in range(cols):
                    p = (self.mixing_coefficients[i] *      
                         gaussian_probability(self.image_matrix[j,k],
                                              self.means[i], 
                                              self.variances[i]**0.5))
                    gamma_k[j,k] = p / self.sum_of_gaussians(self.image_matrix[j,k])
            responsibilities.append(gamma_k)
        return responsibilities

    def expectation_fast(self):
        responsibilities = (self.mixing_coefficients[:, None, None] *
                            gaussian_probability(self.image_matrix,
                                                 self.means[:, None, None],
                                                 self.variances[:, None, None] ** 0.5))
        return responsibilities / responsibilities.sum(0)

Now we can instantiate the object and compare the two implementations of the expectation step:

em = EM(5)
np.allclose(em.expectation(),
            em.expectation_fast())
# True

Looking at the timings, we're nearly a factor of 1000 faster for a 20x20 image with 5 components:

%timeit em.expectation()
10 loops, best of 3: 65.9 ms per loop

%timeit em.expectation_fast()
10000 loops, best of 3: 74.5 µs per loop

This improvement will grow as the image size and number of components increase. Best of luck!

    Haha, just looked at it again...can't believe I did the normalization inside the loop. Thanks man! – Chris Nov 14 '15 at 14:10
  • does the `responsibilities.sum(0)` take the sum across the values? For instance, each pixel needs to be normalized according to each component... so `responsibility[0][x,y] = responsibility[0][x,y] / (responsibility[0][x,y] + responsibility[1][x,y] + responsibility[2][x,y] + ...)` – Chris Nov 14 '15 at 14:34
  • Yes, that's exactly what it does: it sums across the zeroth axis, which spans the components. – jakevdp Nov 14 '15 at 14:36