
The Scientific Question:

I have lots of 3D volumes, each containing a cylinder oriented 'upright' along the z axis. The volumes are incredibly noisy, so noisy that you cannot see the cylinder in any individual volume by eye. If I average together thousands of these volumes I can see the cylinder. Each volume contains a copy of the cylinder, but in a few cases the cylinder may not be oriented correctly, so I want a way of detecting those cases.

The Solution I have come up with:

I have taken the averaged volume and projected it down the z and x axes (just summing the numpy array along those axes), so that I get a nice circle in one direction and a rectangle in the other. I then take every individual 3D volume and project it down the z axis. The SNR is still so bad that I cannot see a circle in a single projection, but if I average the 2D projections I begin to see a circle after a few hundred, and it is easy to see after the first 1000 are averaged. To score each volume, I compute the MSE of its z projection against three reference arrays: the average projected down z, the average projected down y (or x), and an array of normally distributed noise.

Currently I have the following where RawParticle is the 3D data and Ave is the average:

import numpy as np

def normalise(array):
    # Min-max normalisation to the range [0, 1]
    arr_min = np.amin(array)
    arr_max = np.amax(array)
    normarray = (array - arr_min) / (arr_max - arr_min)

    return normarray

def Noise(mag):
    # 3D array of normally distributed noise with standard deviation mag
    NoiseArray = np.random.normal(0, mag, size=(200, 200, 200))
    return NoiseArray

# 3D volume (normally I use a for loop to iterate through all particles, but for this example I am just showing one)
RawParticleProjected = np.sum(RawParticle, 0)
RawParticleProjectedNorm = normalise(RawParticleProjected)
# Average
AveProjected = np.sum(Ave, 0)
AveProjectedNorm = normalise(AveProjected)
# Noise array
NoiseArray = Noise(0.5)
NoiseNorm = normalise(NoiseArray)


# Mean squared error
MSE = (np.square(np.subtract(RawParticleProjectedNorm, AveProjectedNorm))).mean()

I then repeat this with Ave summed down axis 1, and again comparing the raw particle projection to the noise array, as sketched below.
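For clarity, a minimal sketch of those two extra comparisons, assuming the arrays from the snippet above (how the 3D noise array is reduced to 2D before the comparison is an assumption here; I project it down z like the particle):

# Average projected down axis 1 (the rectangle view)
AveSideProjectedNorm = normalise(np.sum(Ave, 1))
# Noise reduced to 2D in the same way as the particle (this reduction is an assumption)
NoiseProjectedNorm = normalise(np.sum(NoiseArray, 0))

MSE_side = np.mean(np.square(RawParticleProjectedNorm - AveSideProjectedNorm))
MSE_noise = np.mean(np.square(RawParticleProjectedNorm - NoiseProjectedNorm))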

However, my output gives the highest MSE when I compare the two projections that should both be circles, as shown below:

[Image: plot of the resulting MSE values for the three comparisons]

My understanding of MSE is that the two mismatched comparisons should give high MSE and the comparison that agrees should give low MSE. Perhaps my data is too noisy for this type of analysis? But if that is true, then I don't really know how else to approach the problem.

If anyone could glance at my code or correct my understanding of MSE, I would be super appreciative.

Thank you for taking the time to look and read.

j.t.2.4.6
  • Maybe a distance measure like `dist = np.sum(A*B)` is suited for your problem, with both matrices normalized via `A /= np.linalg.norm(A)` (see the sketch after these comments) – scleronomic Feb 06 '20 at 10:16
  • @scleronomic Thank you for your reply. I have tried your suggestion, however due to the nature of the data it creates a couple of issues. The data is super noisy, so this measure taken between my data and a noise array gives a far larger number than the other two values. The second issue is that, because I am just multiplying the pixels, when my average is projected side-on there will always be a lot of high-value pixels in a large rectangle as opposed to a smaller circle, so the highest value will always come from multiplying against the projection down y/x. – j.t.2.4.6 Feb 06 '20 at 10:45
  • Maybe [this question](https://stackoverflow.com/questions/8960462/how-can-i-measure-image-noise) on how to measure noise helps – scleronomic Feb 06 '20 at 14:14
  • @scleronomic Hey, sorry I meant to reply ages ago but totally forgot. Unfortunately your method didn't work, but after discussion with my lab and supervisor I think the issue may be that this is not solvable by standard methods, and I may have to play around in Fourier space to upweight certain frequencies first (due to low SNR). In the end I tested the code on other people's data and it is working as expected on theirs, so that is good. Thank you for taking the time to answer though; when I have a chance to do the frequency-based filtering in Fourier space I will likely use this method. – j.t.2.4.6 Feb 28 '20 at 08:02
  • No worries. Sorry to hear it didn't work, but to be fair I did not do that much different. Just thought it sounded interesting and wanted to recreate the problem. Hope you will find a solution. All the best. – scleronomic Feb 28 '20 at 09:02
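For reference, a minimal sketch of the normalized dot-product similarity suggested in the first comment (the function name is illustrative, and the example call assumes the arrays from the question's snippet):

import numpy as np

def norm_dot_similarity(a, b):
    # Normalise both arrays to unit L2 norm, then sum the elementwise products.
    # For unit-norm inputs this is the cosine similarity: 1 means identical up to scale.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return np.sum(a * b)

# e.g. score = norm_dot_similarity(RawParticleProjectedNorm, AveProjectedNorm)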

1 Answer


If I understood your question correctly, you want to figure out how close your different samples are to the average, and by comparing the samples you want to find the outliers which contain a misoriented cylinder. This fits pretty well with the definition of the L2 norm, so MSE should work here.

I would calculate the average 3D image of all samples and then compute the distance of each sample to this average. Then I would just compare those values, as sketched below.
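A minimal sketch of that idea, assuming the volumes are stacked in a single array (the array name, shape, and placeholder data are illustrative):

import numpy as np

# samples: one 3D volume per entry, shape (n_samples, nx, ny, nz); placeholder data for illustration
samples = np.random.normal(0, 1, size=(100, 20, 20, 20))

average = samples.mean(axis=0)  # average 3D image over all samples

# MSE of each sample against the 3D average, one value per sample
mse_to_average = np.mean((samples - average) ** 2, axis=(1, 2, 3))

# Compare the values, e.g. by ranking the samples from closest to furthest from the average
ranking = np.argsort(mse_to_average)
print(mse_to_average[ranking[:5]])   # most average-like samples
print(mse_to_average[ranking[-5:]])  # least average-like samples (outlier candidates)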

The idea of comparing the samples to an image of artificial noise is not bad, but I am not sure if a normal distribution and your normalization work out as you planned. It could be apples and oranges. And I don't think it is a good idea to look at projections along different axes; just compare the 3D images.

I made some small tests with a circle in 2D and a parameter alpha which indicates how much noise and how much circle there is in a picture (alpha=0 means only noise, alpha=1 means only circle).

import numpy as np
import matplotlib.pyplot as plt

grid_size = 20
radius = 5
mag = 1

def get_circle_stencil(radius, grid_size=grid_size):
    # Binary (0/1) image of a filled circle centred in a grid_size x grid_size grid
    xx, yy = np.meshgrid(np.linspace(-grid_size/2+1/2, grid_size/2-1/2, grid_size),
                         np.linspace(-grid_size/2+1/2, grid_size/2-1/2, grid_size))
    dist = np.sqrt(xx**2 + yy**2)
    inner = dist < (radius - 1/2)
    return inner.astype(float)

def create_noise(mag, n_dim=2, grid_size=grid_size):
    # return np.random.normal(0, mag, size=(grid_size,)*n_dim)
    return np.random.uniform(0, mag, size=(grid_size,)*n_dim)

def create_noisy_sample(alpha, n_dim=2):
    # Mixture of uniform noise and the circle stencil: alpha=0 -> pure noise, alpha=1 -> pure circle
    return (np.random.uniform(0, 1-alpha, size=(grid_size,)*n_dim) +
            alpha*get_circle_stencil(radius))

fig = plt.figure()
ax = fig.subplots(nrows=3, ncols=3)
alpha_list = np.arange(9) / 10
for i, alpha in enumerate(alpha_list):
    r, c = np.unravel_index(i, shape=(3, 3))
    # norm() (unit-L2 normalisation of its arguments) is defined in the metrics snippet below
    ax[r][c].imshow(*norm(create_noisy_sample(alpha=alpha)), cmap='Greys')
    ax[r][c].set_title(f"alpha={alpha}")
    ax[r][c].xaxis.set_ticklabels([])
    ax[r][c].yaxis.set_ticklabels([])

[Figure: comparison of noise and circle for different alpha values]

Then I tried some metrics (MSE, cosine similarity and binary cross entropy) and looked at how they behave for different values of alpha.

def norm(*args):
    # Normalise each array to unit L2 norm (this is the helper used as norm(...) below)
    return [a / np.linalg.norm(a) for a in args]

def cosim(a, b):
    # Cosine similarity for unit-norm inputs: sum of elementwise products
    return np.sum(a * b)

def mse(a, b):
    # Note: strictly this is the Euclidean (L2) distance, not the mean squared error,
    # but for fixed-size arrays it ranks pairs the same way
    return np.sqrt(np.sum((a - b)**2))

def bce(a, b):
    # binary cross entropy, implemented as in tensorflow / keras
    eps = 1e-7
    res = a * np.log(b + eps)
    res += (1 - a) * np.log(1 - b + eps)
    return np.mean(-res)

I compared NoiseA-NoiseB, Circle-Circle, Circle-Noise, Noise-Sample, and Circle-Sample:

alpha = 0.1
noise = create_noise(mag=1, grid_size=grid_size)
noise_b = create_noise(mag=1, grid_size=grid_size)
circle = get_circle_stencil(radius=radius, grid_size=grid_size)
sample = create_noise(mag=1, grid_size=grid_size) + alpha * circle

print('NoiseA-NoiseB:', mse(*norm(noise, noise_b)))    # 0.718
print('Circle-Circle:', mse(*norm(circle, circle)))    # 0.000
print('Circle-Noise:', mse(*norm(circle, noise)))      # 1.168
print('Noise-Sample:', mse(*norm(noise, sample)))      # 0.697
print('Circle-Sample:', mse(*norm(circle, sample)))    # 1.100

print('NoiseA-NoiseB:', cosim(*norm(noise, noise_b)))  # 0.741
print('Circle-Circle:', cosim(*norm(circle, circle)))  # 1.000
print('Circle-Noise:', cosim(*norm(circle, noise)))    # 0.317
print('Noise-Sample:', cosim(*norm(noise, sample)))    # 0.757
print('Circle-Sample:', cosim(*norm(circle, sample)))  # 0.393

print('NoiseA-NoiseB:', bce(*norm(noise, noise_b)))    # 0.194
print('Circle-Circle:', bce(*norm(circle, circle)))    # 0.057
print('Circle-Noise:', bce(*norm(circle, noise)))      # 0.111
print('Noise-Circle:', bce(*norm(noise, circle)))      # 0.636
print('Noise-Sample:', bce(*norm(noise, sample)))      # 0.192
print('Circle-Sample:', bce(*norm(circle, sample)))    # 0.104

n = 1000
ns = np.zeros(n)
cs = np.zeros(n)
for i, alpha in enumerate(np.linspace(0, 1, n)):
    sample = create_noisy_sample(alpha=alpha)
    ns[i] = mse(*norm(noise, sample))
    cs[i] = mse(*norm(circle, sample))

fig, ax = plt.subplots()
ax.plot(np.linspace(0, 1, n), ns, c='b', label='noise-sample')
ax.plot(np.linspace(0, 1, n), cs, c='r', label='circle-sample')
ax.set_xlabel('alpha')
ax.set_ylabel('mse')
ax.legend()

[Figure: comparison of mse, cosim, and bce for varying alpha]

For your problem I would just look at the circle-sample comparison (red). Different samples will behave as if they have different alpha values and you can group them accordingly, and you should be able to detect outliers because they should have a higher MSE. A sketch of that grouping step follows below.
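A minimal sketch of that grouping/outlier step, assuming scores holds one circle-sample (or average-sample) MSE value per volume; the placeholder data and the median-plus-MAD threshold are illustrative choices:

import numpy as np

# scores: one MSE value per volume against the reference (placeholder data for illustration)
scores = np.random.normal(1.0, 0.05, size=1000)
scores[::100] += 0.5  # pretend a few volumes are misoriented and score worse

# Robust threshold: median plus a few median absolute deviations
med = np.median(scores)
mad = np.median(np.abs(scores - med))
outliers = np.where(scores > med + 5 * mad)[0]
print(len(outliers), "suspected misoriented volumes:", outliers)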

You said you have to combine 100-1000 pictures to see the cylinder, which indicates you have a really small alpha value in your problem, but on average MSE should still work.

scleronomic