The problem

I have two arrays, which I'll call ar1 and ar2 (each of size (192, 289)), that represent lat-lon maps of standard deviations, and I have a similarly-sized array of their difference. I want to plot the difference and, on top of it, a stippling pattern wherever the difference between the two arrays is statistically significant at the 95% confidence level (alpha = 0.05).

The code

I was using this example as the basis for my code:

How do I do a F-test in python

I used Joel Cornett's solution, substituting ar1 and ar2 in for X and Y.

import numpy as np

# Ratio of the variances taken over the whole grid
F = np.var(ar1) / np.var(ar2)
print(np.var(ar1), np.var(ar2))
print(F)

0.118586507371 0.161485609461
0.734347213766

For the next part, I want N-2 degrees of freedom for my analysis, where N is the number of points in the arrays, in this case 55488 (192 x 289). len(ar1) and len(ar2) won't work here since they only give the length of the first dimension, so I flattened the arrays to get the correct length.

df1 = ar1.size - 2  # .size counts every element, same as the flattened length
df2 = ar2.size - 2
print(df1, df2)

55486 55486

However, going forward with this I ended up with a p-value of 9.88365269356e-289 (essentially 0). This is a single value and, as I expected in this particular case, statistically insignificant, but I need an array of values in order to do the stippling, so I can see whether there is any place on the grid where the difference IS significant. I'm not sure how to perform this test on a 2-D array, since all the examples I'm finding use lists or other 1-D data types, and I have never done an analysis like this before. (I'm doing it at the request of my advisor, who doesn't use Python.)
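For reference, the single-value test described above can be sketched as follows. This assumes the p-value is taken from the lower tail of the F distribution with scipy.stats.f.cdf; the helper name variance_ratio_test and the synthetic arrays demo1 and demo2 are hypothetical stand-ins, not the real data.

```python
import numpy as np
from scipy import stats


def variance_ratio_test(a, b):
    """Whole-array variance ratio F and its lower-tail p-value.

    Hypothetical helper: every grid point is treated as one sample.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    F = np.var(a) / np.var(b)
    df1 = a.size - 2  # N - 2, the choice made in the question
    df2 = b.size - 2
    p_lower = stats.f.cdf(F, df1, df2)  # P(F' <= F) if the variances were equal
    return F, df1, df2, p_lower


# Synthetic stand-ins with the same shape as ar1 and ar2 (assumed data)
rng = np.random.default_rng(0)
demo1 = rng.normal(0.0, 1.0, size=(192, 289))
demo2 = rng.normal(0.0, 1.2, size=(192, 289))

F, df1, df2, p = variance_ratio_test(demo1, demo2)
print(F, df1, df2, p)
```

A lower-tail probability near 0 means the ratio is far below 1 for that many degrees of freedom. A two-sided version would use 2 * min(p, 1 - p). Either way the result is one number for the whole grid, not a map.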

The Question

How do you perform an F-test on two 2-D arrays so that the result is a similarly-sized array with a p-value for each grid point?

I can amend this if needed to fill in anything I might be missing due to my lack of understanding of the subject (and let me know if the p-value I got doesn't seem right), but if this is too complex or incomplete to get help on, I'll just delete it.

Cebbie

1 Answer

It depends on your arrays. If the grid is fine enough for the lat/lon array to be split into smaller pieces, you could evaluate 24x24 squares rather than the entire array. You can try different scales and see what makes sense. To implement this, try something along these lines:

# F[a, b] is the variance ratio over the 24x24 window whose top-left corner is (a, b)
F = np.zeros((ar1.shape[0]-24, ar1.shape[1]-24))
for a in range(F.shape[0]):
    for b in range(F.shape[1]):
        F[a, b] = np.var(ar1[a:a+24, b:b+24]) / np.var(ar2[a:a+24, b:b+24])

This yields a similarly sized (168, 265) output array. Because each value is computed over a 24x24 square, a step of 1 does not necessarily make sense anymore: neighboring windows share almost all of their points. Half-overlapping squares (a step of 12) would yield more sensible results:

# Same 24x24 windows, but stepping by 12 so that consecutive windows overlap by half
F = np.zeros(((ar1.shape[0]-24)//12, (ar1.shape[1]-24)//12))
for a in range(F.shape[0]):
    for b in range(F.shape[1]):
        F[a, b] = np.var(ar1[a*12:a*12+24, b*12:b*12+24]) / np.var(ar2[a*12:a*12+24, b*12:b*12+24])

This yields a (14, 22) array.
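To go from the windowed F values to a p-value map and a stippling mask, one possible sketch is below. It rests on assumptions not stated above: each window's 24*24 values are treated as independent samples, so both degrees of freedom are 24*24 - 1, and the p-value is two-sided. The function name windowed_f_test and the synthetic arrays are hypothetical.

```python
import numpy as np
from scipy import stats


def windowed_f_test(a, b, win=24, step=12, alpha=0.05):
    """Variance-ratio F, two-sided p-value and significance mask per window.

    Hypothetical helper. The output shape follows the half-overlapping
    layout above: ((rows - win) // step, (cols - win) // step).
    """
    n_rows = (a.shape[0] - win) // step
    n_cols = (a.shape[1] - win) // step
    F = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            r, c = i * step, j * step
            F[i, j] = np.var(a[r:r + win, c:c + win]) / np.var(b[r:r + win, c:c + win])
    # Assumption: the win*win points in a window are independent samples
    df = win * win - 1
    lower = stats.f.cdf(F, df, df)            # elementwise lower-tail probability
    p = 2.0 * np.minimum(lower, 1.0 - lower)  # two-sided p-value
    return F, p, p < alpha


# Synthetic stand-ins (assumed data): b is three times noisier in the top-left corner
rng = np.random.default_rng(1)
demo_a = rng.normal(size=(192, 289))
demo_b = rng.normal(size=(192, 289))
demo_b[:96, :96] *= 3.0

F, p, significant = windowed_f_test(demo_a, demo_b)
print(F.shape, significant[:6, :6].all())
```

The boolean array significant could then drive the stippling, with one value per window rather than per grid point. Neighboring grid points on a lat-lon map are usually correlated, so the effective degrees of freedom may be well below 24*24 - 1, and the p-values from this sketch are likely too small.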

zck
  • Thanks for the reply. I've tried your method and assigned the values where p_value < (>) alpha to an array where they = 1 (0). However, in the resulting array, my "significant" results don't line up with the areas of greatest difference between ar1 and ar2 like I'd expect them to. Just to be clear, are df1 and df2 equal to the dimension sizes used in F? – Cebbie Dec 03 '17 at 03:13