2

I want to extract a 12º x 12º region from lat/long/conductivity grids and calculate the mean conductivity values in this region. I can successfully apply masks on the lat/long grids, but somehow the same process is not working for the conductivity grid.

I've tried masking with for loops and now I'm using numpy.ma.masked_where function. I can successfully plot masked results (i.e: I can see that the region is extracted when I plot global maps), but the calculated mean conductivity values are corresponding to non-masked data.

I did a simple example of what I want to do:

x = np.linspace(1, 10, 10)
y = np.linspace(1, 10, 10)

xm = np.median(x)
ym = np.median(y)

x = ma.masked_outside(x, xm-3, xm+3)
y = ma.masked_outside(x, ym-3, ym+3)
x = np.ma.filled(x.astype(float), np.nan)
y = np.ma.filled(y.astype(float), np.nan)

x, y = np.meshgrid(x, y)

z = 2*x + 3*y

z = np.ma.masked_where(np.ma.getmask(x), z)

plt.pcolor(x, y, z)
plt.colorbar()

print('Maximum z:', np.nanmax(z))
print('Minimum z:', np.nanmin(z))
print('Mean z:', np.nanmean(z))

My code is:

def Observatory_Cond_Plot(filename, ndcfile, obslon, obslat, obsname, date):

files = np.array(sorted(glob.glob(filename))) #sort txt files containing the 2-D conductivitiy arrays]

filenames = ['January', 'February', 'March', 'April', 'May', 'June', 
             'July', 'August', 'September', 'October', 'November', 'December'] #used for naming output plots and files

for i, fx in zip(filenames, files):

    ndcdata = Dataset(ndcfile) #load netcdf file

    lat = ndcdata.variables['latitude'][:] #import latitude data

    long = ndcdata.variables['longitude'][:] #import longitude data

    cond = np.genfromtxt(fx)

    cond, long = shiftgrid(180., cond, long, start=False) 

    #Mask lat and long arrays and fill masks with nan values

    lat = ma.masked_outside(lat, obslat-12, obslat+12)
    long = ma.masked_outside(long, obslon-12, obslon+12)
    lat = np.ma.filled(lat.astype(float), np.nan)
    long = np.ma.filled(long.astype(float), np.nan)

    longrid, latgrid = np.meshgrid(long, lat)

    cond = np.ma.masked_where(np.ma.getmask(longrid), cond)
    cond = np.ma.filled(cond.astype(float), np.nan)

    condmean = np.nanmean(cond)

    print('Mean Conductivity is:', condmean)
    print('Minimum conductivity is:', np.nanmin(cond))
    print('Maximum conductivity is:', np.nanmax(cond))

After that, the rest of the code just plots the data

My results are:

Mean Conductivity is: 3.5241649673154587 Minimum conductivity is: 0.497494528344129 Maximum conductivity is: 5.997825822915771

However, from tmy maps, it is clear that the conductivity in this region should not be lower than 3.2 S/m. Also, printing lat, long and cond grids:

long:

[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]

lat:

[[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
...
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]]

cond:

[[       nan        nan        nan ...        nan        nan        nan]
[       nan        nan        nan ...        nan        nan        nan]
 [2.86749432 2.86743283 2.86746221 ... 2.87797247 2.87265508 2.87239185]
 ...
 [       nan        nan        nan ...        nan        nan        nan]
 [       nan        nan        nan ...        nan        nan        nan]
 [       nan        nan        nan ...        nan        nan        nan]]

And it seems like the mask is not working properly.

Berriel
  • 12,659
  • 4
  • 43
  • 67
rrigaud
  • 31
  • 4
  • How did you calculate `condmean`? I haven't found it in the code example. – kmuehlbauer Mar 28 '19 at 04:40
  • If you mask your array, you need to fill the masked values with nan (like you did for lat, lon). Otherwise np.nanmean will not work. – kmuehlbauer Mar 28 '19 at 05:11
  • oh sorry. I calculated condmean with np.nanmean. I will edit source code. Also, do you mean using np.ma.filled on cond grid after I mask it? I tried it now but it did not change the results – rrigaud Mar 28 '19 at 11:47

2 Answers2

1

The problem is that the call of np.ma.filled will de-mask the long variable. Also np.meshgrid doesn't preserve the masks.

You could save the masks directly after creation and also create the meshgrid from the masks. I adapted your example accordingly. What can be seen is, that all versions of numpy mean take the mask into account. I had to adapt the upper limit (changed to 2), because the mean has been equal.

x = np.linspace(1, 10, 10)
y = np.linspace(1, 10, 10)

xm = np.median(x)
ym = np.median(y)

# Note: changed limits
x = np.ma.masked_outside(x, xm-3, xm+2)
y = np.ma.masked_outside(x, ym-3, ym+2)
xmask = np.ma.getmask(x)
ymask = np.ma.getmask(y)

x, y = np.meshgrid(x, y)
xmask, ymask = np.meshgrid(xmask, ymask)

z = 2*x + 3*y


z1 = np.ma.masked_where(np.ma.getmask(x), z)
z2 = np.ma.masked_where(xmask | ymask, z)
print(z1)
print(z2)

print('Type z1, z2:', type(z1), type(z2))
print('Maximum z1, z2:', np.nanmax(z1), np.nanmax(z2))
print('Minimum z1, z2:', np.nanmin(z1), np.nanmin(z2))
print('Mean z1, z2:', np.mean(z1), np.mean(z2) )
print('nan Mean z1, z2:', np.nanmean(z1), np.nanmean(z2) )
print('masked Mean z1, z2:', z1.mean(), z2.mean())
kmuehlbauer
  • 441
  • 4
  • 7
0

Beware that any kind of simple mean calculation (summing and dividing by the total), such as np.mean will not give you the correct answer if you are averaging over the lat-lon grid, since the area changes as you move towards the poles. You need to take a weighted average, weighting by cos(lat).

As you say you have the data in netcdf format, I hope you will permit me to suggest an alternative solution from the command line using the utility climate data operators (cdo) (on ubuntu you can install with sudo apt install cdo).

to extract the region of interest:

cdo sellonlatbox,lon1,lon2,lat1,lat2 infile.nc outfile.nc

then you can work out the correct weighted mean with

cdo fldmean infile.nc outfile.nc

you can pipe the two together like this:

cdo fldmean -sellonlatbox,lon1,lon2,lat1,lat2 infile.nc outfile.nc
ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
  • Thanks for pointing this out. This would have to be taken into account. – kmuehlbauer Mar 28 '19 at 13:44
  • It's often overlooked! :-) Probably for a smooth field over a relatively small region such as this the error is relatively small, but still better to account for it... That's why I usually use cdo for these kinds of tasks before switching to python for the analysis, as it saves the bother of worrying about grid formats and you know the answer is correct. +1 for your useful answer. – ClimateUnboxed Mar 28 '19 at 14:26
  • thanks for the suggestion! I thought it would not be needed to account these effects on small regions like that, but I did not know cdo either, so your comment was really helpful :) – rrigaud Mar 28 '19 at 17:33
  • Cool, glad to help! if you found the answers helpful, perhaps you want to give kmuehlbauer and me an upvote ;-) just tick the up-triangle above the number next to our answer - you can also choose one as the current best answer using the tick. – ClimateUnboxed Mar 29 '19 at 09:53