I am trying to regrid (interpolate) a dataset of irregularly scattered values, each tied to a location (latitude, longitude), onto a regular grid of a given size. The data is in a dataframe with separate columns for the variable value, the latitude and the longitude.
I first have to grid this data, choosing a suitable grid size, and then find the best way to average the varying number of points that fall within each grid box.
I wrote some code following an online example. I use the histogram2d function to grid the latitudes and longitudes, and I fill each grid box that contains scatter points with the sum of the values divided by the point count, i.e. the average of all points lying within that box. (I will then use this gridded data, generated from the scatter points, to compare with another dataset that has a different grid resolution.)
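To make the averaging step concrete, here is a minimal sketch on made-up data (the coordinates, values and bin edges are hypothetical, not taken from my dataset). It divides the weighted histogram by the count histogram to get the mean per box, and leaves empty boxes as NaN.

```python
import numpy as np

# Hypothetical sample points (not from the real dataset)
lon = np.array([25.5, 25.6, 29.0])
lat = np.array([46.0, 46.2, 54.0])
val = np.array([10.0, 20.0, 7.0])

# Explicit bin edges: 2 longitude bins x 2 latitude bins
lon_edges = np.array([25.0, 27.5, 30.0])
lat_edges = np.array([45.0, 50.0, 55.0])

# Sum of values per box, and number of points per box
sums, _, _ = np.histogram2d(lon, lat, bins=(lon_edges, lat_edges), weights=val)
counts, _, _ = np.histogram2d(lon, lat, bins=(lon_edges, lat_edges))

# Mean per box; boxes with no points keep the NaN fill value
mean = np.full(sums.shape, np.nan)
np.divide(sums, counts, out=mean, where=counts > 0)
print(mean)
```

The first two points share a box, so that box holds their mean; the two boxes without points stay NaN and can be masked with np.ma.masked_invalid before plotting.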
My code should work in principle, but grid boxes without scatter points are getting filled while boxes that do contain points are left empty. The mismatch gets worse at finer resolutions (smaller bin sizes).
I have looked up these examples - example 1, example 2.
Here is a part of my code:
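import numpy as np
from mpl_toolkits.basemap import Basemap

# df is a pandas DataFrame read from a CSV file, with columns 'lon', 'lat' and 'var'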
y = df['lon']
x = df['lat']
z = df['var']
# Bin the data onto a 5x5 grid (any other size can be used)
# Have to reverse x & y due to row-first indexing
# Sum of z in each grid box
zi, yi, xi = np.histogram2d(y, x, bins=(5, 5), weights=z)
# Number of points in each grid box
counts, _, _ = np.histogram2d(y, x, bins=(5, 5))
# Average per box; boxes with no points give NaN and are masked
zi = zi / counts
zi = np.ma.masked_invalid(zi)
# Set up the map and draw its outlines
m = Basemap(llcrnrlat=45, urcrnrlat=55, llcrnrlon=25, urcrnrlon=30)
m.drawcoastlines(linewidth=0.75, color="black")
m.drawcountries(linewidth=0.75, color="black")
m.drawmapboundary()
# Convert the bin edges to map coordinates
p, q = m(yi, xi)
#cs = m.pcolormesh(xi, yi, zi, edgecolors='black', cmap='jet')
cs = m.pcolormesh(p, q, zi, edgecolors='black', cmap='jet')
m.colorbar(cs)
# Overlay the original scatter points
#scat = m.scatter(x, y, c=z, s=200, edgecolors='red')
scat = m.scatter(y, x, latlon=True, c=z, s=80)
This is the image that gets generated:
Any help will be much appreciated.
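One thing that may be worth checking (a guess, not a confirmed diagnosis) is the axis order of the gridded array. np.histogram2d returns an array whose first axis follows its first argument, whereas pcolormesh(X, Y, C) treats the rows of C as the Y direction. With equal bin counts such as (5, 5) a swapped array still has a valid shape, so nothing would raise an error. The sketch below uses made-up data and deliberately unequal bin counts so that the axis order is visible; all values and names in it are hypothetical.

```python
import numpy as np

# Hypothetical data: a single point, so the filled box is unambiguous
lon = np.array([29.0])   # falls in the last of 4 longitude bins
lat = np.array([46.0])   # falls in the first of 2 latitude bins
val = np.array([5.0])

lon_edges = np.linspace(25.0, 30.0, 5)   # 4 longitude bins
lat_edges = np.linspace(45.0, 55.0, 3)   # 2 latitude bins

# First argument is longitude, so axis 0 of the result is longitude
zi, _, _ = np.histogram2d(lon, lat, bins=(lon_edges, lat_edges), weights=val)
print(zi.shape)        # axis 0 = longitude bins, axis 1 = latitude bins

# pcolormesh(lon_edges, lat_edges, C) expects C indexed as [lat_bin, lon_bin],
# so the array is transposed before plotting
zi_plot = zi.T
print(zi_plot.shape)   # rows = latitude bins, columns = longitude bins
```

If this is what is happening in my code, passing zi.T instead of zi to m.pcolormesh(p, q, ...) would be the corresponding change, but I have not confirmed it.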