I have some experimental data that looks like this:
x = array([1, 1.12, 1.109, 2.1, 3, 4.104, 3.1, ...])
y = array([-9, -0.1, -9.2, -8.7, -5, -4, -8.75, ...])
z = array([10, 4, 1, 4, 5, 0, 1, ...])
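(If a runnable stand-in is useful: something like the following generates data with a similarly uneven distribution. The numbers are purely illustrative random values, not my real measurements.)
import numpy as np
import pandas as pd

# Synthetic stand-in: ~20k samples, deliberately denser in some regions than others
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(1, 5, n) ** 2 / 5     # denser at small x, sparser at large x
y = -10 * rng.uniform(0, 1, n) ** 2   # denser near y = 0, sparser near y = -10
z = rng.uniform(0, 10, n)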
If it's convenient, we can assume that the data exists as a single (N, 3) array or even a pandas DataFrame:
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
The interpretation being, for every position x[i], y[i], the value of some variable is z[i]. These are not evenly sampled, so there will be some parts that are "densely sampled" (e.g. between 1 and 1.2 in x) and others that are very sparse (e.g. between 2 and 3 in x). Because of this, I can't just chuck these into a pcolormesh or contourf.
What I would like to do instead is to resample x and y evenly at some fixed interval and then aggregate the values of z. For my needs, z can be summed or averaged to get meaningful values, so this is not a problem. My naïve attempt was like this:
# Regular grid at a fixed 0.1 spacing in both directions
X = np.arange(min(x), max(x), 0.1)
Y = np.arange(min(y), max(y), 0.1)
x_g, y_g = np.meshgrid(X, Y)

nx, ny = x_g.shape
z_g = np.full(x_g.shape, np.nan)

# For every grid cell, select the samples that fall inside it and aggregate z
for ix in range(nx - 1):
    for jx in range(ny - 1):
        x_min = x_g[ix, jx]
        x_max = x_g[ix + 1, jx + 1]
        y_min = y_g[ix, jx]
        y_max = y_g[ix + 1, jx + 1]
        vals = df[(df.x >= x_min) & (df.x < x_max) &
                  (df.y >= y_min) & (df.y < y_max)].z.values
        if vals.size:  # not vals.any(), which would skip cells whose z values are all zero
            z_g[ix, jx] = vals.sum()
This works and gives the output I want when plotted with plt.contourf(x_g, y_g, z_g), but it is SLOW! I have ~20k samples, which I then bin onto a grid of ~800 points in x and ~500 in y, so the double loop runs ~400,000 times.
Is there any way to vectorize/optimize this? Even better if there is some function that already does this!
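To make the target concrete, here is a sketch of the kind of one-shot binned aggregation I'm hoping exists. I believe scipy.stats.binned_statistic_2d does something along these lines, though I'm not certain it matches my use case exactly; the 0.1 spacing and the df/x/y/z names are just carried over from above:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d

# Regular bin edges at the same 0.1 spacing used above
x_edges = np.arange(x.min(), x.max() + 0.1, 0.1)
y_edges = np.arange(y.min(), y.max() + 0.1, 0.1)

# Aggregate z over every cell of the regular grid in one call
# (statistic='mean' would work the same way if averaging is preferred)
stat, x_edge, y_edge, _ = binned_statistic_2d(
    df.x, df.y, df.z, statistic='sum', bins=[x_edges, y_edges])

# Mask empty cells (statistic='sum' returns 0 there) so they behave like the NaNs above
counts, _, _, _ = binned_statistic_2d(
    df.x, df.y, df.z, statistic='count', bins=[x_edges, y_edges])
stat[counts == 0] = np.nan

# The returned statistic has x along the first axis, so transpose before plotting
x_centers = 0.5 * (x_edge[:-1] + x_edge[1:])
y_centers = 0.5 * (y_edge[:-1] + y_edge[1:])
plt.contourf(x_centers, y_centers, stat.T)
plt.show()
If SciPy isn't an option, I think np.histogram2d(x, y, bins=[x_edges, y_edges], weights=z)[0] would give the same summed grid in plain numpy.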
(Also tagging this as MATLAB, because numpy and MATLAB syntax are very similar and I have access to both.)