A set of about 90 netCDF files, each around 27 MB, opened with xarray's open_mfdataset, takes a long time to load a small space-time selection.
Chunking along the dimensions yields only marginal gains. decode_cf=True, whether passed to the function or applied separately, makes no difference either. Another suggestion here https://groups.google.com/forum/#!topic/xarray/11lDGSeza78 had me save the selection as a separate netCDF file and reload it (sketched below).
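For reference, that save-and-reload workaround looks roughly like this (a minimal sketch using the wind dataset opened in the code further down; 'windSubset.nc' is a placeholder filename):

wxr = wind.sel(latitude=latRange, longitude=windLonRange,
               time=windDateQueryRange)
# writing forces dask to compute the selection once
wxr.to_netcdf('windSubset.nc')
# reopening the small file makes subsequent access fast
windSubset = xr.open_dataset('windSubset.nc')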
The bottleneck seems to be wherever the dask portion has to do real work (loading, computing, converting to a pandas DataFrame).
Generating a graph with dask.visualize produces a huge image. It may be telling us something, but I'm not sure how to interpret it (how I rendered it is sketched after the code below).
import xarray as xr

# testCCMPPath is a glob pattern matching the ~90 netCDF files
wind = xr.open_mfdataset(testCCMPPath,
                         decode_cf=True,
                         chunks={'time': 100,
                                 'latitude': 100,
                                 'longitude': 100})
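As a side note on the chunking, the layout dask actually ends up with can be inspected like this (a sketch; 'uwnd' is a placeholder variable name, and with open_mfdataset the per-file boundaries can force smaller chunks than requested along time):

print(wind.chunks)               # chunk sizes per dimension
print(wind['uwnd'].data.chunks)  # chunks of a single variable's dask array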
# time the lazy selection plus the actual read into memory
%timeit wind.sel(latitude=latRange, longitude=windLonRange, time=windDateQueryRange).load()

wxr = wind.sel(latitude=latRange, longitude=windLonRange, time=windDateQueryRange)
df = wxr.to_dataframe()
print(df.shape)
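This is roughly how I rendered the task graph (a sketch; it assumes an xarray version where Datasets implement the dask collection protocol, graphviz is installed, and 'wind_graph.png' is a placeholder filename):

import dask

# render the task graph of the lazy selection to a file rather than
# inline, since with ~90 files the graph image is enormous
dask.visualize(wxr, filename='wind_graph.png')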
The %timeit output shows
1.93 s ± 29.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
while df.shape is only (164, 3).
A similar .sel on another xarray dataset takes only about 0.05 seconds; however, that dataset is mostly sparse points, while the wind dataset has few empty values.