
I'm using `xarray.interp` on a large 3D DataArray (weather data: lat, lon, time) to map the values (wind speed) to new values based on a discrete mapping function f. The interpolation seems to use only one core, which makes the process horribly inefficient. I cannot figure out how to make xarray use more than one core for this task.

I monitored the `xarray.interp` computation with htop and the dask dashboard. htop shows only one core in use, and the dashboard shows no activity on any of the workers. The only dask activity I can observe comes from loading the NetCDF file from disk. If I preload the data with `.load()`, even that dask activity disappears.
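A quick way to see why the dashboard goes quiet after `.load()`: a dask-backed DataArray carries a task graph that the scheduler can spread over workers, while `.load()` replaces it in place with a plain NumPy array, after which xarray operations are ordinary NumPy/SciPy calls with nothing left for dask to schedule. This is a minimal sketch on a small random array standing in for the wind-speed data (an assumption, not the real file); it only inspects the backing type before and after loading.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the (time, lat, lon) wind-speed cube.
da = xr.DataArray(np.random.rand(4, 3, 2), dims=("time", "lat", "lon"), name="wind_speed")

lazy = da.chunk({"time": 2})      # dask-backed copy, split into two chunks along time
chunks_before = lazy.chunks       # chunk sizes per dimension
print(type(lazy.data))            # a dask array: work is deferred and schedulable

lazy.load()                       # computes and replaces the data in place
print(type(lazy.data))            # now a numpy.ndarray
print(lazy.chunks)                # None: no chunks, so nothing for dask to parallelise
```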

I also tried using a `scipy.interpolate.interp1d` function with `xarray.apply_ufunc()` to achieve the equivalent result, but did not observe any parallel utilisation (htop) or activity (dask dashboard) either.
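For the `apply_ufunc` attempt, my understanding is that xarray only hands the work to dask when the input is dask-backed and `dask="parallelized"` (or `"allowed"`) is passed; the default, `dask="forbidden"`, raises an error on dask input, and with already-loaded NumPy data the function is simply called once in the main process. A minimal sketch under those assumptions, reusing the mapping from block 3 and hypothetical random data chunked along `time` instead of calling `.load()`:

```python
import numpy as np
import xarray as xr
from scipy.interpolate import interp1d

# Discrete mapping from block 3 of the MWE.
xp = np.linspace(0, 400, 21)
fp = -1 * (xp - 300) ** 2
f_scipy = interp1d(xp, fp, bounds_error=False, fill_value=np.nan)

# Hypothetical data; with a real file you would open it with chunks=... and skip .load().
rng = np.random.default_rng(0)
da = xr.DataArray(rng.uniform(0, 400, (8, 10, 12)), dims=("time", "lat", "lon"))
chunked = da.chunk({"time": 2})  # 4 chunks, so up to 4 tasks can run concurrently

mapped = xr.apply_ufunc(
    f_scipy,                # called once per chunk on a NumPy block
    chunked,
    dask="parallelized",    # build dask tasks instead of computing eagerly
    output_dtypes=[float],  # dtype of the result, needed to build the lazy output
)
result = mapped.compute()   # the interpolation actually runs here
```

The degree of parallelism is bounded by the number of chunks, so the chunk sizes above are illustrative only. With the `Client` from block 1 running, the `.compute()` call should appear as tasks in the dashboard.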

The fastest approach for me right now is to use `numpy.interp` and then recast the result to an `xr.DataArray` with the coordinates of the original DataArray. But that is also not parallelised and only a few percent faster.
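The NumPy route from the previous paragraph can reattach dims and coords in one step with `DataArray.copy(data=...)`. The sketch below (hypothetical data whose values lie inside the `xp` range) also checks that it agrees with the `interp` call from block 4. Outside that range the two would differ: xarray's `interp` returns NaN there, while `np.interp` clamps to the end values of `fp`.

```python
import numpy as np
import xarray as xr

# Mapping from block 3 of the MWE.
xp = np.linspace(0, 400, 21)
fp = -1 * (xp - 300) ** 2
xr_interp_da = xr.DataArray(fp, [("xp", xp)], name="interpolation function")

# Hypothetical stand-in for the tutorial 'air' variable (values in K, inside [0, 400]).
rng = np.random.default_rng(1)
da = xr.DataArray(
    rng.uniform(220, 310, (5, 4, 3)),
    dims=("time", "lat", "lon"),
    coords={"time": np.arange(5)},
    name="air",
)

# NumPy route: interpolate the raw values, then reuse da's dims/coords via copy(data=...).
g = da.copy(data=np.interp(da.values, xp, fp))

# xarray route from block 4, for comparison (requires SciPy).
f = xr_interp_da.interp({"xp": da})

np.testing.assert_allclose(f.values, g.values, rtol=1e-10, atol=1e-8)
```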

In the following MWE I don't see any dask activity in block 4, i.e. after the `da.load()` statement.

edit:
The code has to be run as the separate blocks 1-4 when evaluating it with e.g. htop. Because `load()` causes multi-core activity and happens either explicitly (block 2) or implicitly (triggered by block 4), it's easy to misattribute that multi-core activity to `.interp()` when it is actually caused by data loading if you run the script as a whole.

```python
# 1: Start a local cluster so the dask dashboard is available
from dask.distributed import Client
client = Client()
display(client)  # Jupyter/IPython only; in a plain script use print(client.dashboard_link)

import xarray as xr
import numpy as np

# chunks={} makes xarray open the variables lazily as dask arrays
da = xr.tutorial.open_dataset("air_temperature", chunks={})['air']
# 2: Preload data into memory (da becomes NumPy-backed)
da.load()
# 3: Dummy interpolation function
xp = np.linspace(0, 400, 21)
fp = -1 * (xp - 300)**2
xr_interp_da = xr.DataArray(fp, [('xp', xp)], name='interpolation function')
# 4: I expect this to run in parallel but it does not
f = xr_interp_da.interp({'xp': da})
```
euronion
  • I ran that piece of code and I got activity on all the cores of my machine, with and without the `da.load()` statement. – JulianGiles Apr 10 '19 at 19:48
  • Did you run the statements between each comment separately or always as a whole, @JulianGiles? I didn't explicitly mention this (I've updated the question accordingly). If the script is run as a whole, the `.load()` also happens implicitly and also causes activity on my cores. I don't see activity on more than one core in the last code block (no. 4), where I would expect more cores to participate. – euronion Apr 15 '19 at 12:51
  • I ran the code in separate blocks and also as a whole without the `.load()` statement, and I got multi-core activity. It's not continuous 100% activity on all cores, but I definitely see several cores working at the same time while watching `htop`. – JulianGiles Apr 16 '19 at 15:06
