I have a ~1 GB NetCDF file on disk. I expected xarray.open_dataset() to do a lazy load, so that I could see the file's metadata without reading the whole file into memory. However, it takes several minutes to execute the following lines in JupyterLab, and memory usage goes up by ~1.5 GB.

import xarray as xr
import matplotlib.pyplot as plt

file = r'..\data\external\SMODE_PFC_Wavegliders_WHOI43.nc'
# I don't know why, but this seems to actually load the data set, instead of lazy loading
ds = xr.open_dataset(file)

I tried passing the option cache=False, but the behavior is the same.

Am I missing something? Is this a bug?

I do receive a warning, but it seemed irrelevant at the time:

SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
Tom F
  • Try adding `chunks={}` to your `open_dataset` call. This way you should be using `dask` to load the data lazily, which doesn't load any data until you request it – Val Feb 17 '22 at 08:27
  • @Val Thanks for the suggestion to try `chunks={}`. Unfortunately it doesn't change the behavior. – Tom F Feb 17 '22 at 12:51
  • I think you need `parallel=True` maybe? – ouranos May 24 '23 at 05:56
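
For reference, the `chunks={}` suggestion above would look like the sketch below; it asks xarray to back each variable with a dask array so that no data is read until it is actually computed (though, as noted in the comments, it did not change the behavior in this case).

import xarray as xr

file = r'..\data\external\SMODE_PFC_Wavegliders_WHOI43.nc'

# chunks={} wraps each variable in a dask array, deferring the actual
# read until values are computed or explicitly loaded
ds = xr.open_dataset(file, chunks={})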

1 Answer


A colleague of mine gave me the answer: I needed to pass the option decode_times=False. With this option, xarray.open_dataset() does a lazy load. The warning I was receiving about being unable to decode the time axis was the clue.
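
A minimal sketch of that fix is below; the xr.decode_cf() call at the end is an optional follow-up for when decoded times are actually needed (use_cftime=True keeps the out-of-range dates as cftime objects, matching the warning above):

import xarray as xr

file = r'..\data\external\SMODE_PFC_Wavegliders_WHOI43.nc'

# Skipping time decoding lets open_dataset return quickly with only
# metadata loaded; the data variables stay lazy until accessed
ds = xr.open_dataset(file, decode_times=False)

# Decode the CF time axis later, on demand
ds = xr.decode_cf(ds, use_cftime=True)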

Tom F