I have a ~1 GB NetCDF file on disk. I expected xarray.open_dataset() to do a lazy load, so that I could see the file's metadata without reading the whole file into memory. However, it takes several minutes to execute the following lines in JupyterLab, and memory usage goes up by ~1.5 GB.

import xarray as xr
import matplotlib.pyplot as plt

file = r'..\data\external\SMODE_PFC_Wavegliders_WHOI43.nc'
# I don't know why, but this seems to actually load the data set, instead of lazy loading
ds = xr.open_dataset(file)

I tried passing the option cache=False, but the behavior is the same.

Am I missing something? Is this a bug?

I do receive a warning, but it seemed irrelevant at the time:

SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
Tom F
  • Try adding `chunks={}` to your `open_dataset` call. This way you should be using `dask` to load the data lazily, which doesn't load any data until you request it – Val Feb 17 '22 at 08:27
  • @Val Thanks for the suggestion to try `chunks={}`. Unfortunately it doesn't change the behavior. – Tom F Feb 17 '22 at 12:51
  • I think you need `parallel=True` maybe? – ouranos May 24 '23 at 05:56
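
For reference, the `chunks={}` suggestion above would look like the sketch below; it asks xarray to back each variable with a dask array so that no data is read until it is actually computed (though, as noted in the comments, it did not change the behavior in this case).

import xarray as xr

file = r'..\data\external\SMODE_PFC_Wavegliders_WHOI43.nc'

# chunks={} wraps each variable in a dask array, deferring the actual
# read until values are computed or explicitly loaded
ds = xr.open_dataset(file, chunks={})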

1 Answer


A colleague of mine gave me the answer: I needed to pass the option decode_times=False. With this option, xarray.open_dataset() does a lazy load. The warning I was receiving about being unable to decode the time axis was the clue.
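
A minimal sketch of that fix is below; the xr.decode_cf() call at the end is an optional follow-up for when decoded times are actually needed (use_cftime=True keeps the out-of-range dates as cftime objects, matching the warning above):

import xarray as xr

file = r'..\data\external\SMODE_PFC_Wavegliders_WHOI43.nc'

# Skipping time decoding lets open_dataset return quickly with only
# metadata loaded; the data variables stay lazy until accessed
ds = xr.open_dataset(file, decode_times=False)

# Decode the CF time axis later, on demand
ds = xr.decode_cf(ds, use_cftime=True)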

Tom F