I have a calculation that expects a pandas DataFrame as input. I'd like to run this calculation on data stored in a netCDF file that expands to 51 GB. Currently I open the file with xarray.open_dataset, passing chunks (my understanding is that the variables in the opened dataset are then backed by dask arrays, so only chunks of data are loaded into memory at a time). However, I don't seem to be able to take advantage of this lazy loading, since I have to convert the xarray data into a pandas DataFrame to run my calculation, and my understanding is that at that point all of the data gets loaded into memory (which is bad).
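For context, here's roughly what I'm doing now (the file name, chunk sizes, and my_calculation are placeholders):

```python
import xarray as xr

# Open lazily: with chunks=, each variable is backed by a dask array,
# so only the chunks actually computed are pulled into memory.
ds = xr.open_dataset("big_file.nc", chunks={"time": 1000})

# This is the step that defeats the laziness: to_dataframe()
# computes everything and materializes the full 51 GB in RAM.
df = ds.to_dataframe()
result = my_calculation(df)  # my existing pandas-based calculation
```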
So I guess, long story short, my question is: how can I get from an xarray dataset to a pandas DataFrame without an intermediate step that loads the entire dataset into memory? I've seen dask work with pandas via dask.dataframe.read_csv, and I've seen it work with xarray, but I'm not sure how to convert an already-opened netCDF xarray dataset to a pandas DataFrame in chunks.
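Conceptually, is something like the following sketch the right idea? Slicing along one dimension and converting each piece separately (the dimension name and step size are made up, and this only works if my calculation can be applied per-chunk):

```python
import xarray as xr

ds = xr.open_dataset("big_file.nc", chunks={"time": 1000})

step = 1000
results = []
for start in range(0, ds.sizes["time"], step):
    # Select one slice along the chunked dimension; to_dataframe()
    # then only loads this slice into memory, not the whole file.
    piece = ds.isel(time=slice(start, start + step))
    df = piece.to_dataframe()
    results.append(my_calculation(df))  # hypothetical per-chunk calculation
```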
Thanks and sorry for the vague question!