
I have one NetCDF file for the month of September 2007. It contains 6-hourly data on a latitude/longitude grid, with wind and humidity variables. Each variable has shape (120, 45, 93): 120 time steps (4 per day for 30 days), 45 latitudes and 93 longitudes. With the following code I am able to compute daily averages for all variables, so each variable now has shape (30, 45, 93). Time is stored as an integer with units of 'hours since 1900-01-01 00:00:00.0'.

How can I split this daily-averaged data into 30 separate NetCDF files, one per day, with each file name containing the date in YYYY:MM:DD format?

import xarray as xr

# open the monthly file and average the 6-hourly values down to daily means
monthly_data = xr.open_dataset('interim_2007-09-01to2007-09-31.nc')
daily_data = monthly_data.resample(time='1D').mean()
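
A quick sanity check of the resampled result before splitting it (just a sketch, using only the names from the code above):

print(daily_data.sizes)            # the 'time' dimension should now be 30 (one step per day)
print(daily_data.time.values[:3])  # these print as dates because xarray decodes the 'hours since 1900-01-01' units on open
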
iury simoes-sousa
Pramod

3 Answers

6

Xarray has a top-level function for cases like this: xarray.save_mfdataset. In your case, you would want to use groupby to break your dataset into logical chunks and then create a list of corresponding file names. From there, just let save_mfdataset do the rest.

import pandas as pd
import xarray as xr
# ds is the dataset opened from the monthly file (monthly_data in the question); groupby yields one group per daily timestamp
dates, datasets = zip(*ds.resample(time='1D').mean('time').groupby('time'))
filenames = [pd.to_datetime(date).strftime('%Y.%m.%d') + '.nc' for date in dates]
xr.save_mfdataset(datasets, filenames)
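
A quick sanity check before writing: for the September 2007 data in the question, the first few generated file names should look like this:

print(filenames[:3])  # expected: ['2007.09.01.nc', '2007.09.02.nc', '2007.09.03.nc']
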
jhamman
  • In this case, does the list of grouped datasets created by `zip` have to be small enough to handle in memory in the first place? – Light_B Jan 27 '19 at 12:49
  • No, the groupby operation will return views or lazy slices of the underlying data. This approach will also work better when using dask as, depending on the scheduler you are using, the save_mfdataset step can be executed in parallel. – jhamman Jan 27 '19 at 17:14
  • One more clarification: where are we breaking the dataset into chunks? Is the `groupby` operation doing it automatically? Often I have to use the `sel` method in a loop and then save multiple datasets, and I'm wondering whether `sel` would also break the dataset into chunks automatically; otherwise, it would be the same as using `to_netcdf`. Thanks! – Light_B Jan 28 '19 at 15:55
  • Yes, the `groupby` method is breaking the dataset into groups. In your case, you wanted one group for each timestep, so I've just used `'time'`. If I wanted groups by year, I could have used `time.year` (a short yearly sketch follows these comments). More info on the datetime options here: http://xarray.pydata.org/en/stable/time-series.html#datetime-components – jhamman Jan 28 '19 at 19:04
  • Thanks, at first I didn't clearly understand the `zip` step and I was using `sel` to slice yearly datasets. Now I can see how powerful `groupby` can be when combined with `save_mfdataset`. Brilliant approach! – Light_B Jan 30 '19 at 17:09
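
For reference, a minimal sketch of the yearly grouping mentioned in the comments above, assuming a multi-year dataset `ds`; the file-name pattern is just an illustration:

import xarray as xr

# one group per calendar year; iterating the groupby yields (year, dataset) pairs
years, datasets = zip(*ds.groupby('time.year'))
filenames = [f'{year}.nc' for year in years]
xr.save_mfdataset(datasets, filenames)
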
2

After going through the documentation, I found that you can use netCDF4's num2date to convert an integer time value to a date, and that you can index an xarray Dataset using isel():

from netCDF4 import num2date

for i in range(30):
    day = daily_data.isel(time=i)   # one day's worth of data
    # convert the integer time value (hours since 1900-01-01) to a date for the file name
    the_date = num2date(day.time.data, units='hours since 1900-01-01 00:00:00')
    day.to_netcdf(str(the_date.date()) + '.nc', format='NETCDF4')
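
If xarray has already decoded the time coordinate into dates (its default behaviour, and what the comment below reports), the num2date call can be skipped; a minimal sketch of that variant, reusing daily_data from the question and formatting the timestamps with pandas:

import pandas as pd

for i in range(daily_data.sizes['time']):
    day = daily_data.isel(time=i)
    # day.time already holds a datetime64 value here, so just format it
    date_str = pd.to_datetime(day.time.values).strftime('%Y-%m-%d')
    day.to_netcdf(date_str + '.nc', format='NETCDF4')
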
sam46
  • Thanks @BanishedBot, it really helped a lot. But xarray automatically read the dates, so the conversion was not required. – Pramod Jan 26 '19 at 20:04
  • While this is useful info (and I upvoted it for that), I don't understand why it is the accepted "best" answer when it doesn't actually address the question asked; the answer from jhamman does. – ClimateUnboxed Oct 31 '19 at 09:07
  • I would use `day = daily_data.isel(time=[i])` if you want to open it later with `open_mfdataset`. – BorjaEst Mar 23 '22 at 09:23
1

Just in case it helps anyone, it is also possible to perform this task (calculating the daily mean and splitting into separate daily files) directly from the command line with CDO:

cdo splitday -daymean in.nc day

which produces a series of files day01.nc day02.nc ...

ClimateUnboxed