
I am trying to merge 100+ NDBC buoy NetCDF files, each with its own latitude and longitude, into a single NetCDF dataset. When I use cdo or ncrcat I get a combined dataset, but it only takes the latitude and longitude coordinates from the first station file. The station name (five digits) is stored in the attributes of each station file and is also lost on combining; if possible, I would like to carry each station name over into the combined file.

Ideally, this is what I want:

  • Fully merged NetCDF file.
  • All lats and lons carried over, each matched to the variables for its station.
  • A new data variable in the merged dataset that holds the station name attribute value from each station NetCDF file.
  • The time steps vary between every 10 min, every 30 min, and hourly. These need to be aligned, so the data must be resampled and averaged to every 3 hours.
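The four requirements above can be sketched in xarray roughly as follows. This is a minimal sketch under stated assumptions, not a tested solution for the real NDBC files: the helper names (`to_station_record`, `merge_stations`, `fake_buoy`) are hypothetical, the demo uses small synthetic datasets that mimic the (time, latitude, longitude) layout shown further down, and station "99999" with its position is made up. The idea is to replace the length-1 latitude/longitude dimensions with a `station` dimension, so that each station keeps its own position and ID.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_station_record(ds, freq="3h"):
    """Reshape one buoy dataset to (station, time), averaged to `freq`."""
    # Read the ID and position before reducing, since reductions drop attributes.
    station = str(ds.attrs.get("station", "unknown"))
    lat = float(ds["latitude"].values[0])
    lon = float(ds["longitude"].values[0])
    # Remove the length-1 latitude/longitude dimensions.
    ds = ds.squeeze(["latitude", "longitude"], drop=True)
    # Average onto a common 3-hourly time grid.
    ds = ds.resample(time=freq).mean()
    # Add a length-1 station dimension labelled with the station ID.
    ds = ds.expand_dims(station=[station])
    # Store the position as per-station coordinates.
    return ds.assign_coords(
        latitude=("station", [lat]), longitude=("station", [lon])
    )


def merge_stations(datasets, freq="3h"):
    """Concatenate per-station records; missing time steps become NaN."""
    records = [to_station_record(ds, freq) for ds in datasets]
    return xr.concat(records, dim="station", join="outer")


def fake_buoy(station, lat, lon, freq, periods):
    """Synthetic stand-in for one station file (illustration only)."""
    time = pd.date_range("2021-04-01", periods=periods, freq=freq)
    data = np.arange(periods, dtype="float64").reshape(periods, 1, 1)
    return xr.Dataset(
        {"wind_spd": (("time", "latitude", "longitude"), data)},
        coords={"time": time, "latitude": [lat], "longitude": [lon]},
        attrs={"station": station},
    )


merged = merge_stations(
    [
        fake_buoy("41008", 31.4, -80.87, "10min", 36),  # 6 h of 10-minute data
        fake_buoy("99999", 40.0, -70.0, "1h", 6),       # 6 h of hourly data
    ]
)
print(merged)
```

With real files, each dataset would come from `xr.open_dataset(path)` instead of `fake_buoy`, and the result could be written with `merged.to_netcdf(...)`. Variables that xarray decodes as timedeltas (such as the wave periods) may need separate handling before averaging.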

Here is one of the buoy NetCDF station datasets to see how it is structured: https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc

Or, reading it through xarray produces:

<xarray.Dataset>
Dimensions:                  (latitude: 1, longitude: 1, time: 48)
Coordinates:
  * time                     (time) datetime64[ns] 2021-04-01T00:50:00 ... 20...
  * latitude                 (latitude) float32 31.4
  * longitude                (longitude) float32 -80.87
Data variables: (12/13)
    wind_dir                 (time, latitude, longitude) float64 ...
    wind_spd                 (time, latitude, longitude) float32 ...
    gust                     (time, latitude, longitude) float32 ...
    wave_height              (time, latitude, longitude) float32 ...
    dominant_wpd             (time, latitude, longitude) timedelta64[ns] ...
    average_wpd              (time, latitude, longitude) timedelta64[ns] ...
    ...                       ...
    air_pressure             (time, latitude, longitude) float32 ...
    air_temperature          (time, latitude, longitude) float32 ...
    sea_surface_temperature  (time, latitude, longitude) float32 ...
    dewpt_temperature        (time, latitude, longitude) float32 ...
    visibility               (time, latitude, longitude) float32 ...
    water_level              (time, latitude, longitude) float32 ...
Attributes:
    institution:  NOAA National Data Buoy Center and Participators in Data As...
    url:          http://dods.ndbc.noaa.gov
    quality:      Automated QC checks with manual editing and comprehensive m...
    conventions:  COARDS
    station:      41008
    comment:      GRAYS REEF - 40 NM Southeast of Savannah, GA
    location:     31.400 N 80.866 W 

I have tried converting to a pandas DataFrame and writing to HDF5, but the resulting file is not easy to manipulate. I also have much less experience with HDF5 files than with xarray and NetCDF (I was reusing a premade script, which is why the output was HDF5).

I've tried xarray.open_mfdataset(), which works but produced a 4 GB+ file when it should be around 100 MB, and I still lost the station name attribute. I would prefer to do this in Python (I am currently having issues using cdo and nco from Python), but I can also run these commands from bash without issues.
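A possible reason for the size blow-up (an assumption, not something verified against these files) is that every station has its own latitude and longitude values along real dimensions, so a coordinate-based merge builds a full latitude × longitude grid that is mostly NaN. The sketch below illustrates the effect with `xr.merge` as a stand-in for the multi-file open and five synthetic stations; the helper `one_station` and all values are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def one_station(lat, lon, n_times=4):
    """Synthetic single-station dataset with length-1 lat/lon dimensions."""
    time = pd.date_range("2021-04-01", periods=n_times, freq="1h")
    data = np.ones((n_times, 1, 1))
    return xr.Dataset(
        {"wind_spd": (("time", "latitude", "longitude"), data)},
        coords={"time": time, "latitude": [lat], "longitude": [lon]},
    )


# Five stations, each at a distinct latitude and longitude.
stations = [one_station(30.0 + i, -80.0 + i) for i in range(5)]

# An outer join on the coordinates yields a 5 x 5 grid per time step.
gridded = xr.merge(stations, compat="no_conflicts", join="outer")

filled = int(gridded["wind_spd"].notnull().sum())
total = int(gridded["wind_spd"].size)
print(dict(gridded.sizes), f"{filled} of {total} cells hold data")
```

In this toy case only one cell in five holds data; with N stations the grid has N × N cells per time step but only N of them are filled, and differing time stamps between stations would enlarge the time axis as well.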

If any more info is needed, please let me know.

ClimateUnboxed
Jake

1 Answer


I suggest you try ncecat with group aggregation (gag), e.g.,

ncecat -7 --gag in*.nc out.nc

Follow-up to the comment below:

As the referenced documentation says, this command places each input file in its entirety into its own group in the output file. You might think it "removed all of my data variables and values" if you did not examine the contents of the groups in the output, and just focused on the root level group (which contains only global metadata and subgroups). Use, e.g.,

ncks -m out.nc | more

to examine the subgroups:

zender@sastrugi:~/nco/data$ ncecat -O --gag 85.nc 86.nc 87.nc ~/foo.nc
zender@sastrugi:~/nco/data$ ncks -m -v lat ~/foo.nc | more
netcdf foo {
  group: \85 {
    dimensions:
      lat = 2 ;
      vrt_nbr = 2 ;

    variables:
      float lat(lat) ;
        lat:long_name = "Latitude (typically midpoints)" ;
        lat:units = "degrees_north" ;
        lat:bounds = "lat_bnd" ;

      float lat_bnd(lat,vrt_nbr) ;
        lat_bnd:purpose = "Cell boundaries for lat coordinate" ;
  } // group /85
  group: \86 {
    dimensions:
      lat = 2 ;
      vrt_nbr = 2 ;

    variables:
      float lat(lat) ;
        lat:long_name = "Latitude (typically midpoints)" ;
        lat:units = "degrees_north" ;
        lat:bounds = "lat_bnd" ;

      float lat_bnd(lat,vrt_nbr) ;
        lat_bnd:purpose = "Cell boundaries for lat coordinate" ;
  } // group /86
  group: \87 {
    dimensions:
      lat = 2 ;
      vrt_nbr = 2 ;

    variables:
      float lat(lat) ;
        lat:long_name = "Latitude (typically midpoints)" ;
        lat:units = "degrees_north" ;
        lat:bounds = "lat_bnd" ;

      float lat_bnd(lat,vrt_nbr) ;
        lat_bnd:purpose = "Cell boundaries for lat coordinate" ;
  } // group /87
} // group /
Charlie Zender
  • I performed this with my files and it removed all of my data variables and values. It only appends the file names to the history and nothing else. – Jake Jul 08 '21 at 17:19
  • Thanks, I'm having to pursue another route as I cannot get cdo or nco to work with python still. This probably works for me but I cannot test it in python currently. – Jake Jul 13 '21 at 21:37