I am trying to merge 100+ NDBC buoy NetCDF datasets, each with its own latitude and longitude, into a single NetCDF dataset. When I use cdo or ncrcat I get a combined dataset, but it only keeps the latitude and longitude coordinates from the first station file. Also, the station name (five digits) is stored in the attributes of each station file, and that is lost on combining as well. I am not sure if it is possible, but I would like to carry each individual station name over into the combined file somehow.
Ideally, this is what I want:
- Fully merged NetCDF file.
- All lats and lons carried over corresponding to the variables for each station.
- A new data variable added to the merged dataset that holds the station name attribute value from each station NetCDF file.
- The time steps vary between every 10 min, every 30 min, and hourly. These need to be aligned, so the data should be resampled and averaged to every 3 hours.
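To make the target concrete, here is a rough sketch of the layout I have in mind, run on synthetic data rather than the real files. The helper names (`make_station`, `prepare`), the single `wind_spd` variable, and the positions for every station except 41008 are made up for illustration. It assumes each file can be reduced to a `time` axis, averaged to 3 hours, and then stacked along a new `station` dimension that carries the name, latitude, and longitude. I have not confirmed that this works on the actual NDBC files (for example, a plain mean is probably not meaningful for `wind_dir` or the `timedelta64` variables).

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_station(station_id, lat, lon, freq, periods=36):
    """Synthetic stand-in for one buoy file (hypothetical helper)."""
    time = pd.date_range("2021-04-01 00:00", periods=periods, freq=freq)
    data = np.arange(periods, dtype="float32").reshape(periods, 1, 1)
    return xr.Dataset(
        {"wind_spd": (("time", "latitude", "longitude"), data)},
        coords={
            "time": time,
            "latitude": np.array([lat], dtype="float32"),
            "longitude": np.array([lon], dtype="float32"),
        },
        attrs={"station": station_id},
    )


def prepare(ds, freq="3h"):
    """Reduce one station to (station, time) on a common time step."""
    # Grab the metadata first, since later operations may drop attrs.
    station = str(ds.attrs["station"])
    lat = float(ds["latitude"].values[0])
    lon = float(ds["longitude"].values[0])
    # Remove the length-1 latitude/longitude dimensions and their coords.
    ds = ds.squeeze(["latitude", "longitude"], drop=True)
    # Average everything onto a shared 3-hourly axis.
    ds = ds.resample(time=freq).mean()
    # Add a "station" dimension and hang name and position on it.
    ds = ds.expand_dims(station=[station])
    return ds.assign_coords(
        latitude=("station", [lat]),
        longitude=("station", [lon]),
    )


# Three fake stations reporting every 10, 30, and 60 minutes.
# Only the 41008 position comes from the file shown in this question.
raw = [
    make_station("41004", 32.5, -79.1, "10min"),
    make_station("41008", 31.4, -80.87, "30min"),
    make_station("99999", 28.9, -78.5, "60min"),
]

# Outer join on time: steps a station does not cover become NaN.
merged = xr.concat([prepare(ds) for ds in raw], dim="station", join="outer")
print(merged)
```

With real data, the list `raw` would presumably come from calling `xr.open_dataset` on each file, and the result could be written out with `merged.to_netcdf(...)`.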
Here is one of the buoy NetCDF station datasets to see how it is structured: https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc
Or, reading it through xarray produces:
<xarray.Dataset>
Dimensions: (latitude: 1, longitude: 1, time: 48)
Coordinates:
* time (time) datetime64[ns] 2021-04-01T00:50:00 ... 20...
* latitude (latitude) float32 31.4
* longitude (longitude) float32 -80.87
Data variables: (12/13)
wind_dir (time, latitude, longitude) float64 ...
wind_spd (time, latitude, longitude) float32 ...
gust (time, latitude, longitude) float32 ...
wave_height (time, latitude, longitude) float32 ...
dominant_wpd (time, latitude, longitude) timedelta64[ns] ...
average_wpd (time, latitude, longitude) timedelta64[ns] ...
... ...
air_pressure (time, latitude, longitude) float32 ...
air_temperature (time, latitude, longitude) float32 ...
sea_surface_temperature (time, latitude, longitude) float32 ...
dewpt_temperature (time, latitude, longitude) float32 ...
visibility (time, latitude, longitude) float32 ...
water_level (time, latitude, longitude) float32 ...
Attributes:
institution: NOAA National Data Buoy Center and Participators in Data As...
url: http://dods.ndbc.noaa.gov
quality: Automated QC checks with manual editing and comprehensive m...
conventions: COARDS
station: 41008
comment: GRAYS REEF - 40 NM Southeast of Savannah, GA
location: 31.400 N 80.866 W
I have tried converting to a pandas DataFrame and writing to HDF5, but the resulting file is not easy to manipulate. I also have much less experience with HDF5 files than with xarray and NetCDF (I was reusing a premade script, which is why the output was HDF5).
I've tried xarray.open_mfdataset(), which works but resulted in a 4 GB+ file when it should be around 100 MB, and I still had the issue of losing the station name attribute. I would prefer to do this in Python (I am currently having issues using cdo and nco from Python), but I can also run these commands from bash without issues.
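My guess about the file size, which I have not verified against the real files, is that combining by coordinates places every station on a shared latitude x longitude grid, so most cells are NaN padding. The sketch below reproduces that effect on synthetic data with `xr.merge` and an outer join. The helper `one_station` and all positions except 31.4, -80.87 are made up, and `xr.merge` is only standing in for whatever the multi-file open does internally.

```python
import numpy as np
import pandas as pd
import xarray as xr


def one_station(lat, lon, n_time=4):
    """Synthetic single-point dataset shaped like one buoy file (hypothetical)."""
    time = pd.date_range("2021-04-01", periods=n_time, freq="60min")
    data = np.ones((n_time, 1, 1), dtype="float32")
    return xr.Dataset(
        {"wind_spd": (("time", "latitude", "longitude"), data)},
        coords={"time": time, "latitude": [lat], "longitude": [lon]},
    )


stations = [
    one_station(31.4, -80.87),
    one_station(32.5, -79.1),
    one_station(28.9, -78.5),
]

# An outer join takes the union of all latitude and longitude values,
# so 3 stations end up on a 3 x 3 grid instead of 3 points.
gridded = xr.merge(stations, join="outer", compat="no_conflicts")

total_cells = gridded["wind_spd"].size
filled_cells = int(gridded["wind_spd"].notnull().sum())
print(dict(gridded.sizes), total_cells, filled_cells)
```

Here only 12 of the 36 cells hold data. If the same thing happens with 100+ stations, the padded grid would grow roughly with the square of the station count, which might explain the jump in size.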
If any more info is needed, please let me know.