I am trying to merge multiple NetCDF (.nc) files containing physical oceanographic data for different depths at different latitudes and longitudes. I am using xr.open_mfdataset to do this, but the files are not merging correctly: when I plot the result, there appears to be only one value for the merged files. This is the code I am using:
import matplotlib.pyplot as plt
import xarray as xr

# Combine the files with the nested method, concatenating along latitude
ds = xr.open_mfdataset("33HQ20150809*.nc", concat_dim=["latitude"], combine="nested")
ds.to_netcdf("geotraces2015_combined.nc")
df = xr.open_dataset("geotraces2015_combined.nc")
# Set up values. Oxygen is transposed so its shape matches latitude and pressure.
oxygen = df["oxygen"].values.transpose()
# Plot using contourf
fig = plt.figure()
ax = fig.add_subplot(111)
plt.contourf(oxygen, cmap="inferno")
plt.gca().invert_yaxis()
cbar = plt.colorbar(label="Oxygen Concentration (umol kg-1)")
You can download the .nc files under "CTD" here: https://cchdo.ucsd.edu/cruise/33HQ20150809
This is what each file looks like:
<xarray.Dataset>
Dimensions: (pressure: 744, time: 1, latitude: 1, longitude: 1)
Coordinates:
* pressure (pressure) float64 0.0 1.0 2.0 3.0 ... 741.0 742.0 743.0
* time (time) datetime64[ns] 2015-08-12T18:13:00
* latitude (latitude) float32 60.25
* longitude (longitude) float32 -179.1
Data variables: (12/19)
pressure_QC (pressure) int16 ...
temperature (pressure) float64 ...
temperature_QC (pressure) int16 ...
salinity (pressure) float64 ...
salinity_QC (pressure) int16 ...
oxygen (pressure) float64 ...
... ...
CTDNOBS (pressure) float64 ...
CTDETIME (pressure) float64 ...
woce_date (time) int32 ...
woce_time (time) int16 ...
station |S40 ...
cast |S40 ...
Attributes:
EXPOCODE: 33HQ20150809
Conventions: COARDS/WOCE
WOCE_VERSION: 3.0
...
Another file would look like this:
<xarray.Dataset>
Dimensions: (pressure: 179, time: 1, latitude: 1, longitude: 1)
Coordinates:
* pressure (pressure) float64 0.0 1.0 2.0 3.0 ... 176.0 177.0 178.0
* time (time) datetime64[ns] 2015-08-18T19:18:00
* latitude (latitude) float32 73.99
* longitude (longitude) float32 -168.8
Data variables: (12/19)
pressure_QC (pressure) int16 ...
temperature (pressure) float64 ...
temperature_QC (pressure) int16 ...
salinity (pressure) float64 ...
salinity_QC (pressure) int16 ...
oxygen (pressure) float64 ...
... ...
CTDNOBS (pressure) float64 ...
CTDETIME (pressure) float64 ...
woce_date (time) int32 ...
woce_time (time) int16 ...
station |S40 ...
cast |S40 ...
Attributes:
EXPOCODE: 33HQ20150809
Conventions: COARDS/WOCE
WOCE_VERSION: 3.0
EDIT: This is my new approach, which is still not working. I'm trying to use preprocess to apply set_coords, squeeze, and expand_dims, following Michael's approach:
def preprocess(ds):
    return ds.set_coords('station').squeeze(["latitude", "longitude", "time"]).expand_dims('station')

ds = xr.open_mfdataset('33HQ20150809*.nc', concat_dim='station', combine='nested', preprocess=preprocess)
But I'm still having the same problem...
Solution: First, I had to identify the coordinate with a unique value per file, which in my case was 'station'. Then I used preprocess to apply set_coords, squeeze, and expand_dims to each file, following Michael's answer.
import matplotlib.pyplot as plt
import xarray as xr

def preprocess(ds):
    # Make station a coordinate, drop the length-1 dimensions,
    # then turn station into a dimension to combine along.
    return ds.set_coords('station').squeeze(["latitude", "longitude", "time"]).expand_dims('station')

# parallel=True opens and preprocesses the files in parallel using dask
ds = xr.open_mfdataset('filename*.nc', preprocess=preprocess, parallel=True)
# Sort the stations by latitude before plotting the section
ds = ds.sortby('latitude').transpose()
ds.oxygen.plot.contourf(x="latitude", y="pressure")
plt.gca().invert_yaxis()
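For anyone who wants to see what the preprocess step does without downloading the cruise files, here is a minimal sketch on synthetic data. The `make_cast` helper, the station names "S1"/"S2", and the oxygen values are made up for illustration; only the dataset layout (length-1 time/latitude/longitude dimensions, a scalar station variable, and pressure axes of different lengths) mirrors the files shown above. It uses xr.concat on in-memory datasets in place of open_mfdataset, with the join and coordinate options spelled out explicitly.

```python
import numpy as np
import xarray as xr


def make_cast(station, lat, lon, time, n_levels):
    """Build a synthetic single-cast dataset shaped like one CTD file (hypothetical helper)."""
    pressure = np.arange(n_levels, dtype="float64")
    return xr.Dataset(
        data_vars={
            # Made-up oxygen profile, one value per pressure level
            "oxygen": ("pressure", 300.0 - 0.1 * pressure),
            # Scalar (dimensionless) station identifier, as in the real files
            "station": ((), station),
        },
        coords={
            "pressure": pressure,
            "time": np.array([time], dtype="datetime64[ns]"),
            "latitude": [lat],
            "longitude": [lon],
        },
    )


def preprocess(ds):
    # Same steps as in the solution: promote station to a coordinate, drop the
    # length-1 dimensions, then make station a new length-1 dimension.
    return (
        ds.set_coords("station")
        .squeeze(["latitude", "longitude", "time"])
        .expand_dims("station")
    )


# Two casts with different depths, deliberately listed out of latitude order
casts = [
    make_cast("S2", 73.99, -168.8, "2015-08-18T19:18:00", 179),
    make_cast("S1", 60.25, -179.1, "2015-08-12T18:13:00", 744),
]

# Stack the casts along station. join="outer" takes the union of the pressure
# levels, so the shallower cast is padded with NaN below its deepest sample;
# coords="different" turns the per-cast latitude/longitude/time into
# station-indexed coordinates.
combined = xr.concat(
    [preprocess(c) for c in casts],
    dim="station",
    data_vars="all",
    coords="different",
    compat="equals",
    join="outer",
)
combined = combined.sortby("latitude")
```

After this, `combined.oxygen` has dimensions (station, pressure) and latitude is a coordinate along station, which is what lets the contourf call with x="latitude", y="pressure" draw a section instead of a single profile.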