Split netcdf4 file based on latitude values in python

Question

I have a netCDF file with three variables:

input.variables

{'longitude': <class 'netCDF4._netCDF4.Variable'>
 float32 longitude(longitude)

'latitude': <class 'netCDF4._netCDF4.Variable'>
 float32 latitude(latitude)

'Values': <class 'netCDF4._netCDF4.Variable'>
 int16 Values(time, latitude, longitude)

{'time':
 Length: 504, dtype: datetime64[ns],
}

I want to split this netCDF file into the northern and southern hemispheres. I assumed this would be a simple boolean indexing problem, but I can't get it to work. I have tried the following:

south_mask = input[input.variables['latitude'][:] < 0]
north_mask = input[input.variables['latitude'][:] >= 0]

But this doesn't work:

TypeError: expected str, bytes or os.PathLike object, not MaskedArray

I also need the full netCDF file in this project, so I want to do this within Python rather than externally, for instance on the command line.

Currently I am doing it like this:

import numpy as np
import pandas as pd

# Index of the equator on the latitude axis.
# Assumes latitude runs north to south and contains exactly 0.
middle_index = input.variables['latitude'][:].tolist().index(0.)

# Collect one row per time step for each hemisphere
N_rows = []
S_rows = []

# Index based on values
for i in range(0, 504):
    # Get time
    time = input['time'][i]

    # Average from the beginning to the middle index, and from the middle index to the end.
    N = np.average(input['Values'][i, :middle_index, :])
    S = np.average(input['Values'][i, middle_index:, :])

    N_rows.append({'Time': time, 'Value': N})
    S_rows.append({'Time': time, 'Value': S})

# Resulting in two dataframes.
N_df = pd.DataFrame(N_rows, columns=['Time', 'Value'])
S_df = pd.DataFrame(S_rows, columns=['Time', 'Value'])

But I feel like there must be an easier way to slice the dataset in two, ideally without a loop.

score 0 · Answer 1 · answered Feb 09 '22 at 22:13

You should be looking at xarray.

import xarray as xr
ds = xr.open_dataset("globe.nc")
northern = ds.sel(latitude=slice(0, 90))
southern = ds.sel(latitude=slice(-90, 0))

# if latitude in decreasing order:
# northern = ds.sel(latitude=slice(90, 0))
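
The same idea can be extended to replace the loop from the question. The following is only a sketch: it builds a small synthetic dataset instead of opening "globe.nc", and the grid, the values, and the names `north_mean`, `south_mean` and `df` are made up for illustration. Selecting by the latitude labels that pass a boolean test works whether latitude is stored in increasing or decreasing order.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real file (assumption: same dimension names
# as in the question). Latitude is deliberately stored north to south.
lat = np.arange(90.0, -91.0, -30.0)   # 90, 60, 30, 0, -30, -60, -90
lon = np.arange(0.0, 360.0, 90.0)     # 0, 90, 180, 270
time = pd.date_range("2000-01-01", periods=4, freq="D")

# Made-up data: +1 everywhere in the north (lat >= 0), -1 in the south.
values = np.where(lat >= 0, 1.0, -1.0)[None, :, None] * np.ones(
    (time.size, lat.size, lon.size)
)
ds = xr.Dataset(
    {"Values": (("time", "latitude", "longitude"), values)},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# Pick the latitude labels with a boolean test, then select by label.
lat_values = ds["latitude"].values
northern = ds.sel(latitude=lat_values[lat_values >= 0])
southern = ds.sel(latitude=lat_values[lat_values < 0])

# One unweighted mean per time step and hemisphere, without a Python loop.
# Like np.average in the question, this ignores grid-cell area.
north_mean = northern["Values"].mean(dim=["latitude", "longitude"])
south_mean = southern["Values"].mean(dim=["latitude", "longitude"])

df = pd.DataFrame(
    {"North": north_mean.values, "South": south_mean.values},
    index=pd.Index(ds["time"].values, name="Time"),
)
print(df)
```

Unlike the two slices above, which both include latitude 0 because label-based slices are inclusive, this selection assigns the equator to the northern hemisphere only.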

If you need to write the subsets to files, look at the to_netcdf() method. For that purpose, though, command line tools would be a better fit.
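
If staying with netCDF4 is preferred, the boolean mask attempted in the question can still be used, as long as it is applied to the latitude axis of the data array rather than to the Dataset object itself. A minimal sketch, with plain NumPy arrays standing in for `input['latitude'][:]` and `input['Values'][:]` (the array contents here are invented for illustration):

```python
import numpy as np

# Made-up stand-ins for input['latitude'][:] and input['Values'][:]
lat = np.array([60.0, 30.0, 0.0, -30.0, -60.0])
values = np.arange(2 * 5 * 3, dtype=float).reshape(2, 5, 3)  # (time, latitude, longitude)

# Boolean masks over the latitude coordinate
north_mask = lat >= 0
south_mask = lat < 0

# Apply each mask along the latitude axis (axis 1) only
north = values[:, north_mask, :]
south = values[:, south_mask, :]

# Unweighted mean over latitude and longitude for every time step
north_mean = north.mean(axis=(1, 2))
south_mean = south.mean(axis=(1, 2))

print(north.shape, south.shape)
print(north_mean, south_mean)
```

Because the mask is built from the latitude values themselves, this does not depend on the equator sitting at a particular index.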
