
I am searching for a way to select data from a NetCDF file at a specific variable value. The dataset contains time, lat, and lon coordinates and a range of variables. One of these variables is a mask with specific values for land / open ocean / sea-ice / lake. Since the open ocean is represented by ds.mask = 1, I want to extract only the sea surface temperature values located at the coordinates (in time and space) where mask = 1. However, I do not want the sea surface temperature values at the other coordinates to be set to NaN; I want to keep only those coordinates and variable values where ds.mask = 1. I know how to select data with xarray's .sel/.isel, but that only selects by coordinates, not by variable values as I am trying to do here. Any help would be very much appreciated.

import numpy as np

# stormtrack_* and SSTskin_file are loaded elsewhere
lati = stormtrack_lat.values
loni = stormtrack_lon.values
timei = stormtrack_datetime.values
tmax = timei.max() + np.timedelta64(10, 'D')
tmin = timei.min() - np.timedelta64(10, 'D')
SSTskin_subfile = SSTskin_file.sel(time=slice(tmin, tmax))

# HERE I NEED HELP:
# extract data where mask = ocean (1), and keep only these data points!
SSTskin_subfile_masked = SSTskin_subfile.sel(SSTskin_subfile.mask == 1)
# does not work yet; throws: ValueError: the first argument to .isel must be a dictionary

This is the NetCDF file's structure:

[image: file structure]

user17681970

1 Answer


You can apply the ocean mask with `.where`:

SSTskin_subfile_masked = SSTskin_subfile.where(SSTskin_subfile.mask)

It is not possible to drop all masked points, because the data are gridded: if even a single value along a given latitude is defined, you have to keep every grid point along that latitude. However, you can drop the coordinates where all values are NaN with:

# .dropna takes a single dim, so chain it per dimension
SSTskin_subfile_masked.dropna(dim="lat", how="all").dropna(dim="lon", how="all")
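As a minimal, self-contained sketch of the two steps above (the tiny grid and its mask codes are made up for illustration; the `analysed_sst` variable name is taken from the comment thread below):

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 grid: mask code 1 = open ocean, 0 = land (as in the question)
lat = [10.0, 20.0]
lon = [100.0, 110.0, 120.0]
ds = xr.Dataset(
    {
        "analysed_sst": (("lat", "lon"), [[300.0, 290.0, 301.0],
                                          [285.0, 286.0, 302.0]]),
        "mask": (("lat", "lon"), [[1, 0, 1],
                                  [0, 0, 1]]),
    },
    coords={"lat": lat, "lon": lon},
)

# Keep only the ocean points; everything else becomes NaN
masked = ds.where(ds.mask == 1)

# Drop coordinate labels where ALL values are NaN, one dimension at a time
trimmed = masked.dropna(dim="lat", how="all").dropna(dim="lon", how="all")
```

Here the lon = 110 column is all-NaN after masking and gets dropped, but the lat = 20 row survives because it still contains one ocean value, so a NaN remains in the trimmed result.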
Thrasy
  • Thank you; do I understand this right that it would drop the coordinates where mask = NaN? – user17681970 Jan 11 '22 at 13:10
  • @user17681970 Not exactly, `.where()` will keep the values where mask = 1 and set all other values to NaN. Then `.dropna(dim = dim, how = 'all')` will drop the coordinates where all values are NaN, e.g.: if there is a latitude where all values are NaN, it will be dropped, but if NaN and valid values are mixed you cannot drop only the NaN. – Thrasy Jan 11 '22 at 13:25
  • Okay, but then your first code line would need to be `SSTskin_subfile_masked = SSTskin_subfile.where(SSTskin_subfile.mask == 1)`, wouldn't it? – user17681970 Jan 11 '22 at 13:27
  • @user17681970 Both should work, as Python interprets 1 as True (i.e. `1 == True` is True). – Thrasy Jan 11 '22 at 13:30
  • Okay, thank you very much; I think this approach will not work for me due to the conditions you explained above. I will try to find another approach to my issue. – user17681970 Jan 11 '22 at 14:09
  • @user17681970 I don't know your use case, but if you really want to drop all non-ocean values, you will probably have to convert your Dataset to another format. For example, you can mask it, convert it to a DataFrame with `df = SSTskin_subfile_masked.to_dataframe()`, then `df.dropna(subset = ["analysed_sst"])`. – Thrasy Jan 11 '22 at 14:28
  • Thank you very much, but I do not want to do that due to further analysis. I have another idea to tackle my issue and posted my question here: https://stackoverflow.com/questions/70668514/how-to-add-a-condition-to-legend-labelling-to-show-certain-entries-only – user17681970 Jan 11 '22 at 14:31
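For reference, the `to_dataframe` route suggested in the comments can be sketched like this (same hypothetical toy grid as in the answer; `analysed_sst` is the variable name used in the thread):

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 grid: mask code 1 = open ocean, 0 = land
lat = [10.0, 20.0]
lon = [100.0, 110.0, 120.0]
ds = xr.Dataset(
    {
        "analysed_sst": (("lat", "lon"), [[300.0, 290.0, 301.0],
                                          [285.0, 286.0, 302.0]]),
        "mask": (("lat", "lon"), [[1, 0, 1],
                                  [0, 0, 1]]),
    },
    coords={"lat": lat, "lon": lon},
)

# Mask first, flatten the grid into a (lat, lon)-indexed DataFrame,
# then drop the NaN rows entirely — no grid structure to preserve anymore
masked = ds.where(ds.mask == 1)
df = masked.to_dataframe()
ocean_only = df.dropna(subset=["analysed_sst"])
```

Unlike the gridded `dropna` in the answer, this really does remove every non-ocean point, at the cost of leaving xarray's labeled N-dimensional model.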