I am applying slicing and aggregation operations to NetCDF files in Python. One option for working with this kind of file is the xarray library.

I am still new to the library, so I would like to know whether xarray objects have a way to check if a sliced Dataset/DataArray is empty, as pandas does (in pandas, one can check whether a DataFrame/Series is empty through the 'empty' attribute).

The only solution I have found is to convert the xarray Dataset/DataArray into a pandas DataFrame/Series every time and then check whether that is empty.

Here is a code snippet as an example:

import xarray as xr

path = 'my_path_to_my_netcdf_file.nc'

Xarray_DataArray = xr.open_dataset(path)

print(Xarray_DataArray)

# this returns something like:

 #     Dimensions:      (lat: 600, lon: 672, time: 37)
 #     Coordinates:
 #     * lat          (lat) float32 -3.9791672 -3.9375012 ... 20.9375 20.979166
 #     * lon          (lon) float32 -60.979168 -60.9375 ... -33.0625 -33.020832
 #     * time         (time) datetime64[ns] 2010-05-19 2010-05-20 ... 2010-06-24
 #     Data variables:
 #       variable_name  (time, lat, lon) float32 dask.array<shape=(37, 600, 672), 
 #         chunksize=(37, 600, 672)>

 # I normally use the 'sel' method to slice the xarray object, like below:

Sliced_Xarray_DataArray = Xarray_DataArray.sel({'lat':slice(-10, -9),
                                                'lon':slice(-170, -169)                  
                                                })


# Since I have not found an xarray method to check whether the slice is empty, I usually do the following:

if Sliced_Xarray_DataArray.to_dataframe().empty:  # 'empty' is a property, not a method
    print('is empty. Nothing to aggregate')

else:
    # Aggregation_function is a placeholder for an aggregation routine defined elsewhere
    Aggregated_value = Aggregation_function(Sliced_Xarray_DataArray)

    print('continuing with the analysis')


#    ... continue

I would appreciate any suggestions.

Thank you for your time. I hope to hear from you soon.

Sincerely yours,

Philipe R. Leal

Philipe Riskalla Leal
  • can't you use this http://xarray.pydata.org/en/stable/generated/xarray.Dataset.sizes.html?highlight=sizes ? – vb_rises Sep 13 '19 at 15:26

2 Answers

You can check if the size of the variables in the resulting slice is 0 easily enough:

print(Sliced_Xarray_DataArray.time.size)
if Sliced_Xarray_DataArray.time.size == 0:
    print('is empty. Nothing to aggregate')
else:
    print('not empty. Go aggregate')

Any of your coordinate variables, as well as the other variables, are accessible as attributes of Sliced_Xarray_DataArray, so in your example you could check the size of lat, lon, or time.
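
If checking one coordinate at a time is inconvenient, every dimension can be inspected in a single pass. The following is only a minimal sketch: it assumes a small synthetic dataset in place of the NetCDF file, and the names `ds`, `sliced`, and `is_empty` are made up for illustration. It relies on the `sizes` property, which maps each dimension name to its length.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the NetCDF file (hypothetical values).
ds = xr.Dataset(
    {"variable_name": (("time", "lat", "lon"), np.zeros((3, 4, 5), dtype="float32"))},
    coords={
        "time": np.arange(3),
        "lat": np.linspace(-4.0, 21.0, 4),
        "lon": np.linspace(-61.0, -33.0, 5),
    },
)


def is_empty(obj):
    """Return True if any dimension of a Dataset/DataArray has length 0."""
    return any(length == 0 for length in obj.sizes.values())


# The requested box lies outside the data, so the slice has no lat/lon points.
sliced = ds.sel(lat=slice(-10, -9), lon=slice(-170, -169))
empty_flag = is_empty(sliced)  # slice with zero-length lat and lon
full_flag = is_empty(ds)       # original dataset, all dimensions non-empty

print(dict(sliced.sizes), empty_flag, full_flag)
```

Because the check only reads dimension lengths, it does not load the underlying data, which should keep it cheap for dask-backed files.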

chris
  • I see. Thanks, Chris. Regarding the coordinate variables, the only problem with using them to check for data is that I would need to verify them one at a time to know whether I have data to aggregate, and that would slow my algorithm. It would also hurt readability and reproducibility. The first solution you suggested is clean and simple, so I will adopt it. If you wish, see my algorithm on GitHub. It performs match-up operations over geometries in the time-space dimensions. Feel free to use it. Here is the link: "https://github.com/PhilipeRLeal/time_space_aggregations" – Philipe Riskalla Leal Sep 16 '19 at 12:24

I think what you have is an xarray Dataset. If you want to check that one of its DataArrays is empty, you can access it and then check its size. In your case this would look like:

dataset = xr.open_dataset(path)

# A DataArray's size is the total number of elements across all its dimensions
if dataset['variable_name'].size == 0:
    print('empty!')
else:
    print('not empty')
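
To see how this behaves on a slice, here is a small sketch that uses an in-memory dataset rather than a real file. The coordinate values and the names `inside` and `outside` are assumptions made for illustration. A DataArray's `size` is the product of its dimension lengths, so it drops to 0 as soon as any one dimension is empty.

```python
import numpy as np
import xarray as xr

# In-memory stand-in for the opened file (hypothetical values).
ds = xr.Dataset(
    {"variable_name": (("lat", "lon"), np.ones((4, 5)))},
    coords={"lat": np.arange(4.0), "lon": np.arange(5.0)},
)

# Label-based slices include both end points when the labels exist.
inside = ds["variable_name"].sel(lat=slice(1, 2), lon=slice(0, 3))
# No latitude falls between 10 and 11, so this selection has no rows.
outside = ds["variable_name"].sel(lat=slice(10, 11))

inside_size = inside.size    # 2 lat points * 4 lon points
outside_size = outside.size  # 0 lat points * 5 lon points

if outside_size == 0:
    print("is empty. Nothing to aggregate")
```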

simlmx