
I have a very large dataset in a NetCDF file.

RZSC = xr.open_dataset('/home/chandra/data/RZSC_250m_SA.nc')
RZSC = RZSC.Band1
RZSC
[Output]:
<xarray.DataArray 'Band1' (lat: 32093, lon: 20818)>
[668112074 values with dtype=float32]
Coordinates:
  * lat      (lat) float64 -58.36 -58.36 -58.35 -58.35 ... 13.71 13.71 13.71
  * lon      (lon) float64 -81.38 -81.37 -81.37 -81.37 ... -34.63 -34.63 -34.62
Attributes:
    long_name:     GDAL Band Number 1
    grid_mapping:  crs
########################
Treecover = xr.open_dataset('/home/chandra/data/Treecover_MOD44B_2000_250m_AMAZON.nc')
Treecover = Treecover.Band1
Treecover
[Output]:
<xarray.DataArray 'Band1' (lat: 32093, lon: 20818)>
[668112074 values with dtype=float64]
Coordinates:
  * lat      (lat) float64 -58.36 -58.36 -58.35 -58.35 ... 13.71 13.71 13.71
  * lon      (lon) float64 -81.38 -81.37 -81.37 -81.37 ... -34.63 -34.63 -34.62
Attributes:
    long_name:     GDAL Band Number 1
    grid_mapping:  crs
####
np.nanmax(Treecover[:,:])
[Output]: 85.0625
np.nanmin(Treecover[:,:])
[Output]: 0.0

I can neither visualize the dataset nor filter it. For example, RZSC[:,:].where(Treecover[:,:] > 1000).shape returns (32093, 20818), the same as the original array size, which is quite frustrating.

Does anyone have a suggestion? I cannot share the data because the NetCDF file is larger than 6 GB.

Ep1c1aN
  • Is Treecover a variable? What error do you receive? More information would be helpful. – bwc Jul 01 '19 at 16:18
  • @bwc. Yes, `Treecover` is also a variable, different from RZSC but with the same array size. I have updated the question with more information. I am not getting any error; it's just that `RZSC[:,:].where(Treecover[:,:] > 1000).shape` returns the same array size as the original, which I know cannot be right (as the maximum value of `Treecover` is 100). – Ep1c1aN Jul 01 '19 at 16:26

1 Answer


By default, .where() returns an array of the same shape as the one you call it on; it masks values rather than removing them. Did you try visualizing it? Every element where the condition is false is set to NaN. You can also supply the replacement value yourself:

RZSC.where(Treecover > 1000, Treecover, np.NaN)
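
To make the masking behaviour concrete, here is a minimal sketch on a tiny synthetic grid. The arrays `rzsc` and `treecover`, their values, and the threshold of 50 are made up for illustration and are not the real data. One thing to keep in mind: the positional order of `DataArray.where` is `(cond, other, drop)`, so a third positional argument is read as `drop`; passing the fill value with the `other=` keyword avoids that.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-ins for the real rasters (values are made up).
coords = {"lat": [0.0, 1.0, 2.0], "lon": [10.0, 20.0, 30.0, 40.0]}
rzsc = xr.DataArray(
    np.arange(12, dtype="float32").reshape(3, 4),
    coords=coords, dims=("lat", "lon"),
)
treecover = xr.DataArray(
    np.array([[0, 10, 20, 30], [40, 50, 60, 70], [80, 85, 5, 15]], dtype="float64"),
    coords=coords, dims=("lat", "lon"),
)

# Default: same shape as the input, NaN wherever the condition is False.
masked = rzsc.where(treecover > 50)

# A condition that is never True still returns the full shape, all NaN.
empty = rzsc.where(treecover > 1000)

# Custom fill value: pass it by keyword so it is not mistaken for `drop`.
filled = rzsc.where(treecover > 50, other=-9999.0)

# drop=True trims coordinate labels along which every value was masked.
cropped = rzsc.where(treecover > 50, drop=True)

# Count the cells that actually passed the filter instead of checking .shape.
n_valid = int(masked.count())
```

Checking `.count()` (the number of non-NaN cells) rather than `.shape` is what tells you how many cells survived the filter.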
bwc
  • I get an error `ValueError: cannot set 'other' if drop=True` for the above code. I don't know what that means. Also the dataset is too large to just `.plot()`. – Ep1c1aN Jul 01 '19 at 16:34
  • Please take a look at the [documentation](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.where.html), it should answer many of your questions. Did you try setting drop to 'False'? You can try keeping `drop = True` but just run `RZSC.where(Treecover > 1000)` – bwc Jul 01 '19 at 16:40
  • Thanks @bwc. The documentation was quite helpful. – Ep1c1aN Jul 02 '19 at 07:06
  • Can you answer this one as well? Even after filtering I still get a lot of sample points (millions), so I am considering a representative-sample approach for the question above. https://stackoverflow.com/questions/56846585/extract-a-percent-of-sample-from-large-data-set-for-analysis – Ep1c1aN Jul 02 '19 at 12:39
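
For the two follow-up points in the comments (the array being too large to plot directly, and the filtered result still holding millions of points), one possible approach is sketched below on synthetic data. The array `da`, the threshold 0.9, the stride of 10, and the 10% sample fraction are all assumptions chosen for illustration; this is not necessarily how the linked question was resolved.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Synthetic raster standing in for the real one (values are random).
da = xr.DataArray(
    rng.random((200, 300)),
    dims=("lat", "lon"),
    coords={"lat": np.linspace(-58, 13, 200), "lon": np.linspace(-81, -34, 300)},
)

# Mask as before: shape is unchanged, rejected cells become NaN.
masked = da.where(da > 0.9)

# Take every 10th cell in each direction to get a grid small enough to plot.
coarse = masked.isel(lat=slice(None, None, 10), lon=slice(None, None, 10))

# Flatten to the 1-D set of cells that passed the filter.
values = masked.values
valid = values[~np.isnan(values)]

# Draw a random 10% subsample of the valid cells, without replacement.
sample = rng.choice(valid, size=valid.size // 10, replace=False)
```

Striding simply skips cells, so `coarse` is a quick look rather than an average; `masked.values` also loads the whole array into memory, which may not be feasible for the full 32093 x 20818 grid without processing it in pieces.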