
I'm trying to set all values in a cube that are greater than a certain number to zero.

I've tried the following noddy way:

cube_tot_lo.data = np.where(cube_tot_lo.data < 1.0e19, cube_tot_lo.data, 0.0)

but it is a large cube and kills the memory. I was wondering if there is a nicer way to do this?

Thanks all for your time!


1 Answer


(1) A more usual numpy idiom would be:

cube.data[cube.data >= threshold_value] = 0.0

I think that should help with the memory problem, as it doesn't compute an entire new floating-point array to assign back.
However, it does still need to create a data-sized boolean array for the comparison, so it might not solve your problem completely.
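
Just to illustrate the sizes involved, here is a rough sketch with made-up dimensions and a stand-in threshold (not your actual cube or threshold):

import numpy as np

data = np.random.rand(5, 20, 100, 100)     # stand-in for cube.data: ~8 MB of float64
mask = data >= 0.5                         # boolean temporary: 1 byte per element
print(data.nbytes / 1e6, "MB of float64 data")
print(mask.nbytes / 1e6, "MB for the boolean mask")
data[mask] = 0.0                           # in-place assignment: no new float array is made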

(2) A really simple improvement could be to do this in sections, if you have a dimension you can slice over, such as a typical Z dimension with a few tens of levels. Then you can just divide the task, e.g. for a 4D cube with dims (t, z, y, x):

nz = cube.shape[1]                     # number of levels along the z dimension
for iz in range(nz):
    part = cube.data[:, iz]            # a view, so assignment modifies cube.data in place
    part[part >= threshold_value] = 0.0

That should also work well if your cube already contains "real" rather than "lazy" data.

(3) However, I wonder if your key problem could be that fetching all the data at once is itself simply too big?
That is perfectly possible in Iris, as it uses deferred loading: any reference to "cube.data" will fetch all the data into a real in-memory array, whereas e.g. simply saving the cube or calculating a statistic can avoid that.
So, the usability of really big cubes critically depends on what you eventually do with the content.
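
If you're unsure whether your cube's data is still lazy, a quick check like this (with a hypothetical filename) shows the effect of touching cube.data:

import iris

cube = iris.load_cube('my_big_file.nc')   # hypothetical file
print(cube.has_lazy_data())               # True: nothing has been read into memory yet
arr = cube.data                           # touching .data realises the whole array
print(cube.has_lazy_data())               # now False: the data is held in memory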

Iris now has a much fuller account of this in the docs for the forthcoming version 2.0: https://scitools-docs.github.io/iris/master/userguide/real_and_lazy_data.html

For instance, in the upcoming Iris v2, it will be possible to use Dask to do this efficiently. Something like:

import dask.array as da

data = cube.lazy_data()                              # a dask array: nothing computed yet
data = da.where(data < threshold_value, data, 0.0)   # deferred thresholding
zapped_cube = cube.copy(data=data)

This makes a derived cube with a deferred data calculation. As that can be processed in "chunks" when its time comes, it can drastically reduce the memory usage.
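
For example (a hypothetical follow-on, assuming you want a NetCDF output file), saving the derived cube lets the calculation run in chunks as the data is written, rather than all at once:

import iris

# Writing the result evaluates the deferred thresholding chunk-by-chunk,
# instead of realising the whole array in memory first.
iris.save(zapped_cube, 'thresholded_output.nc')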
