I work with geospatial images in tif format. Thanks to the rasterio
lib I can exploit these images as numpy
arrays of dimension (nb_bands, x, y). Here I manipulate an image that contains patches of unique values that I would like to count. (they were generated with the scipy.ndimage.label
function).
My idea was to use the unique
method of numpy
to retrieve the information from these patches as follows:
# identify the clumps
with rio.open(mask) as f:
mask_raster = f.read(1)
class_, indices, count = np.unique(mask_raster, return_index=True, return_counts=True)
del mask_raster
# identify the value
with rio.open(src) as f:
src_raster = f.read(1)
src_flat = src_raster.flatten()
del src_raster
values = [src_flat[index] for index in indices]
df = pd.DataFrame({'patchId': indices, 'nb_pixel': count, 'value': values})
My problem is this:
For an image of shape 69940, 70936, (84.7 mB on my disk), np.unique
tries to allocate an array of the same dim in int64 and I get the following error:
Unable to allocate 37.0 GiB for an array with shape (69940, 70936) and data type uint64
- Is it normal that unique reformats my painting in int64?
- Is it possible to force it to use a more optimal format? (even if all my patches were 1 pixel
np.int32
would be sufficent) - Is there another solution using a function I don't know?