Given a large 3D numpy array (large, but small enough to fit in memory) of dtype 'uint8', I would like to downscale it by a given scale factor in each dimension. You may assume the shape of the array is divisible by the scale factor.
The values of the array are in [0, 1, ... max], where max is always smaller than 6. I would like to scale the array down such that each 3D block of shape scale_factor is replaced by the value that occurs most often in that block. In case of a tie, returning the first of the tied values is fine (I don't care which).
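To illustrate the tie rule on a single toy block (flattened to 1D for brevity, not my real data):

import numpy as np
block = np.array([0, 1, 1, 2, 1, 3, 3, 3], dtype='uint8')  # 1 and 3 both occur three times
np.argmax(np.bincount(block, minlength=4))  # -> 1, the first of the tied values, which is fine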
I have tried the following, which works:
import numpy as np
array = np.random.randint(0, 4, (128, 128, 128), dtype='uint8')
scale_factor = (4, 4, 4)
bincount = 4  # number of possible values (max + 1)
# Reshape so that each block gets its own set of axes (1, 3, 5)
m, n, r = np.array(array.shape) // scale_factor
array = array.reshape((m, scale_factor[0], n, scale_factor[1], r, scale_factor[2]))
# Build a per-block histogram: bincount over the last block axis,
# then sum the counts over the other two block axes
array = np.apply_along_axis(lambda x: np.bincount(x, minlength=bincount),
                            axis=5, arr=array)
array = np.apply_along_axis(lambda x: np.sum(x), axis=3, arr=array)
array = np.apply_along_axis(lambda x: np.sum(x), axis=1, arr=array).astype('uint8')
# Take the most frequent value per block
array = np.argmax(array, axis=3)
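For the 128³ example above this leaves a (32, 32, 32) array holding the most frequent value of each 4×4×4 block.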
This worked, but the bincount step is terribly slow. I also got np.histogram to work, but that was very slow as well. I don't think either method is really designed for my purpose; they offer many more features than I need, which slows them down.
My question is: does anyone know a faster method? I would also be happy if someone could point me to a method from a deep learning library that does this, but that is not officially the question.
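For what it's worth, the kind of approach I have in mind would exploit the fact that there are at most 6 distinct values, e.g. counting each value with one boolean comparison per value instead of going through apply_along_axis. Roughly something like this (untested sketch; block_mode and nvals are just names I made up):

import numpy as np

def block_mode(a, scale_factor, nvals):
    m, n, r = np.array(a.shape) // scale_factor
    blocks = a.reshape(m, scale_factor[0], n, scale_factor[1], r, scale_factor[2])
    # Count occurrences of every possible value by summing booleans over the block axes
    counts = np.stack([(blocks == v).sum(axis=(1, 3, 5)) for v in range(nvals)],
                      axis=-1)
    # argmax breaks ties by taking the first (smallest) value, which is fine for me
    return counts.argmax(axis=-1).astype('uint8')

I haven't benchmarked this against the bincount version, so maybe there is a better or more standard way.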