I have seen many cases in which one subtracts the mean of the dataset to "center" the data at 0 (e.g., this, this and this).
In my case, I have a dataset composed of 256x256 images, where each pixel is a byte (and can take a value between 0 and 255). I calculated the per-pixel mean across the dataset, but when I finally wrote the code to subtract it, I realized doing so would cause an overflow.
For concreteness, say my dataset is composed of only two "images", a and b, as follows:
In [1]: import numpy as np

In [2]: a = np.array([[1, 1], [1, 1]], dtype=np.uint8)

In [3]: b = np.array([[9, 9], [9, 9]], dtype=np.uint8)

In [4]: dataset = np.array([a, b])

In [5]: dataset
Out[5]:
array([[[1, 1],
        [1, 1]],

       [[9, 9],
        [9, 9]]], dtype=uint8)
Now my mean will be:
In [6]: mean = dataset.mean(axis=0)

In [7]: mean
Out[7]:
array([[5., 5.],
       [5., 5.]])
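Note that the mean comes out as float64, which matters for what happens next:

In [8]: mean.dtype
Out[8]: dtype('float64')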
And if I try subtracting the mean, I get:
In [9]: a - mean
Out[9]:
array([[-4., -4.],   # Notice that the uint8 array was upcast to float64
       [-4., -4.]])  # here, so the values simply went negative
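For contrast, if I force the arithmetic to stay in uint8 (say, by casting the mean down first), I get the genuine wrap-around I was worried about:

In [10]: a - mean.astype(np.uint8)   # uint8 arithmetic wraps modulo 256, so 1 - 5 becomes 252
Out[10]:
array([[252, 252],
       [252, 252]], dtype=uint8)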
I thought of "capping" at 0 (sketched below), but negative values would show up in roughly half of my dataset, and clipping them all away didn't look like a great solution. Any suggestions? (Or is this just something that no one cares about anyway?)
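For reference, the "capping" I have in mind is just np.clip, which in this toy example wipes out image a entirely, since every pixel of a is below the mean:

In [11]: np.clip(a - mean, 0, None)   # replace every negative value with 0
Out[11]:
array([[0., 0.],
       [0., 0.]])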