
I have seen many cases in which one subtracts the mean of the dataset from every sample, aiming to "center" the data at 0 (e.g., this, this and this).

In my case, I have a dataset composed of 256x256 images, where each pixel is a byte (and can take a value between 0 and 255). I calculated the per-pixel mean across the dataset, but when I finally wrote the code to subtract it, I realized doing so would cause an overflow: the result can be negative, which `uint8` cannot represent.

For concreteness, say my dataset is composed of only two 2x2 "images", a and b, as follows:

In [1]: import numpy as np
In [2]: a = np.array([[1, 1], [1, 1]], dtype=np.uint8)
In [3]: b = np.array([[9, 9], [9, 9]], dtype=np.uint8)
In [4]: dataset = np.array([a, b])
In [5]: dataset
Out[5]: 
array([[[1, 1],
        [1, 1]],

       [[9, 9],
        [9, 9]]], dtype=uint8)

Now my mean will be:

In [6]: mean = dataset.mean(axis=0)
In [7]: mean
Out[7]: 
array([[ 5.,  5.],
       [ 5.,  5.]])

And if I try subtracting the mean I get:

In [8]: a - mean
Out[8]: 
array([[-4., -4.],  # Notice, here, that the `uint8` type was apparently
       [-4., -4.]]) # cast, and the values became negative
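
For reference, checking the result's dtype confirms the cast: subtracting the float mean upcasts the `uint8` array to float64, so the negative values are stored as-is rather than wrapping around.

In [9]: (a - mean).dtype
Out[9]: dtype('float64')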

I thought of "capping" the result at 0 (roughly as sketched at the end of this post), but I noticed that this would affect roughly half of my dataset, so it didn't seem like a great solution. Any suggestions? (Or is this just something that nobody cares about anyway?)
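
In case it is not clear what I mean by "capping", it would be something along these lines (just a sketch of the approach I don't really like, using np.clip):

In [10]: np.clip(a - mean, 0, 255).astype(np.uint8)
Out[10]: 
array([[0, 0],
       [0, 0]], dtype=uint8)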

  • Change your data to a representation that handles negatives? Or don't standardize your data. Is there any reason you're doing that besides that it seems to be best practice? – user2699 Jan 27 '17 at 17:31
  • Why would you like the image to be stored in `uint` format? – jdhao Dec 05 '17 at 08:14

0 Answers