I wrote a class to rescale images, but after processing the RGB values range from 0 to 1. What happened to the RGB values, which intuitively should range from 0 to 255? Below are the Rescale class and the RGB values after rescaling.

Question:

Do I still need min-max normalization to map the RGB values to 0-1?

How do I apply transforms.Normalize? Where do I put it, before or after the Rescale? And how do I calculate the mean and variance: from the RGB values ranging from 0-255 or from 0-1?

Thanks for your time!

from skimage import transform


class Rescale(object):
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, anno = sample['image'], sample['anno']

        # get original height and width of image
        h, w = image.shape[0:2]

        # if output_size is an integer, match the shorter side and keep the aspect ratio
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = h * self.output_size / w, self.output_size
            else:
                new_h, new_w = self.output_size, w * self.output_size / h
        # if output_size is a tuple (new_h, new_w)
        else:
            new_h, new_w = self.output_size
        new_h, new_w = int(new_h), int(new_w)

        # skimage's resize converts the image to float in [0, 1] by default (preserve_range=False)
        image = transform.resize(image, (new_h, new_w))
        return {'image': image, 'anno': anno}

[[[0.67264216 0.50980392 0.34503034]
  [0.67243905 0.51208121 0.34528431]
  [0.66719145 0.51817184 0.3459951 ]
  ...
  [0.23645098 0.2654311  0.3759458 ]
  [0.24476471 0.28003857 0.38963938]
  [0.24885877 0.28807445 0.40935877]]

 [[0.67465196 0.50994608 0.3452402 ]
  [0.68067157 0.52031373 0.3531848 ]
  [0.67603922 0.52732436 0.35839216]
  ...
  [0.23458333 0.25195098 0.36822142]
  [0.2461343  0.26886127 0.38314558]
  [0.2454384  0.27233056 0.39977664]]

 [[0.67707843 0.51237255 0.34766667]
  [0.68235294 0.5219951  0.35553024]
  [0.67772059 0.52747687 0.35659176]
  ...
  [0.24485294 0.24514568 0.36592999]
  [0.25407436 0.26205475 0.38063318]
  [0.2597007  0.27202914 0.40214216]]

 ...
[[[172 130  88]
  [172 130  88]
  [172 130  88]
  ...
  [ 63  74 102]
  [ 65  76 106]
  [ 67  77 112]]

 [[173 131  89]
  [173 131  89]
  [173 131  89]
  ...
  [ 65  74 103]
  [ 64  75 105]
  [ 63  73 108]]

 [[173 131  89]
  [174 132  90]
  [174 132  90]
  ...
  [ 63  72 101]
  [ 62  71 102]
  [ 61  69 105]]
  ...
Wei Wong

1 Answer

You can use torchvision to accomplish this.

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(output_size),
    transforms.ToTensor(),  # converts a PIL image to a float tensor in [0, 1]
])

This requires a PIL image as input and returns a tensor in the [0, 1] range, so no separate min-max normalization is needed. You may also add mean and standard deviation normalization, as below:

transform = transforms.Compose([
    transforms.Resize(output_size),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

Here mean and std are the per-channel mean and standard deviation over all pixels of all images in the training set. You need to calculate them after resizing all images and converting them to torch tensors. One way to do this is to apply the first two transformations (Resize and ToTensor) and then calculate the mean and std over all training images, like this:

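import torch

# assumes train_data[i] returns one [C, H, W] image tensor and all images share the same size
x = torch.stack([train_data[i] for i in range(len(train_data))])  # shape [N, C, H, W]
mean = torch.mean(x, dim=(0, 2, 3))  # one value per channel
std = torch.std(x, dim=(0, 2, 3))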

Then you use these mean and std values with the Normalize transform above.
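
If the resized images do not all share one shape, or the training set is too large to hold in memory at once, the same per-channel statistics can be accumulated one image at a time. The following is only a minimal sketch and not part of the original answer: `channel_stats` is a hypothetical helper name, and it assumes every item is a [C, H, W] float tensor as produced by ToTensor. It computes the population standard deviation, whereas torch.std applies Bessel's correction by default; over this many pixels the difference should be negligible.

```python
import torch


def channel_stats(images):
    """Per-channel mean and std over an iterable of [C, H, W] float tensors."""
    total = None      # running sum of pixel values per channel
    total_sq = None   # running sum of squared pixel values per channel
    count = 0         # number of pixels seen per channel
    for img in images:
        flat = img.double().flatten(start_dim=1)  # [C, H*W], float64 for accuracy
        if total is None:
            total = torch.zeros(flat.shape[0], dtype=torch.float64)
            total_sq = torch.zeros_like(total)
        total += flat.sum(dim=1)
        total_sq += (flat ** 2).sum(dim=1)
        count += flat.shape[1]
    if count == 0:
        raise ValueError("no images given")
    mean = total / count
    # population variance: E[x^2] - E[x]^2, clamped to guard against rounding below zero
    std = (total_sq / count - mean ** 2).clamp(min=0).sqrt()
    return mean.float(), std.float()


# Stand-in data: two random "images" of different sizes with values in [0, 1].
images = [torch.rand(3, 4, 6), torch.rand(3, 5, 5)]
mean, std = channel_stats(images)
print(mean, std)
```

The resulting `mean` and `std` can be passed to transforms.Normalize in the same way as the values computed above.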

xashru
  • Hi, thank you so much. How am I supposed to calculate the mean and std? Should I use the images before or after resizing? If I calculate the mean and std before resizing, the Normalize comes after the Resize, which doesn't make sense: the number of pixels changes after Resize, so it seems wrong to normalize with values calculated from the original images. But if I want to use the mean and std calculated after Resize, do I need to write a function that calculates them between Resize and Normalize? – Wei Wong Apr 25 '20 at 02:42
  • Thank you for your time. If I want to use the model to predict one single image, I have to do Normalize again for one single image, right? – Wei Wong Apr 25 '20 at 09:49
  • Yes, using the same values used during training. – xashru Apr 25 '20 at 11:06