
I'm working with GANs on the Single Image Super-Resolution (SISR) problem at 4x scaling. I am using residual learning techniques, so what I get back from the trained network is a tensor containing the estimated residual image between the upscaled input image and the target image. I feed the network normalized numpy arrays representing the images (np.asarray(image) / 255).

In order to get the final estimated image, then, I have to sum the upscaled input image with the residual image. Here is the code I use (the input image's size is 64x64 while the output has size 256x256):

from PIL import Image
import torch
from torchvision import transforms

net.eval()
img = Image.open(image_folder + 'lr/' + image_name)
tens = transforms.ToTensor()
toimg = transforms.ToPILImage()

input = tens(img)
bicub_res = tens(img.resize((img.size[0] * 4, img.size[1] * 4), Image.BICUBIC))

input = input.view((1, 3, 64, 64))
output = net(input)
output = torch.add(bicub_res, output).clamp(0, 255)
output = output.view((3, 256, 256))
output = toimg(output)
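For reference, here is a minimal self-contained sketch of the same inference pipeline. `ToyResidualNet` is a made-up stand-in for the trained network (any module mapping a (1, 3, H, W) tensor to a (1, 3, 4H, 4W) residual would do), and since the inputs are normalized to [0, 1], the sketch clamps to that range; `torch.no_grad()` avoids building the autograd graph during evaluation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the trained residual network.
class ToyResidualNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.up = nn.Upsample(scale_factor=4, mode='nearest')
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(self.up(x))

net = ToyResidualNet()
net.eval()

lr = torch.rand(1, 3, 64, 64)  # normalized low-res input in [0, 1]
with torch.no_grad():
    # bicubic upscaling of the input, as in the question
    bicub = F.interpolate(lr, scale_factor=4, mode='bicubic',
                          align_corners=False)
    residual = net(lr)
    # sum upscaled input and residual; inputs were scaled to [0, 1],
    # so clamp to that range rather than (0, 255)
    sr = (bicub + residual).clamp(0, 1)

print(sr.shape)
```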

Now, having these images as low resolution, high resolution and residuals (network output):

[Images: bicubic upsampling of the input, high-resolution target, residual image]

if I sum the bicubic-upscaled input image with the residual image as shown in the code, what I get is:

result not stretched

which seems a bit too dark. Now, given that the data structures are numpy arrays, I've tried to stretch the values of the array back to the range (0, 255) and then convert it back to an image. In this case, I get this:

result stretched

which is a bit brighter than before, but still very dark. What am I doing wrong? How can I get my image back?

EDIT: I will answer the first question myself: the problem was a per-channel constant that I forgot to add back.

Nonetheless, I have another question to ask: after recovering the right images, I noticed some kind of noise on each image:

noise_bird noise_baby

and looking at other images, like the baby, I noticed that it is a 3x3 grid of nine repetitions of some kind of "watermark self-image". This pattern is the same for every picture, no matter what I do or how I train the network. Why do I see these artifacts?

F. Malato

1 Answer


So, I solved both my questions. For future reference:

  1. The first question was a mistake in my code: when I train the network, I subtract a constant value per channel, PER_CHANNEL_MEANS = np.array([0.47614917, 0.45001204, 0.40904046]). When it came to getting the image back, I didn't add that value back, and since the values are fixed per channel, it resulted in a brightness shift.

  2. My second question was even harder, because the problem wasn't in my network but in how I used numpy: reshaping an array from (3, 256, 256) to (256, 256, 3) does not permute the axes, it just reinterprets the same memory order, which scrambles the channel data, hence the shifting. To solve it, I swapped the axes instead:

    # broadcast the 3-element per-channel means over the spatial dimensions
    output = torch.add(output, torch.from_numpy(PER_CHANNEL_MEANS).float().view((1, 3, 1, 1))).clamp(0, 255)
    o = output.view((3, 256, 256))
    o = o.data.numpy()
    o = np.swapaxes(o, 0, 1)  # (3, 256, 256) -> (256, 3, 256)
    o = np.swapaxes(o, 1, 2)  # (256, 3, 256) -> (256, 256, 3)
    

    it's not an elegant way, but it does the job.
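As a sketch of a more compact alternative (assuming `o` is the (3, H, W) channel-first array from the snippet above), a single transpose permutes all the axes in one step and is equivalent to the two swapaxes calls:

```python
import numpy as np

# small stand-in for the (3, 256, 256) network output
o = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)

chw_to_hwc = o.transpose(1, 2, 0)  # (3, H, W) -> (H, W, 3)
two_swaps = np.swapaxes(np.swapaxes(o, 0, 1), 1, 2)

# both routes produce the same channels-last array
assert np.array_equal(chw_to_hwc, two_swaps)
```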

ADDENDUM: At this point, I had solved my two problems, but I had another one, which can be noticed very easily in the last image of my post: some pixels shifted to completely wrong colors.

To turn an array a into an image, I used a.astype(np.uint8), without being aware that if a value v exceeds np.uint8's maximum (255), the resulting value wraps around to np.mod(v, 256). This caused the color shifting, which I solved following the answer to that question.
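A small sketch of the wraparound: integer values above 255 wrap modulo 256 when cast to uint8, which is what produced the miscolored pixels; clipping first avoids it:

```python
import numpy as np

a = np.array([250, 260, 300])

# direct cast wraps out-of-range values: 260 % 256 == 4, 300 % 256 == 44
wrapped = a.astype(np.uint8)

# clip to the valid range first, then cast
clipped = np.clip(a, 0, 255).astype(np.uint8)
```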

Please feel free to suggest a more elegant approach for my solution to the second problem; I will edit it in.
