Pytorch Unfold and Fold: How do I put this image tensor back together again?

Question

I am trying to filter a single channel 2D image of size 256x256 using unfold to create 16x16 blocks with an overlap of 8. This is shown below:

# I = [256, 256] image
kernel_size = 16
stride = bx/2
patches = I.unfold(1, kernel_size, 
int(stride)).unfold(0, kernel_size, int(stride)) # size = [31, 31, 16, 16]

I have started to attempt to put the image back together with fold but I’m not quite there yet. I’ve tried to use view to get the image to ‘fit’ the way it’s supposed to but I don’t see how this would preserve the original image. Perhaps I’m overthinking this.

# patches.shape = [31, 31, 16, 16]
patches = = filt_data_block.contiguous().view(-1, kernel_size*kernel_size) # [961, 256]
patches = patches.permute(1, 0) # size = [951, 256]

Any help would be greatly appreciated. Thanks very much.

Gil Pinsky · Answer 1 · 2020-09-25T17:02:58.943

I believe you will benefit from using torch.nn.functional.fold and torch.nn.functional.unfold in this case, as these functions are built specifically for images (or any 4D tensors, that is with shape B X C X H X W).

Let's start with unfolding the image:

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
from sklearn.datasets import load_sample_image #Used to load a sample image

dtype = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
#Load a flower image from sklearn.datasets, crop it to shape 1 X 3 X 256 X 256:
I = torch.from_numpy(load_sample_image('flower.jpg')).permute(2,0,1).unsqueeze(0).type(dtype)[...,128:128+256,256:256+256]
kernel_size = 16
stride = kernel_size//2 
I_unf = F.unfold(I, kernel_size, stride=stride)

Here we obtain all the 16x16 image patches with strides of 8 by using the F.unfold function. This will result in a 3D tensor with shape torch.Size([1, 768, 961]). ie - 961 patches with 768 = 16 X 16 X 3 pixels within each.

Now, say we wish to fold it back to I:

I_f = F.fold(I_unf,I.shape[-2:],kernel_size,stride=stride)
norm_map = F.fold(F.unfold(torch.ones(I.shape).type(dtype),kernel_size,stride=stride),I.shape[-2:],kernel_size,stride=stride)
I_f /= norm_map

We use F.fold where we tell it the original shape of I, the kernel_size we used to unfold and the stride used. After folding I_unf we will obtain a summation with overlaps. This means that the resulting image will appear saturated. As a result, we need to compute a normalization map which will normalize multiple summation of pixels due to overlaps. A way to do this efficiently is to take a ones tensor and use unfold followed by fold - to mimic the summation with overlaps. This gives us the normalization map by which we normalize I_f to recover I.

Now, we wish to plot I_f and I to prove content is preserved:

#Plot I:
plt.imshow(I[0,...].permute(1,2,0).cpu()/255)

#Plot I_f:
plt.imshow(I_f[0,...].permute(1,2,0).cpu()/255)

This whole process will work also for single-channel images. One thing to notice is that if spatial dimensions of the image are not divisible by the stride, you will get norm_map with zeros (at the edges) due to some pixels not reachable but you can easily handle this case as well.

Thanks very much for this Gil, I appreciate it. I think your answer will be really useful to anyone else doing the same thing in the future. — Bled Clement, Sep 26 '20 at 21:52

score 0 · Accepted Answer · answered Sep 26 '20 at 22:38

A slightly less elegant solution than that proposed by Gil:

I took inspiration from this post on the Pytorch forums, formatting my image tensor to be of standard shape B x C x H x W (1 x 1 x 256 x 256). Unfolding:

# CREATE THE UNFOLDED IMAGE SLICES
I = image           # shape [256, 256]
kernel_size = bx    #shape [16]
stride = int(bx/2)  #shape [8]
I2 = I.unsqueeze(0).unsqueeze(0) #shape [1, 1, 256, 256]
patches2 = I2.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
#shape [1, 1, 31, 31, 16, 16]

Following this, I do some transforms and filtering to my tensor stack. Before doing this I apply a cosine window and normalise:

# NORMALISE AND WINDOW
Pvv = torch.mean(torch.pow(win, 2))*torch.numel(win)*(noise_std**2)
Pvv = Pvv.double()
mean_patches = torch.mean(patches2, (4, 5), keepdim=True)
mean_patches = mean_patches.repeat(1, 1, 1, 1, 16, 16)
window_patches = win.unsqueeze(0).unsqueeze(0).unsqueeze(0).unsqueeze(0).repeat(1, 1, 31, 31, 1, 1)
zero_mean = patches2 - mean_patches
windowed_patches = zero_mean * window_patches

#SOME FILTERING ....

#ADD MEAN AND WINDOW BEFORE FOLDING BACK TOGETHER.
filt_data_block = (filt_data_block + mean_patches*window_patches) * window_patches

The above code works for me, but a mask would be more simple. Next, I prepare my tensor of form [1, 1, 31, 31, 16, 16] to be transformed back into the original [1, 1, 256, 256]:

# REASSEMBLE THE IMAGE USING FOLD
patches = filt_data_block.contiguous().view(1, 1, -1, kernel_size*kernel_size)
patches = patches.permute(0, 1, 3, 2)
patches = patches.contiguous().view(1, kernel_size*kernel_size, -1)
IR = F.fold(patches, output_size=(256, 256), kernel_size=kernel_size, stride=stride)
IR = IR.squeeze()

This allowed me to create an overlapping sliding window and seamlessly stitch the image back together. Cutting out the filtering makes for an identical image.

Pytorch Unfold and Fold: How do I put this image tensor back together again?

2 Answers2