I'm trying to use a simple SSD that was trained on 300x300 data with annotated bounding boxes. If I crop the images manually, it works correctly, but with full-size images it fails (obviously), because resizing large images down to 300x300 destroys many visual features.
I figured the good old sliding window would work here, but I'm having some problems rebuilding the images with the detections, and I must admit I'm a bit clueless about how to approach it. What I have so far is:
At first, I tried this:
chips = F.unfold(img_t.data, kernel_size=300)
following some examples from Stack Overflow, but this gives me the error: Input Error: Only 4D input Tensors are supported (got 3D)
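(In hindsight, that error just means F.unfold wants a batched [B, C, H, W] tensor rather than a bare [C, H, W] image; a minimal sketch of the fix, assuming img_t holds a single 3-channel image tensor:)

import torch.nn.functional as F

# F.unfold expects a 4D [B, C, H, W] input, so add a batch dimension first
chips = F.unfold(img_t.unsqueeze(0), kernel_size=300, stride=300)
# chips: [1, 3*300*300, L], one flattened 300x300 patch per column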
so after some more googling I found something that works:
patch_w = 300
patch_h = 300
# tensor.unfold over the channel, height and width dims; the result has shape
# [1, n_rows, n_cols, 3, 300, 300], so patches[0][i][j] is one 3x300x300 patch
patches = img_t.data.unfold(0, 3, 3).unfold(1, patch_w, patch_h).unfold(2, patch_w, patch_h)
import numpy as np
import matplotlib.pyplot as plt

# Visualise a small part:
fig = plt.figure(figsize=(4, 4))
fig.tight_layout()
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.9, wspace=0.01, hspace=0.01)
for i in range(4):
    for j in range(4):
        # transp is the ToPILImage transform defined in the full snippet further down
        inp = transp(patches[0][i][j])
        inp = np.array(inp)
        ax = fig.add_subplot(4, 4, ((i * 4) + j) + 1, xticks=[], yticks=[])
        plt.imshow(inp)
plt.show()
I then feed the patches to my detector and it looks more or less OK, but there's no overlap (an object can be cut into pieces and missed) and, more importantly, I can't reverse the unfolding without getting drowned in exceptions.
I'm not adamant about using the fold/unfold combination for this task. What I really want is to be able to feed a large image into the network in a way that preserves as much information as possible, record the detections, and rebuild the image with bounding boxes from the smaller patches.
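To make that concrete, this is roughly the kind of loop I have in mind. It's only a sketch: detector is a placeholder assumed to return (x1, y1, x2, y2) boxes in patch coordinates, and the overlap comes from using a stride smaller than the window:

def window_starts(size, window, stride):
    # start offsets that cover the whole dimension; the last window is clamped to the border
    starts = list(range(0, max(size - window, 0) + 1, stride))
    if starts[-1] + window < size:
        starts.append(max(size - window, 0))
    return starts

def sliding_window_detect(img_t, detector, window=300, stride=200):
    # img_t: [3, H, W] tensor; returns boxes in full-image coordinates
    _, H, W = img_t.shape
    all_boxes = []
    for top in window_starts(H, window, stride):
        for left in window_starts(W, window, stride):
            patch = img_t[:, top:top + window, left:left + window]
            # detector is assumed to return (x1, y1, x2, y2) boxes in patch coordinates
            for (x1, y1, x2, y2) in detector(patch.unsqueeze(0)):
                # shift each patch-local box back into full-image coordinates
                all_boxes.append((x1 + left, y1 + top, x2 + left, y2 + top))
    return all_boxes

Since overlapping windows produce duplicate detections near the seams, the merged list would still need some non-maximum suppression afterwards.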
What I came up with is this:
# tiles: list of 300x300 PIL crops returned by the detector; dims: (rows, cols) of the tile grid
new_im = Image.new("RGB", (300 * dims[1], 300 * dims[0]))
idx = 0
for i in range(dims[0]):
    for j in range(dims[1]):
        new_im.paste(tiles[idx], (j * 300, i * 300))
        idx += 1
new_im.show()
This rebuilds the image, but in a very artificial way: the detector annotates the individual crops and returns them as a list of images, which I then stitch back together here. It's both ugly and inefficient.
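(A less artificial alternative, sketched here with PIL's ImageDraw and assuming the detections have already been shifted into full-image coordinates as in the sketch above, would be to draw the boxes straight onto the original image instead of stitching annotated crops:)

from PIL import Image, ImageDraw

img = Image.open(img_path)
draw = ImageDraw.Draw(img)
for (x1, y1, x2, y2) in all_boxes:  # boxes already in full-image coordinates
    draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
img.show()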
After a bit of fiddling I got the fold/unfold reconstruction to work, but then comes a peculiarity of PyTorch: fold adds the overlapping parts of the patches together instead of averaging them (see image). How can I fix that? I realise normalisation won't help here, since it would rescale the good pixels as well; the overlapping pixels just need to be averaged.
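To see the summing behaviour in isolation, a tiny standalone example (just F.unfold followed by F.fold on a tensor of ones) shows it, together with the divisor trick that averages the overlaps:

import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 4, 4)
cols = F.unfold(x, kernel_size=3, stride=1)                  # [1, 9, 4]
summed = F.fold(cols, output_size=(4, 4), kernel_size=3, stride=1)
print(summed)            # overlapping pixels come back as 2.0 or 4.0, not 1.0
divisor = F.fold(torch.ones_like(cols), output_size=(4, 4), kernel_size=3, stride=1)
print(summed / divisor)  # all ones again: the overlaps are averaged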
Also, please note the image was cropped erroneously. Simple code to reproduce:
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms


def fold_unfold(img_path):
    transt = transforms.Compose([transforms.ToTensor(),
                                 # transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
                                 ])
    transp = transforms.Compose([
        # transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
        transforms.ToPILImage()
    ])
    img_t = transt(Image.open(img_path))
    img_t = img_t.unsqueeze(0)

    kernel = 300
    stride = 200
    img_shape = img_t.shape
    B, C, H, W = img_shape

    # number of pixels missing on each axis:
    pad_w = W % kernel
    pad_h = H % kernel
    # Padding the **INPUT** image with the missing pixels, so unfold doesn't
    # silently drop the pixels that don't fit into a full window:
    img_t = F.pad(input=img_t,
                  pad=(pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2),
                  mode='constant', value=0)
    img_shape = img_t.shape
    B, C, H, W = img_shape
    print("\n-----input shape: ", img_shape)

    patches = img_t.unfold(3, kernel, stride).unfold(2, kernel, stride).permute(0, 1, 2, 3, 5, 4)
    print("\n-----patches shape:", patches.shape)

    # reshape output to match F.fold input
    patches = patches.contiguous().view(B, C, -1, kernel * kernel)
    print("\n", patches.shape)  # [B, C, nb_patches_all, kernel_size*kernel_size]
    patches = patches.permute(0, 1, 3, 2)
    print("\n", patches.shape)  # [B, C, kernel_size*kernel_size, nb_patches_all]
    patches = patches.contiguous().view(B, C * kernel * kernel, -1)
    print("\n", patches.shape)  # [B, C*prod(kernel_size), L] as expected by Fold
    # https://pytorch.org/docs/stable/nn.html#torch.nn.Fold

    output = F.fold(patches, output_size=(H, W), kernel_size=kernel, stride=stride)
    # mask that mimics the original folding; dividing by it averages the overlapping pixels:
    recovery_mask = F.fold(torch.ones_like(patches), output_size=(H, W), kernel_size=kernel, stride=stride)
    output = output / recovery_mask
    print(output.shape)  # [B, C, H, W]

    aspil = transp(output[0])
    aspil.show()
Still, the image is cropped quite a lot, so something is still wrong:
Finally, getting the cropping done (code above updated to the working version): the problem was coming from the way PyTorch does the unfolding. The tensor.unfold method doesn't zero-pad automatically; instead it stops early and cuts off the part of the image that doesn't fit into a full window. I solved it by zero-padding the tensor before cropping it.
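For reference, when the stride differs from the kernel size, the padding that guarantees the sliding windows reach the last row and column is the one that makes (size - kernel) a multiple of the stride. A small sketch of that calculation (my own helper, not part of the code above):

def coverage_padding(size, kernel, stride):
    # pixels to add so that unfold(dim, kernel, stride) covers the whole dimension
    if size <= kernel:
        return kernel - size
    remainder = (size - kernel) % stride
    return 0 if remainder == 0 else stride - remainder

pad_w = coverage_padding(W, kernel, stride)
pad_h = coverage_padding(H, kernel, stride)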