I have been trying to visualize the intermediate outputs of a VGG-16 network, but the output looks just wrong. As far as I understand, a convolution does not move the semantic parts of the picture around: it is translation-equivariant, so if the head is in the top part of the input picture, it should still be in the top part of the feature map after the convolution. But that doesn't seem to be the case here. I used the following code to extract the intermediate layers:
import torch
import torchvision.models as tv  # tv.vgg16 below is torchvision.models.vgg16

class vgg16(torch.nn.Module):
    def __init__(self, pretrained=True):
        super(vgg16, self).__init__()
        vgg_pretrained_features = tv.vgg16(pretrained=pretrained).features
        # keep the first 30 modules of VGG-16's feature extractor
        self.layerss = torch.nn.Sequential()
        for x in range(30):
            self.layerss.add_module(str(x), vgg_pretrained_features[x])
        self.layerss.eval()

    def forward(self, x):
        output = []
        # pass the input through each layer and collect every intermediate activation
        for layer in self.layerss:
            x = layer(x)
            output.append(x)
        return output
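For reference, the img used below is assumed to be a (1, 3, H, W) float tensor. I prepare it roughly along these lines (a minimal sketch using standard ImageNet preprocessing; the file name is a placeholder):

from PIL import Image
import torchvision.transforms as transforms

# standard ImageNet preprocessing; "face.jpg" is a placeholder path
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
img = preprocess(Image.open("face.jpg").convert("RGB")).unsqueeze(0)  # (1, 3, 224, 224)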
model = vgg16()
output = model(img)  # calling the model directly is equivalent to model.forward(img)

import matplotlib.pyplot as plt
plt.imshow(output[0][0][0].detach())  # first layer, first image in the batch, first channel
plt.show()
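To make sure this is not specific to channel 0, here is a small sketch I use to look at the first few channels of the first layer side by side (plain matplotlib subplots):

import matplotlib.pyplot as plt

# plot the first 8 channels of the first layer's activation
fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for ch, ax in enumerate(axes.flat):
    ax.imshow(output[0][0][ch].detach())
    ax.set_title(f"channel {ch}")
    ax.axis("off")
plt.show()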
Here are the original picture and the output of the first channel of the first layer of the VGG network:
As you can see, the face has moved all the way down, the necklace has moved all the way up, and the overall structure of the picture is broken.
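To double-check my premise, here is a minimal sanity check of my own (a randomly initialized Conv2d, unrelated to the VGG weights above): shifting the input should shift the output by the same amount, away from the borders.

import torch

torch.manual_seed(0)
conv = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

x = torch.randn(1, 1, 32, 32)
x_shifted = torch.roll(x, shifts=4, dims=2)  # shift the image down by 4 rows (with wrap-around)

with torch.no_grad():
    y = conv(x)
    y_shifted = conv(x_shifted)

# away from the zero-padded borders and the wrap-around seam,
# conv(shift(x)) should equal shift(conv(x))
print(torch.allclose(torch.roll(y, shifts=4, dims=2)[..., 8:-8, :],
                     y_shifted[..., 8:-8, :]))  # True

This prints True for me, which is what makes the scrambled VGG output above so confusing.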