I'm trying to implement a Faster R-CNN model with PyTorch. In its structure, the first element of the model is the transform:
from torchvision.models.detection import fasterrcnn_resnet50_fpn
model = fasterrcnn_resnet50_fpn(pretrained=True)
print(model.transform)
GeneralizedRCNNTransform(
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    Resize(min_size=(800,), max_size=1333, mode='bilinear')
)
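For reference, those min/max values also seem to be stored as plain attributes on the transform object (I'm just reading them off the printed object above, so take the attribute names as my own observation):

print(model.transform.min_size, model.transform.max_size)  # (800,) 1333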
When images pass through the forward of Resize(), they come out as (800, h) or (w, 1333), depending on the ratio of width to height.
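As far as I can tell, the rule is: scale the image so its shorter side becomes 800, unless that would push the longer side past 1333, in which case scale so the longer side becomes exactly 1333. Here is a minimal sketch of that rule (my own helper, not torchvision code; torchvision does the actual resizing inside F.interpolate, so the rounding may differ by a pixel), which reproduces the sizes I get below:

def expected_resized_shape(h, w, min_size=800, max_size=1333):
    # Scale so the shorter side becomes min_size ...
    scale = min_size / min(h, w)
    # ... unless that would push the longer side past max_size;
    # then scale so the longer side becomes max_size instead.
    if max(h, w) * scale > max_size:
        scale = max_size / max(h, w)
    return round(h * scale), round(w * scale)

print(expected_resized_shape(512, 640))  # (800, 1000)
print(expected_resized_shape(315, 640))  # (656, 1333)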
for i in range(2):
    _, image, target = testset.__getitem__(i)
    img = image.unsqueeze(0)                         # add a batch dimension
    output, _ = model.transform(img)                 # returns (ImageList, targets)
    print('Before Transform :', image.shape[-2:])    # spatial size before the transform
    print('After Transform :', output.image_sizes)   # spatial size after the transform
Before Transform : torch.Size([512, 640])
After Transform : [(800, 1000)]
Before Transform : torch.Size([315, 640])
After Transform : [(656, 1333)]
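For completeness: as far as I understand the source, output here is an ImageList, so the sizes above come from its image_sizes field, and output.tensors holds the resized, zero-padded batch that actually goes on to the backbone:

print(output.tensors.shape)  # resized and zero-padded batch (padded to a multiple of 32, I think)
print(output.image_sizes)    # resized sizes before padding, e.g. [(656, 1333)] for the second image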
My question is: how are those resized outputs obtained, and why is this method used? I can't find this information in the paper, and I can't understand the transform-related source code in fasterrcnn_resnet50_fpn.
Sorry for my English