I am using the following:
- CUDA 10.0
- PyTorch 1.2
- https://github.com/ruotianluo/pytorch-faster-rcnn
- The weight set used for testing is not the same as the one used for training.
- The training weight set comes from a caffe-pretrained ResNet101 backbone.
I have taken this repo and converted it to use KITTI data. In doing so, I added a new Kitti class in datasets and did the necessary conversion. Both testing and evaluation work with the following class set from PASCAL VOC:
self._classes = (
'__background__', # always index 0
'aeroplane',
'bicycle',
'bird',
'boat',
'bottle',
'bus',
'car',
'cat',
'chair',
'cow',
'diningtable',
'dog',
'horse',
'motorbike',
'person',
'pottedplant',
'sheep',
'sofa',
'train',
'tvmonitor')
I have changed the class set to:
self._classes = (
'dontcare', # always index 0
'pedestrian',
'car',
'truck',
'cyclist')
#-----------------------------
N.B.: The classes should NOT matter here, as the output of the backbone is simply a feature map, not a classification.
#-----------------------------
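Even so, to rule out anything malformed in the converted annotations themselves, a quick sanity check along these lines can be run on each image's ground-truth blob. This assumes the repo's usual (N, 5) gt_boxes layout of [x1, y1, x2, y2, class_index]; adjust the indexing if your conversion differs:

    import numpy as np

    def check_gt_boxes(gt_boxes, num_classes=5):
        """Sanity-check one image's ground-truth blob.

        Assumes gt_boxes is an (N, 5) float array of
        [x1, y1, x2, y2, class_index] -- my understanding of how the repo
        lays out blobs['gt_boxes']; adjust if your conversion differs.
        """
        labels = gt_boxes[:, 4].astype(np.int64)
        assert (labels >= 0).all() and (labels < num_classes).all(), \
            "class index outside the new %d-class set: %s" % (num_classes, labels)
        widths = gt_boxes[:, 2] - gt_boxes[:, 0]
        heights = gt_boxes[:, 3] - gt_boxes[:, 1]
        assert (widths > 0).all() and (heights > 0).all(), \
            "degenerate (zero/negative size) box found"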
On seemingly random images (removing these 'problem' images from the training set only changes which image the program fails on), training produces NaN out of the region proposal network. I'm a bit stuck as to why. Things I have tried:
- Changed the normalization to KITTI-specific normalization values (sketched below)
- Resized the image to 224x224
- Divided the normalized values by the averaged standard deviation
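For clarity, this is roughly what I mean by the normalization attempts above; the per-channel mean/std values below are placeholders, not the actual statistics of my KITTI split:

    import numpy as np

    # Placeholder statistics -- the real per-channel mean/std should be computed
    # over the converted KITTI training split (values here are illustrative only).
    KITTI_MEAN = np.array([[[95.0, 99.0, 96.0]]], dtype=np.float32)  # BGR, 0-255 scale
    KITTI_STD = np.array([[[80.0, 81.0, 84.0]]], dtype=np.float32)

    def normalize_kitti(image):
        """Subtract KITTI channel means and divide by the channel std,
        mirroring the mean subtraction the repo's blob code already does."""
        image = image.astype(np.float32, copy=True)
        image -= KITTI_MEAN
        image /= KITTI_STD
        return image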
-----------------
Network Definition
-----------------
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = norm_layer(planes)
self.downsample = downsample
self.stride = stride
self._layers['head'] = nn.Sequential(
    self.resnet.conv1, self.resnet.bn1, self.resnet.relu, self.resnet.maxpool,
    self.resnet.layer1, self.resnet.layer2, self.resnet.layer3)
self.rpn_net = nn.Conv2d(self._net_conv_channels, cfg.RPN_CHANNELS, [3, 3], padding=1)
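To narrow down where the NaNs first appear inside the head, one option is to hang a forward hook on every submodule and report each layer whose output contains NaN; since hooks fire in execution order, the first line printed points at the origin. Debugging aid only, names follow the definitions above:

    import torch

    def add_nan_hooks(module):
        """Register forward hooks on every submodule so any layer whose output
        contains NaN is reported during the forward pass."""
        def make_hook(name):
            def hook(mod, inputs, output):
                if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                    print("NaN in output of: %s (%s)" % (name, mod.__class__.__name__))
            return hook

        handles = [sub.register_forward_hook(make_hook(name))
                   for name, sub in module.named_modules()]
        return handles  # call h.remove() on each handle when done

    # e.g. add_nan_hooks(self._layers['head']) right after the head is built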
-----------------
Preparing Image
-----------------
self._image = torch.from_numpy(image.transpose([0, 3, 1, 2])).to(self._device)
self.net.train_step(blobs, self.optimizer)
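Before train_step runs, it is also worth confirming the problem is not already present in the input blob itself (e.g. from a bad image or a division by a zero std). A minimal check, assuming the blob dictionary shown in the computing graph below:

    import numpy as np

    def check_input_blob(blobs):
        """Confirm the image blob is finite before self.net.train_step(...) runs."""
        data = blobs['data']
        assert np.isfinite(data).all(), "input blob already contains NaN/Inf"
        print("input blob range:", data.min(), data.max())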
-----------------
Computing Graph
-----------------
(1) self.forward(blobs['data'], blobs['im_info'], blobs['gt_boxes'])
(2) rois, cls_prob, bbox_pred = self._predict()
(3) net_conv = self._image_to_head()
(4) net_conv = self._layers['head'](self._image)
(5) rpn = F.relu(self.rpn_net(net_conv))
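Because the failure only shows up after some training steps and moves around when images are removed, another possibility is that a diverging loss poisons the weights a step before the 'problem' image, so the forward pass at (3) then yields NaN. Two standard ways to test that (the clip value here is arbitrary):

    import torch

    # 1) Report the first backward op that produces NaN gradients
    #    (slow, so only enable while debugging):
    torch.autograd.set_detect_anomaly(True)

    # 2) If the gradients are merely exploding, clipping them just before
    #    optimizer.step() inside train_step keeps the weights finite:
    #    torch.nn.utils.clip_grad_norm_(self.net.parameters(), max_norm=10.0)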
-------------------
Helper functions relevant to the problem
-------------------
def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)

def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)
I don't know why this is occurring, but I obviously expect finite numbers out of the ResNet101 backbone. I may have to switch to VGG16.
OUTPUT OF (3)
tensor([[[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
...,
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]]], device='cuda:0'
Does anyone have an idea of what's going on here?