Getting NaN result out of ResNet101 backbone with Kitti images

Question

I am using the following:

CUDA 10.0
PyTorch 1.2
https://github.com/ruotianluo/pytorch-faster-rcnn
Testing weight set is not the same as Training weight set.
Training weight set is from caffe pretrained ResNet101 backbone

I have taken this repo and converted it to use Kitti data. In doing so I have added a new Kitti class in datasets and done the necessary conversion. Both testing and evaluation work with the following class set from PASCAL VOC:

self._classes = (
    '__background__',  # always index 0
    'aeroplane',
    'bicycle',
    'bird',
    'boat',
    'bottle',
    'bus',
    'car',
    'cat',
    'chair',
    'cow',
    'diningtable',
    'dog',
    'horse',
    'motorbike',
    'person',
    'pottedplant',
    'sheep',
    'sofa',
    'train',
    'tvmonitor')

I have changed the class set to:

self._classes = (
    'dontcare',  # always index 0
    'pedestrian',
    'car',
    'truck',
    'cyclist')

#-----------------------------
N.B.: The classes should NOT matter here, as the output of the backbone is simply a feature map, not a classification
#-----------------------------

On seemingly random images (removing these 'problem' images from the training set just changes which image the program fails on), the training code produces NaN out of the region proposal network. I'm a bit stuck as to why.

Tried changing the normalization to Kitti specific normalization values
Tried resizing image to 224x224
Tried dividing normalized numbers by averaged standard deviation
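
One further check that may help narrow this down is to validate each training blob before it reaches the network. The sketch below is a hypothetical helper (`check_blobs` is not part of the repo). It assumes the blobs are NumPy arrays under the keys `data`, `im_info` and `gt_boxes`, that `data` is laid out as (1, H, W, 3), and that each `gt_boxes` row is (x1, y1, x2, y2, class); these layouts are assumptions about the data layer, so adjust them if yours differs.

```python
import numpy as np

def check_blobs(blobs):
    """Return a list of human-readable problems found in one training blob."""
    problems = []
    for name in ("data", "im_info", "gt_boxes"):
        arr = np.asarray(blobs[name], dtype=np.float64)
        if not np.isfinite(arr).all():
            problems.append(name + " contains NaN or inf")

    # Assumed layout: data is (1, H, W, 3); gt_boxes rows are (x1, y1, x2, y2, class).
    height, width = np.asarray(blobs["data"]).shape[1:3]
    boxes = np.asarray(blobs["gt_boxes"], dtype=np.float64).reshape(-1, 5)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]

    # A box is suspicious if it leaves the image or has its corners swapped.
    bad = (x1 < 0) | (y1 < 0) | (x2 >= width) | (y2 >= height) | (x2 < x1) | (y2 < y1)
    for idx in np.flatnonzero(bad):
        problems.append("gt_boxes[%d] is out of bounds or inverted: %s"
                        % (idx, boxes[idx, :4].tolist()))
    return problems
```

Calling this on every blob just before `train_step` and printing any non-empty result would show whether a bad input precedes the first NaN.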

-----------------

Network Definition

-----------------

self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = norm_layer(planes)
self.downsample = downsample
self.stride = stride

self._layers['head'] = nn.Sequential(
    self.resnet.conv1, self.resnet.bn1, self.resnet.relu,
    self.resnet.maxpool, self.resnet.layer1, self.resnet.layer2,
    self.resnet.layer3)

self.rpn_net = nn.Conv2d(self._net_conv_channels, cfg.RPN_CHANNELS, [3, 3], padding=1)

-----------------

Preparing Image

-----------------

self._image = torch.from_numpy(image.transpose([0, 3, 1, 2])).to(self._device)
self.net.train_step(blobs, self.optimizer)

-----------------

Computing Graph

-----------------

(1) self.forward(blobs['data'], blobs['im_info'], blobs['gt_boxes'])
(2) rois, cls_prob, bbox_pred = self._predict()
(3) net_conv = self._image_to_head()
(4) net_conv = self._layers['head'](self._image)
(5) rpn = F.relu(self.rpn_net(net_conv))
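
To see which module in this chain first emits NaN, one option is to attach forward hooks to every submodule and record the first offender. This is a minimal sketch, not code from the repo: `find_first_nan_module` is a hypothetical helper, and it relies only on `nn.Module.named_modules`, `register_forward_hook` and `torch.isnan`.

```python
import torch
import torch.nn as nn

def find_first_nan_module(model, x):
    """Run one forward pass and return the name of the first submodule
    whose output contains NaN, or None if every output is NaN-free."""
    first_bad = []
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Record only the earliest module (in execution order) that emits NaN.
            if (not first_bad and isinstance(output, torch.Tensor)
                    and torch.isnan(output).any()):
                first_bad.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is the empty string
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(x)
    finally:
        for handle in handles:
            handle.remove()  # always detach the hooks again
    return first_bad[0] if first_bad else None
```

Here one might pass `self._layers['head']` and `self._image`. If even the first convolution reports NaN on a finite input, the weights themselves are probably already NaN, which would point to a bad update on an earlier image rather than to the current one. Note that batch norm layers in training mode still update their running statistics during this extra pass.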

-------------------

Useful functions for problem

-------------------

def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)

def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)

I don't know why this is occurring, but obviously I expect real numbers out of the ResNet101 backbone. I may have to switch to VGG16.

OUTPUT OF (3)

tensor([[[[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
...,
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan]],

...,

[[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
...,
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan]],

[[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
...,
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan]]]], device='cuda:0')

Does anyone have an idea of what's going on here?

Okay, confirmed: changing the classes has no effect. Using the CPU instead of the GPU has no effect either. - mHo2, Aug 28 '19 at 21:15

score 0 · Answer 1 · answered Aug 29 '19 at 01:45

Solved it. PASCAL VOC (the original dataset used with this GitHub repo) uses pixel coordinates that start at 1 [1 to ymax], whereas Kitti pixel coordinates start at 0 [0 to ymax-1].

The fix is to remove the -1's from the bounding box target generation, since they are only correct for 1-based annotations.
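
As a small illustration of the off-by-one, the sketch below applies a VOC-style "-1" to a 0-based box. The helper names and the box values are made up for the example, and the frame size is only a typical Kitti resolution used for illustration.

```python
def to_zero_based(box, one_based):
    """Convert (xmin, ymin, xmax, ymax) annotation corners to 0-based pixels."""
    offset = 1 if one_based else 0
    return tuple(v - offset for v in box)

def inside_image(box, width, height):
    """True when the box is non-inverted and lies fully inside the image."""
    x1, y1, x2, y2 = box
    return 0 <= x1 <= x2 < width and 0 <= y1 <= y2 < height

# Illustrative Kitti-style box touching the left image edge (values made up).
kitti_box = (0, 150, 120, 300)
width, height = 1242, 375  # example frame size only

wrong = to_zero_based(kitti_box, one_based=True)   # VOC-style "-1" on 0-based data
right = to_zero_based(kitti_box, one_based=False)  # Kitti coordinates left untouched

print(wrong, inside_image(wrong, width, height))   # (-1, 149, 119, 299) False
print(right, inside_image(right, width, height))   # (0, 150, 120, 300) True
```

Only boxes that touch the top or left edge are affected, which would fit the failures showing up on seemingly random images. If the loader then stores such a box in an unsigned integer array (the VOC loader shipped with this repo appears to use `np.uint16`, but treat that as an assumption for a custom Kitti loader), the -1 cannot be represented and may wrap to a large positive value, producing an inverted box whose log-scaled regression target is NaN.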

mHo2
