
I am using the following:

I have taken this repo and converted it to use Kitti data. In doing so I have added a new Kitti class in datasets and done the necessary conversion. Both testing and evaluation work with the following class set from PASCAL VOC:

self._classes = (
    '__background__',  # always index 0
    'aeroplane',
    'bicycle',
    'bird',
    'boat',
    'bottle',
    'bus',
    'car',
    'cat',
    'chair',
    'cow',
    'diningtable',
    'dog',
    'horse',
    'motorbike',
    'person',
    'pottedplant',
    'sheep',
    'sofa',
    'train',
    'tvmonitor')

I have changed the class set to:

self._classes = (
    'dontcare',  # always index 0
    'pedestrian',
    'car',
    'truck',
    'cyclist')

#-----------------------------
N.B.: The classes should NOT matter here, since the output of the backbone is simply a feature map, not a classification.
#-----------------------------

On seemingly random images, the training code produces NaN out of the region proposal network. Removing a 'problem' image from the training set only changes which image the program fails on. I'm a bit stuck as to why. So far I have tried:

  • Changing the normalization to KITTI-specific normalization values
  • Resizing the image to 224x224
  • Dividing the normalized values by the averaged standard deviation
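
Since the failure follows particular images, one further check is to validate each input blob before it reaches the network. The following is only a minimal sketch: `find_bad_blobs` is a hypothetical helper, and it assumes that `blobs['im_info']` starts with (height, width) and that each row of `blobs['gt_boxes']` is (x1, y1, x2, y2, class) in 0-based pixel coordinates. Those layouts are assumptions, not something confirmed by the code shown here.

```python
import numpy as np


def find_bad_blobs(blobs):
    """Return a list of problems found in one training blob (hypothetical helper)."""
    problems = []

    # The image itself should contain only finite values.
    data = np.asarray(blobs['data'], dtype=np.float64)
    if not np.isfinite(data).all():
        problems.append('image data contains NaN or Inf')

    # Assumed layout: im_info = (height, width, ...).
    height, width = blobs['im_info'][0], blobs['im_info'][1]

    # Assumed layout: one row per box, (x1, y1, x2, y2, class).
    boxes = np.asarray(blobs['gt_boxes'], dtype=np.float64).reshape(-1, 5)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]

    if not np.isfinite(boxes).all():
        problems.append('box contains NaN or Inf')
    if (x1 < 0).any() or (y1 < 0).any():
        problems.append('box has a negative coordinate')
    if (x2 < x1).any() or (y2 < y1).any():
        problems.append('box has x2 < x1 or y2 < y1')
    if (x2 >= width).any() or (y2 >= height).any():
        problems.append('box extends past the image')
    return problems
```

Calling this on every blob just before `train_step` and logging any non-empty result would show whether the failing images share a data problem rather than a network problem.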

    -----------------

    Network Definition

    -----------------

    self.conv1 = conv3x3(inplanes, planes, stride)
    self.bn1 = norm_layer(planes)
    self.relu = nn.ReLU(inplace=True)
    self.conv2 = conv3x3(planes, planes)
    self.bn2 = norm_layer(planes)
    self.downsample = downsample
    self.stride = stride

    self._layers['head'] = nn.Sequential(
        self.resnet.conv1, self.resnet.bn1, self.resnet.relu,
        self.resnet.maxpool, self.resnet.layer1, self.resnet.layer2,
        self.resnet.layer3)

    self.rpn_net = nn.Conv2d(self._net_conv_channels, cfg.RPN_CHANNELS, [3, 3], padding=1)

    -----------------

    Preparing Image

    -----------------

    self._image = torch.from_numpy(image.transpose([0, 3, 1, 2])).to(self._device)
    self.net.train_step(blobs, self.optimizer)

    -----------------

    Computing Graph

    -----------------

    (1) self.forward(blobs['data'], blobs['im_info'], blobs['gt_boxes'])
    (2) rois, cls_prob, bbox_pred = self._predict()
    (3) net_conv = self._image_to_head()
    (4) net_conv = self._layers['head'](self._image)
    (5) rpn = F.relu(self.rpn_net(net_conv))

    -------------------

    Useful functions for problem

    -------------------

    def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
        """3x3 convolution with padding"""
        return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                         padding=dilation, groups=groups, bias=False,
                         dilation=dilation)

    def conv1x1(in_planes, out_planes, stride=1):
        """1x1 convolution"""
        return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride,
                         bias=False)

I don't know why this is occurring, but I obviously expect real numbers out of the ResNet101 backbone. I may have to switch to VGG16.

OUTPUT OF (3)

tensor([[[[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
...,
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan]],

...,

[[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
...,
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan]],

[[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
...,
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan]]]], device='cuda:0'

Does anyone have an idea of what's going on here?

mHo2
  • OK, confirmed: changing the classes has no effect. Using the CPU instead of the GPU also has no effect. – mHo2 Aug 28 '19 at 21:15

1 Answer


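Solved it. PASCAL VOC (the dataset originally used with this GitHub repo) uses 1-based pixel coordinates (1 to ymax), whereas KITTI pixel coordinates start at 0 (0 to ymax-1). The -1 offsets in the repo are presumably there to convert VOC boxes to 0-based coordinates. Applied to KITTI boxes, which are already 0-based, they can push a coordinate to -1 and yield an invalid box.

The fix is to remove the -1's from the bounding box target generation.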
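
To illustrate the index mismatch, here is a minimal sketch in plain Python. The helper names `to_zero_based` and `is_valid_box`, the example boxes, and the 1242x375 image size are all hypothetical and not taken from the repo. It only shows that a VOC-style offset applied to a box that is already 0-based produces a negative coordinate.

```python
def to_zero_based(box, one_based):
    """Return (x1, y1, x2, y2) in 0-based pixel coordinates.

    Subtract 1 only when the source annotation is 1-based (VOC-style).
    """
    offset = 1 if one_based else 0
    return tuple(v - offset for v in box)


def is_valid_box(box, width, height):
    """True if the 0-based box lies inside a width x height image."""
    x1, y1, x2, y2 = box
    return 0 <= x1 <= x2 < width and 0 <= y1 <= y2 < height


# Hypothetical box touching the top-left corner of a 0-based (KITTI-style) image.
kitti_box = (0, 0, 99, 49)

good = to_zero_based(kitti_box, one_based=False)  # stays (0, 0, 99, 49)
bad = to_zero_based(kitti_box, one_based=True)    # VOC offset wrongly applied

print(good, is_valid_box(good, 1242, 375))
print(bad, is_valid_box(bad, 1242, 375))
```

Only boxes that touch the left or top edge are affected, which would be consistent with the failure appearing on seemingly random images.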

mHo2