My own implementation of Fast R-CNN does not perform well on balanced data

Question

2020.06.09

There are 700 training images. Each image contributes 64 ROIs, which together form one mini-batch; with a batch size of 2, it takes 350 steps to complete one pass over the training data. For R-CNN, by contrast, each ROI is cropped out as a separate image and resized to 224*224, which gives 64*700 = 44800 images, each of which carries more information and features than a 7*7 pooled feature map. I guess that is why my Fast R-CNN seems to under-fit even though R-CNN can be trained well on the same data.
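The counts quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and are not taken from the repository.

```python
# Counts quoted in the paragraph above; names are illustrative.
num_images = 700       # training images
rois_per_image = 64    # ROIs extracted from each image
batch_size = 2         # images per training step

# Fast R-CNN: one step consumes `batch_size` images, each with its 64 ROIs.
steps_per_pass = num_images // batch_size

# R-CNN: every ROI becomes its own 224*224 crop.
rcnn_crops = rois_per_image * num_images

print(steps_per_pass)  # 350
print(rcnn_crops)      # 44800
```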

==========================================================================

With fully balanced data, the accuracy drops to 0.53 (on the training data):

[0.5233287 0.4766713] not plane
[0.5281736 0.4718264] not plane
[0.53316545 0.46683457] not plane
[0.5287853 0.4712147] not plane
[0.52475226 0.47524777] not plane
[0.5293444 0.4706556] not plane
[0.52849627 0.47150376] not plane
[0.52786124 0.4721388 ] not plane
[0.52475226 0.47524777] not plane
[0.53224194 0.4677581 ] not plane
[0.5313732 0.4686268] not plane
[0.528143   0.47185704] not plane
[0.5233287 0.4766713] not plane
[0.5233839 0.4766161] not plane
[0.525427   0.47457302] not plane
[0.51949245 0.48050752] not plane
[0.52733606 0.47266394] not plane
[0.5268566  0.47314337] not plane
[0.52158654 0.47841352] not plane
[0.5412768  0.45872322] not plane
[0.5277719  0.47222808] not plane
[0.5223139 0.4776861] not plane
[0.5289101  0.47108996] not plane
[0.5207478  0.47925228] not plane
[0.52475226 0.47524777] not plane
[0.53407675 0.46592325] not plane
[0.53204036 0.4679596 ] not plane
[0.52786124 0.4721388 ] not plane
[0.52574503 0.47425497] not plane
[0.5271339  0.47286615] not plane
[0.5224281 0.4775719] not plane
[0.5233839 0.4766161] not plane
[0.5196227  0.48037735] not plane
[0.52554363 0.47445634] not plane
[0.52554363 0.47445634] not plane
[0.5446083  0.45539168] not plane
[0.53676397 0.46323603] not plane
[0.53944343 0.46055657] not plane
[0.520972 0.479028] not plane
[0.5492453  0.45075467] not plane
[0.52860624 0.47139376] not plane
[0.5273249 0.4726751] not plane
[0.52752113 0.4724789 ] not plane
[0.52902967 0.47097033] not plane
[0.5307333  0.46926668] not plane
[0.5322479  0.46775213] not plane
[0.53944343 0.46055657] not plane
[0.5499064 0.4500937] not plane
[0.5403881 0.4596119] not plane
[0.5203569  0.47964308] not plane
[0.52871954 0.47128052] not plane
[0.53245085 0.46754912] not plane
[0.5324656 0.4675344] not plane
[0.519246   0.48075405] not plane
[0.5299878  0.47001216] not plane
[0.527601   0.47239903] not plane
[0.5228142 0.4771858] not plane
[0.53725046 0.46274957] not plane

I think the network is just guessing rather than learning...
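A quick way to see why these outputs look like guessing: every row puts slightly more than half of the probability on the same class, so on balanced data the accuracy stays close to 0.5 and the cross-entropy stays close to ln 2 (about 0.693), the loss of a uniform guess. The sketch below uses four rows copied from the output above; that index 0 is the "not plane" class is an assumption inferred from the printed labels.

```python
import math

# Four rows copied from the output above: [p(class 0), p(class 1)].
rows = [
    [0.5233287, 0.4766713],
    [0.5281736, 0.4718264],
    [0.5412768, 0.45872322],
    [0.5499064, 0.4500937],
]

# Assumption: index 0 is "not plane", as the printed labels suggest.
predicted = [0 if p0 >= p1 else 1 for p0, p1 in rows]

# Cross-entropy of each row if the true class were 0, and if it were 1.
loss_if_0 = [-math.log(p0) for p0, _ in rows]
loss_if_1 = [-math.log(p1) for _, p1 in rows]

# On balanced data, half of the labels are 0 and half are 1.
balanced_loss = (sum(loss_if_0) + sum(loss_if_1)) / (2 * len(rows))

print(predicted)                       # every row picks class 0
print(balanced_loss, math.log(2))      # the two values are close
```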

==========================================================================

2020.06.08

I followed this structure, which is used in many repos on GitHub, but the accuracy won't improve:

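import os

import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# RoiPoolingConv is the custom layer from ROI_Pooling.py in the repo
# (the import path is assumed here).
from ROI_Pooling import RoiPoolingConv


def build_model():
    pooled_square_size = 7
    num_rois = 32
    roi_input = Input(shape=(num_rois, 4), name="input_2")
    model_cnn = tf.keras.applications.VGG16(
        include_top=True,
        weights='imagenet'
    )
    # layers[17] is block5_conv3, the last conv layer before the final max pool
    x = model_cnn.layers[17].output
    x = RoiPoolingConv(pooled_square_size, roi_input.shape[1])([x, roi_input])
    # Per-ROI classifier head: TimeDistributed applies each layer to every ROI
    x = TimeDistributed(Flatten())(x)
    x = TimeDistributed(Dense(4096, activation='selu'))(x)
    x = TimeDistributed(Dropout(0.5))(x)
    x = TimeDistributed(Dense(4096, activation='selu'))(x)
    x = TimeDistributed(Dropout(0.5))(x)
    x = TimeDistributed(Dense(2, activation='softmax', kernel_initializer='zeros'))(x)
    model_final = Model(inputs=[model_cnn.input, roi_input], outputs=x)
    opt = Adam(learning_rate=0.0001)
    model_final.compile(
        loss=tf.keras.losses.CategoricalCrossentropy(),
        optimizer=opt,
        metrics=["accuracy"]
    )
    model_final.save(os.path.join("TrainedModels", "FastRCNN.h5"))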

Training Logs:

100/100 [==============================] - ETA: 0s - loss: 0.5556 - accuracy: 0.7681
Epoch 00001: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 41s 412ms/step - loss: 0.5556 - accuracy: 0.7681
Epoch 2/100
100/100 [==============================] - ETA: 0s - loss: 0.5223 - accuracy: 0.7910
Epoch 00002: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 41s 414ms/step - loss: 0.5223 - accuracy: 0.7910
Epoch 3/100
100/100 [==============================] - ETA: 0s - loss: 0.5340 - accuracy: 0.7797
Epoch 00003: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 416ms/step - loss: 0.5340 - accuracy: 0.7797
Epoch 4/100
100/100 [==============================] - ETA: 0s - loss: 0.5309 - accuracy: 0.7825
Epoch 00004: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 427ms/step - loss: 0.5309 - accuracy: 0.7825
Epoch 5/100
100/100 [==============================] - ETA: 0s - loss: 0.5257 - accuracy: 0.7840
Epoch 00005: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 434ms/step - loss: 0.5257 - accuracy: 0.7840
Epoch 6/100
100/100 [==============================] - ETA: 0s - loss: 0.5181 - accuracy: 0.7928
Epoch 00006: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 423ms/step - loss: 0.5181 - accuracy: 0.7928
Epoch 7/100
100/100 [==============================] - ETA: 0s - loss: 0.5483 - accuracy: 0.7712
Epoch 00007: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 418ms/step - loss: 0.5483 - accuracy: 0.7712
Epoch 8/100
100/100 [==============================] - ETA: 0s - loss: 0.5282 - accuracy: 0.7832
Epoch 00008: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 429ms/step - loss: 0.5282 - accuracy: 0.7832
Epoch 9/100
100/100 [==============================] - ETA: 0s - loss: 0.5385 - accuracy: 0.7765
Epoch 00009: saving model to TrainedModels\FastRCNN.h5

Reference:

touchylk/cac

anandhupvr/fasterRcnn

xautdestiny/Model_Collection

==========================================================================

I wrote a binary classification model, based on Fast R-CNN, to detect airplanes in photos. The training data is generated by Selective Search. When I use a dataset with a negative/positive (N/P) ratio of about 1, the model only reaches about 0.6 accuracy on the training set. When I raise the N/P ratio closer to the original ratio produced by Selective Search, the training accuracy can reach 0.9, but the model performs badly on the test set. During training, the training accuracy stays the same from one epoch to the next, and in TensorBoard I can see that the layer weights do not change between epochs: TensorBoard Histogram of Weights
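One thing worth checking when accuracy rises with the N/P ratio is the majority-class baseline: a classifier that labels every ROI as negative already scores N/(N+P). The sketch below computes that baseline for a few ratios; the function name and the ratios are illustrative, not values measured on this dataset.

```python
def majority_baseline(np_ratio: float) -> float:
    """Accuracy of always predicting "negative" when negatives/positives = np_ratio."""
    return np_ratio / (np_ratio + 1.0)


for ratio in (1, 3, 9):
    print(ratio, majority_baseline(ratio))
```

If the training accuracy sits near this baseline, the model may have learned only the class prior, which would be consistent with good training accuracy and poor results on the test set.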

This is the basic structure of my model. The feature extractor is VGG16, which outputs a 28*28 feature map to the ROI Pooling layer. I tried changing the activation from ReLU to SELU, but it didn't help: Model Structure
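For reference, ROI max pooling on a single channel can be sketched in plain Python as below. This is a simplified illustration, not the RoiPoolingConv layer from the repository: it assumes the ROI is already given in feature-map cells as (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (both exclusive), and the helper name roi_max_pool is hypothetical.

```python
def roi_max_pool(feature_map, roi, pooled_size):
    """Max-pool the region roi = (x1, y1, x2, y2) of a 2-D list into a
    pooled_size x pooled_size grid. Coordinates are feature-map cells,
    with x2 and y2 exclusive."""
    x1, y1, x2, y2 = roi
    height, width = y2 - y1, x2 - x1
    pooled = []
    for i in range(pooled_size):
        # Row range of bin i; each bin covers at least one cell.
        r0 = y1 + (i * height) // pooled_size
        r1 = max(r0 + 1, y1 + ((i + 1) * height) // pooled_size)
        row = []
        for j in range(pooled_size):
            # Column range of bin j, built the same way.
            c0 = x1 + (j * width) // pooled_size
            c1 = max(c0 + 1, x1 + ((j + 1) * width) // pooled_size)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A 4x4 feature map whose cell value is 4 * row + col.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fmap, (0, 0, 4, 4), 2))  # [[5, 7], [13, 15]]
```

In the model above the same idea is applied to every channel of the 28*28*512 feature map and to each of the 32 ROIs.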

These are the input image and its feature maps before (28*28*512) and after (32*14*14*512) the ROI Pooling layer:

Input Image

Feature Map Before ROI_P

One Typical Feature Map of ROI After ROI_P

Another Typical Feature Map of ROI After ROI_P

I used this code to build the model:

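import os

import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# RoiPoolingConv is the custom layer from ROI_Pooling.py in the repo
# (the import path is assumed here).
from ROI_Pooling import RoiPoolingConv


def build_model():
    # 14 matches the 32*14*14*512 ROI Pooling output described above
    pooled_square_size = 14
    num_rois = 32
    roi_input = Input(shape=(num_rois, 4), name="input_2")
    model_cnn = tf.keras.applications.VGG16(
        include_top=True,
        weights='imagenet'
    )
    # layers[13] is block4_conv3, whose output is the 28*28*512 feature map
    x = model_cnn.layers[13].output
    x = RoiPoolingConv(pooled_square_size, roi_input.shape[1])([x, roi_input])
    # Apply the remaining VGG16 layers from block5_conv1 onward
    # (including its classifier head) to every ROI
    for layer in model_cnn.layers[15:]:
        x = TimeDistributed(layer)(x)
    x = TimeDistributed(Dense(512, activation='sigmoid'))(x)
    x = TimeDistributed(Dense(2, activation='softmax'))(x)
    model_final = Model(inputs=[model_cnn.input, roi_input], outputs=x)
    opt = Adam(learning_rate=0.0001)
    model_final.compile(
        loss=tf.keras.losses.BinaryCrossentropy(),
        optimizer=opt,
        metrics=["accuracy"]
    )
    model_final.save(os.path.join("TrainedModels", "FastRCNN.h5"))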

The full code can be seen here: Github Repo

I've tried adding BatchNormalization, adjusting the learning rate, and simply adding more layers, but the model does not improve at all. I would be grateful if somebody could point out the key flaw in this model so that I can test it further. Thank you!

I strongly suspect that something is off with this VGG16:

This is an input image:

This is its corresponding output feature map

score 0 · Accepted Answer · answered Jun 09 '20 at 06:06

Damn, now I know what the problem is:

In ROI_Pooling.py:

def call(self, x, mask=None):
    assert (len(x) == 2)
    # x[0] is image with shape (rows, cols, channels)
    img = x[0]
    # x[1] is roi with shape (num_rois,4) with ordering (x1,y1,x2,y2)
    rois = x[1]

    input_shape = img.shape

    outputs = []

    # Slice the whole batch axis, so every sample keeps its own ROIs
    x1 = rois[:, :, 0]
    y1 = rois[:, :, 1]
    x2 = rois[:, :, 2]
    y2 = rois[:, :, 3]

Used to be:

def call(self, x, mask=None):
    assert (len(x) == 2)

    # x[0] is image with shape (rows, cols, channels)
    img = x[0]

    # x[1] is roi with shape (num_rois,4) with ordering (x,y,w,h)
    rois = x[1]

    input_shape = img.shape

    outputs = []

    for roi_idx in range(self.num_rois):
        # Index 0 on the batch axis: only the first sample's ROIs are read
        x1 = rois[0, roi_idx, 0]
        y1 = rois[0, roi_idx, 1]
        x2 = rois[0, roi_idx, 2]
        y2 = rois[0, roi_idx, 3]

You can clearly see that only the ROIs of the first sample in each batch (index 0 on the batch axis) were used to produce the result.
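The effect of this indexing can be reproduced without TensorFlow. The sketch below uses nested lists in place of the (batch, num_rois, 4) ROI tensor; the sample values and the helper names rois_old and rois_fixed are made up for illustration.

```python
# Hypothetical ROI batch with shape (batch=2, num_rois=2, 4),
# ordering (x1, y1, x2, y2).
rois = [
    [[0, 0, 4, 4], [2, 2, 6, 6]],  # ROIs of sample 0
    [[1, 1, 5, 5], [3, 3, 7, 7]],  # ROIs of sample 1
]


def rois_old(rois, roi_idx):
    """Old behaviour: index 0 on the batch axis, so every sample
    is given sample 0's ROI."""
    return [rois[0][roi_idx] for _ in rois]


def rois_fixed(rois, roi_idx):
    """Fixed behaviour: each sample uses its own ROI,
    like slicing rois[:, roi_idx] on a tensor."""
    return [sample[roi_idx] for sample in rois]


print(rois_old(rois, 1))    # [[2, 2, 6, 6], [2, 2, 6, 6]]
print(rois_fixed(rois, 1))  # [[2, 2, 6, 6], [3, 3, 7, 7]]
```

With a batch size of 2, this would mean the second image in every batch was pooled with the first image's ROI coordinates, so its labels no longer matched the regions the network actually saw.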

Now the result is much improved:
