I'm building a model to detect keypoints of body parts, using the COCO dataset (http://cocodataset.org/#download). I'm trying to understand why I'm running into overfitting: training loss converges, but validation loss hits a ceiling very early. In the model I've tried adding dropout layers (gradually adding more of them with higher rates), but I quickly reach a point where training loss stops decreasing, which is just as bad. My theory is that the model I'm using isn't complex enough, but I'd like to know whether that's the likely reason or whether it could be something else. The models I've found online are all extremely deep (30+ layers).
Data
I'm using 10,000 RGB images, each containing a single person. They vary in size, with a maximum of 640 pixels in both width and height. As a preprocessing step, I make every image 640x640 by filling the extra area (bottom and right of the image) with black, i.e. (0, 0, 0).
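The padding step looks roughly like this (a minimal sketch assuming NumPy arrays; pad_to_640 is just an illustrative name):

import numpy as np

def pad_to_640(img):
    # img is an HxWx3 uint8 array with H, W <= 640; fill the bottom and
    # right edges with black (0, 0, 0) so every image becomes 640x640x3
    h, w = img.shape[:2]
    padded = np.zeros((640, 640, 3), dtype=img.dtype)
    padded[:h, :w, :] = img
    return padded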
Targets
The full dataset has many keypoints, but I'm only interested in the right shoulder, right elbow, and right wrist. Each body part contributes two values (an X and a Y coordinate), so my target is a list of length 6.
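Roughly how each target vector is built (a sketch assuming the standard COCO keypoint ordering, where right shoulder, right elbow, and right wrist are keypoints 6, 8, and 10, each stored as (x, y, visibility) in the flat 'keypoints' list of a person annotation; keypoints_to_target is just an illustrative name):

RIGHT_PARTS = [6, 8, 10]  # right shoulder, right elbow, right wrist

def keypoints_to_target(ann):
    # ann is a single COCO person annotation dict
    kps = ann['keypoints']  # flat list of 17 * 3 values: x, y, visibility per keypoint
    target = []
    for idx in RIGHT_PARTS:
        target.append(kps[3 * idx])      # X coordinate
        target.append(kps[3 * idx + 1])  # Y coordinate
    return target  # [shoulder_x, shoulder_y, elbow_x, elbow_y, wrist_x, wrist_y]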
Model
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

activation_function = 'relu'
batch_size = 16
epoch_count = 40
loss_function = 'mean_squared_error'
opt = 'adam'

inp_shape = (640, 640, 3)  # padded RGB images
num_targets = 6            # (x, y) for right shoulder, right elbow, right wrist
verbose_level = 1

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(3, 3), input_shape=inp_shape))
# model.add(Conv2D(filters=16, kernel_size=(3, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=32, kernel_size=(3, 3)))
# model.add(Conv2D(filters=32, kernel_size=(3, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(300, activation=activation_function))
model.add(Dropout(rate=0.1))
model.add(Dense(300, activation=activation_function))
model.add(Dense(num_targets))  # linear output: raw pixel coordinates
model.summary()

model.compile(loss=loss_function, optimizer=opt)
# x_train / x_valid are the padded 640x640x3 images, y_* the length-6 targets
hist = model.fit(x_train, y_train, batch_size=batch_size, epochs=epoch_count,
                 verbose=verbose_level,
                 validation_data=(x_valid, y_valid))
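To see the gap I'm describing, I plot the curves that model.fit records (a minimal sketch assuming matplotlib is available; training loss keeps dropping while validation loss flattens early):

import matplotlib.pyplot as plt

plt.plot(hist.history['loss'], label='training loss')
plt.plot(hist.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()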