I'm building a model to detect keypoints of body parts, using the COCO dataset (http://cocodataset.org/#download). I'm trying to understand why I'm running into overfitting: training loss converges, but validation loss hits a ceiling very early. In the model I've tried adding dropout layers (gradually adding more of them with higher rates), but I quickly reach a point where training loss stops decreasing, which is just as bad. My theory is that the model I'm using isn't complex enough, but I'd like to know whether that's the likely reason or whether it could be something else. The models I've found online are all extremely deep (30+ layers).
Data
I'm using 10,000 RGB images, each containing a single person. They vary in size, with a maximum of 640 pixels in both width and height. As a preprocessing step, I make every image 640x640 by filling the extra area (bottom and right of the image) with black, i.e. (0, 0, 0).
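The padding step looks roughly like this (a minimal sketch assuming NumPy arrays; pad_to_640 is just an illustrative name):

import numpy as np

def pad_to_640(img):
    # img is an HxWx3 uint8 array with H, W <= 640; fill the bottom and
    # right edges with black (0, 0, 0) so every image becomes 640x640x3
    h, w = img.shape[:2]
    padded = np.zeros((640, 640, 3), dtype=img.dtype)
    padded[:h, :w, :] = img
    return padded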
Targets
The full dataset has many keypoints, but I'm only interested in the right shoulder, right elbow, and right wrist. Each body part contributes two values (an X and a Y coordinate), so my target is a list of length 6.
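Roughly how each target vector is built (a sketch assuming the standard COCO keypoint ordering, where right shoulder, right elbow, and right wrist are keypoints 6, 8, and 10, each stored as (x, y, visibility) in the flat 'keypoints' list of a person annotation; keypoints_to_target is just an illustrative name):

RIGHT_PARTS = [6, 8, 10]  # right shoulder, right elbow, right wrist

def keypoints_to_target(ann):
    # ann is a single COCO person annotation dict
    kps = ann['keypoints']  # flat list of 17 * 3 values: x, y, visibility per keypoint
    target = []
    for idx in RIGHT_PARTS:
        target.append(kps[3 * idx])      # X coordinate
        target.append(kps[3 * idx + 1])  # Y coordinate
    return target  # [shoulder_x, shoulder_y, elbow_x, elbow_y, wrist_x, wrist_y]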
Model
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

activation_function = 'relu'
batch_size = 16
epoch_count = 40
loss_function = 'mean_squared_error'
opt = 'adam'

inp_shape = (640, 640, 3)  # padded RGB images
num_targets = 6            # (x, y) for right shoulder, right elbow, right wrist
verbose_level = 1

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(3, 3), input_shape=inp_shape))
# model.add(Conv2D(filters=16, kernel_size=(3, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=32, kernel_size=(3, 3)))
# model.add(Conv2D(filters=32, kernel_size=(3, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(300, activation=activation_function))
model.add(Dropout(rate=0.1))
model.add(Dense(300, activation=activation_function))
model.add(Dense(num_targets))  # linear output: raw pixel coordinates
model.summary()

model.compile(loss=loss_function, optimizer=opt)
# x_train / x_valid are the padded 640x640x3 images, y_* the length-6 targets
hist = model.fit(x_train, y_train, batch_size=batch_size, epochs=epoch_count,
                 verbose=verbose_level,
                 validation_data=(x_valid, y_valid))
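To see the gap I'm describing, I plot the curves that model.fit records (a minimal sketch assuming matplotlib is available; training loss keeps dropping while validation loss flattens early):

import matplotlib.pyplot as plt

plt.plot(hist.history['loss'], label='training loss')
plt.plot(hist.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()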