I've been reading the paper "A Perceptual Measure for Deep Single Image Camera Calibration", in which the authors adopt DenseNet with the last layer replaced by three separate heads.
I take DenseNet169 from Keras:
```python
from keras.applications import DenseNet169

base_model = DenseNet169(include_top=False, weights='imagenet')
```
Then I set `trainable` to `False` for its layers and add the heads in the following manner:
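```python
from keras.layers import Dense, GlobalAveragePooling2D

x = base_model.output
x = GlobalAveragePooling2D()(x)            # pool the DenseNet feature maps to a vector
x = Dense(4096, activation='relu')(x)
psi = Dense(256, activation='softmax')(x)  # one head: a 256-way softmax
```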
Unfortunately, this does not converge at all: the validation error just grows without bound during training. I'm fairly confident the training data is correct, so my current theory is that the heads need to be somewhat more complex.
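One thing worth ruling out: because `psi` ends in a 256-way softmax, its training target has to be an integer bin index (with `sparse_categorical_crossentropy`) or a one-hot vector (with `categorical_crossentropy`), not the raw continuous parameter. Feeding raw values to a softmax head with a mismatched loss is one plausible cause of a diverging validation error. Below is a minimal sketch of the discretization, assuming uniform bins over a known range `[lo, hi]`; `lo`, `hi`, `to_bin_index` and `decode` are hypothetical names, and the paper's actual bin layout may differ.

```python
import numpy as np

NUM_BINS = 256  # matches the 256-unit softmax in psi


def bin_edges(lo, hi, num_bins=NUM_BINS):
    """Uniform bin edges over [lo, hi] (uniform spacing is an assumption)."""
    return np.linspace(lo, hi, num_bins + 1)


def to_bin_index(values, lo, hi, num_bins=NUM_BINS):
    """Map continuous targets to integer class labels in [0, num_bins - 1]."""
    values = np.asarray(values, dtype=np.float64)
    interior = bin_edges(lo, hi, num_bins)[1:-1]
    # Only interior edges are used, so values at or beyond lo/hi clip
    # to the first/last bin instead of producing an out-of-range label.
    return np.digitize(values, interior).astype(np.int64)


def decode(probs, lo, hi):
    """Turn a softmax output back into a continuous value (expected bin center)."""
    probs = np.asarray(probs, dtype=np.float64)
    edges = bin_edges(lo, hi, probs.shape[-1])
    centers = (edges[:-1] + edges[1:]) / 2
    return probs @ centers


# Labels suitable for sparse_categorical_crossentropy on a 256-way softmax head.
labels = to_bin_index([-0.9, 0.0, 0.9], lo=-1.0, hi=1.0)  # labels == [12, 128, 243]
```

Decoding by the expected bin center is one common choice; taking the argmax bin is the simpler alternative.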
Has anybody implemented that paper, or does anyone have an idea of what those heads should look like?
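For reference, here is a minimal sketch of the setup above extended to three heads as a single multi-output `tf.keras` model. It is not the paper's confirmed design. The head names are hypothetical placeholders, giving each head its own 4096-unit hidden layer (instead of sharing one, as in the code above) is an assumption, and the learning rate is an arbitrary starting point. `weights` defaults to `'imagenet'` as above but can be set to `None` to build the model without downloading weights.

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.applications import DenseNet169
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

NUM_BINS = 256
# Hypothetical placeholder names; substitute the three quantities the paper predicts.
HEAD_NAMES = ("head_1", "head_2", "head_3")


def build_model(weights="imagenet", input_shape=(224, 224, 3)):
    base_model = DenseNet169(include_top=False, weights=weights, input_shape=input_shape)
    # Freeze the whole backbone at once; in TF 2.x this also keeps its
    # BatchNormalization layers in inference mode.
    base_model.trainable = False

    features = GlobalAveragePooling2D()(base_model.output)
    outputs = []
    for name in HEAD_NAMES:
        # Assumption: each head gets its own hidden layer rather than sharing one.
        hidden = Dense(4096, activation="relu", name=f"{name}_fc")(features)
        outputs.append(Dense(NUM_BINS, activation="softmax", name=name)(hidden))

    model = Model(inputs=base_model.input, outputs=outputs)
    # One sparse categorical cross-entropy per head: targets are integer bin indices.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # arbitrary starting LR
        loss=["sparse_categorical_crossentropy"] * len(HEAD_NAMES),
    )
    return model
```

Training would then pass one integer label array per head, e.g. `model.fit(images, [labels_1, labels_2, labels_3], ...)`, where the label array names are illustrative.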