
I am hoping to design a single end-to-end CNN to extract three features: segmentation and bifurcation for thing A, and detection for thing B. There will be a common weight trunk, then three branches with their own weights for the three feature types, after which the branches will be merged. I'm hoping to achieve this with a custom stochastic gradient descent function.

I need to combine different datasets of the same subject matter, where each image contains A and B, but the different datasets contain different ground truths. I am hoping to add an extra vector to each image indicating which of the three ground truths are available (e.g. [0, 0, 1]). This is so that the common weights w_0 always update, but the individual branch weights w_t know to ignore an unsuitable image when encountered, or even suitable images if too few are encountered within a batch.

The problem is I'm not sure how to handle this.

  • Does the ground truth indicator need to be the same size as the image, passed as an extra channel padded with redundant 0s? If so, how do I ensure it is not treated like a normal channel?
  • Can I pass it in separately, e.g. [x_train y_train g_train]? How would the rest of the pipeline handle this, particularly compilation and validation?
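On the second option: the bookkeeping for a per-image availability vector like g_train can be illustrated in plain NumPy, independent of any framework (every name and number below is hypothetical). Each branch's loss is averaged only over the images that actually carry a ground truth for it, so absent truths contribute nothing to that branch's update:

```python
import numpy as np

# Hypothetical availability flags per image: [seg, bif, det]
g_train = np.array([[0., 0., 1.],   # image 0: only detection truth
                    [1., 1., 0.],   # image 1: seg + bif truth
                    [1., 0., 1.]])  # image 2: seg + detection truth

# Hypothetical per-image, per-branch raw loss values
raw_losses = np.array([[0.5, 0.2, 0.1],
                       [0.4, 0.3, 0.9],
                       [0.2, 0.6, 0.3]])

# Zero out branches with no ground truth, then average each branch's
# loss over only the images that had a truth for it
masked = raw_losses * g_train
counts = np.maximum(g_train.sum(axis=0), 1.0)  # images available per branch
branch_loss = masked.sum(axis=0) / counts

print(branch_loss)
```

In a real framework this masking would live inside the loss function or be passed as per-output sample weights; the point is only that a 0 flag removes both the loss and the gradient contribution for that branch.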

I am considering doing this with Lasagne (on Theano) instead of my original intention of Keras, due to the latter's higher level of abstraction. Detection of thing B can also be ignored if it overcomplicates things.

Rhiu
  • I cannot understand what you intend to achieve. – Daniel Möller May 04 '17 at 18:45
  • What are the possible ground truth values? Why three branches? Isn't the ground truth your Y_train? If not, what is the difference between what you call Y_train and ground truth? Anything that explains exactly what you want to achieve may help. – Daniel Möller May 04 '17 at 19:04
  • The ground truth for each image would be - a black and white image for segmentation (is a pixel part of thing A or not), a vector of x and y pixel coordinates for bifurcation, and either a circular mask or the centre pixel coordinates for detection. I don't know how to achieve all of this with a group of sequential layers as the tasks seem quite different, which is what the partial branching is for. Branching or not, it is a requirement that this is all done within a single network with at least some shared weights between tasks. – Rhiu May 04 '17 at 19:44

1 Answer


So, you have two different shapes for the ground truth.

But since they are "truth", they should go in the Y side, never the X.

Assuming that segmentation of A results in a two-dimensional (side, side) matrix the same size as the input image, and that the bifurcation result is a one-dimensional (2,) array, you can do this:

Creating a model with two outputs

#this is the data you already have

batch = the number of training images you have
side = the size in pixels of one side of your training images

segTruth = your truth images for segmentation, shaped (batch, side, side)
bifTruth = your truth coordinates for bifurcations, shaped (batch, 2)
trainImages = your training images, shaped (batch, side, side)
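As a quick sanity check, those shapes can be mocked with random NumPy arrays (the batch and side values below are arbitrary stand-ins):

```python
import numpy as np

batch, side = 8, 64  # arbitrary example values

trainImages = np.random.rand(batch, side, side)           # input images
segTruth = np.random.randint(0, 2, (batch, side, side))   # binary masks
bifTruth = np.random.rand(batch, 2)                       # (x, y) coords

print(trainImages.shape, segTruth.shape, bifTruth.shape)
```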

Now, let's create the main trunk:

from keras.models import Model
from keras.layers import Input, Conv2D, Flatten, Dense

# Conv2D expects a channels axis, so the input shape is (side, side, 1)
inp = Input((side, side, 1))
# example layers -- use whatever suits your data
x = Conv2D(32, (3, 3), padding='same', activation='relu')(inp)
....
trunkOut = Conv2D(64, (3, 3), padding='same', activation='relu')(x)

Now, we split the model:

b1 = Conv2D(32, (3, 3), padding='same', activation='relu')(trunkOut)  # first layer in branch 1 (segmentation)
b2 = Flatten()(trunkOut)                                              # first layer in branch 2 (bifurcation)

....

out1 = Conv2D(1, (1, 1), activation='sigmoid')(b1)  # per-pixel mask
out2 = Dense(2)(b2)                                 # (x, y) coordinates

And finally, when we define the model, we pass both outputs:

model = Model(inp, [out1,out2])

When compiling, you can define loss = [lossfunction1, lossfunction2] if you want. Or simply give one loss function that will be the same for both outputs.

And when training, pass the truth values also in a list:

model.fit(trainImages, [segTruth, bifTruth], .....)

As you can see, the results are not merged, and the model has two outputs. There are separate loss functions for each output.

If you do need to merge the outputs, that would be a very complicated task, since they have different shapes. If you need one loss function to be more important than the other, you can pass the loss_weights argument in the compile call.
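The loss_weights bookkeeping is just a weighted sum of the per-output losses. With hypothetical loss values and weights:

```python
# model.compile(optimizer='adam',
#               loss=['binary_crossentropy', 'mse'],
#               loss_weights=[1.0, 0.5])   # illustrative weights

seg_loss, bif_loss = 0.8, 0.2      # hypothetical per-output loss values
loss_weights = [1.0, 0.5]

# the combined quantity that training minimizes
total = loss_weights[0] * seg_loss + loss_weights[1] * bif_loss
print(total)
```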

Training each side individually:

In case you want to train or predict using only one branch, all you have to do is to create a new Model, without changing any layer:

modelB1 = Model(inp, out1)    
modelB2 = Model(inp, out2)   

So, suppose you only have "bifTruth" for a certain set of images. Then just use modelB2 for training. It doesn't consider the other branch at all.

Before training, you will have to compile each model. But their weights will be common for all three models (model, modelB1 and modelB2).

If you want some part of the model to remain unchanged while training, you can go to each model.layers[i] and set .trainable = False before compiling. (This will not change models that are already compiled.)

Daniel Möller
  • Firstly, thank you massively for your help. How would I handle missing ground truth for a task using this structure? If, for example, an image only has bif data available, I would still want the common trunk weights to update, as well as the bif branch weights, but not the seg branch weights. Would I simply use "None" in the relevant areas of segTruth and bifTruth? Would this automatically work, or would I need to modify the SGD function and others to specifically ignore "None" truths? – Rhiu May 05 '17 at 04:04
  • The answer is in the `Training each side individually` part. Just use `modelB2` (which is from the input to the output of the bif branch). Pass it only the input images and the bifurcation truth values. It will train everything in its path (unless you define some layers as untrainable). – Daniel Möller May 05 '17 at 05:16
  • Maybe you will have to intercalate training one side, then the other side, then one side again. Otherwise your model may get "addicted" to one branch. You train one side, the other side gets worse because the main trunk got specialized for one task. – Daniel Möller May 05 '17 at 05:21
  • A good strategy may be: train everything together first, using only images for which you have both ground truth values. Then make the main trunk untrainable and start training each side separately for images with only one of the two ground truth values. This way, the main trunk doesn't specialize for a single task. – Daniel Möller May 05 '17 at 05:24
  • Thank you for this suggestion. To check, would it be at all feasible to train the model entirely in one step via some custom optimiser or loss functions, without explicitly training the branches individually? – Rhiu May 05 '17 at 14:14
  • Not sure.... you could try creating "zero values" for the missing ground truth values and create custom loss functions that check whether the values are zero. But I have no idea if it works, especially because that loss function will not have a well behaved derivative. – Daniel Möller May 05 '17 at 14:18
  • I was hoping to use some of the equations and principles found in this paper in section 3, though frankly it is all a little abstract to me: https://arxiv.org/pdf/1609.02132.pdf – Rhiu May 05 '17 at 15:54
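The "zero values plus a custom loss that checks for them" idea from the comments can be sketched in plain NumPy (a real Keras loss would use backend tensor ops, and every name and number below is hypothetical): samples whose truth is the all-zero sentinel simply contribute nothing to the loss.

```python
import numpy as np

def masked_mse(y_true, y_pred):
    """MSE that skips samples whose truth is the all-zero sentinel."""
    has_truth = np.any(y_true != 0, axis=1).astype(float)  # 1 if truth present
    per_sample = np.mean((y_true - y_pred) ** 2, axis=1)
    n = max(has_truth.sum(), 1.0)  # avoid dividing by zero
    return float(np.sum(per_sample * has_truth) / n)

y_true = np.array([[1.0, 2.0],
                   [0.0, 0.0],    # missing truth -> ignored
                   [3.0, 4.0]])
y_pred = np.array([[1.0, 2.0],
                   [5.0, 5.0],
                   [3.0, 2.0]])

print(masked_mse(y_true, y_pred))
```

One caveat, matching the comment's hesitation: a genuine truth that happens to be exactly all zeros would be wrongly discarded, so a separate availability flag is safer than a sentinel value.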