I have a GCMLE (Google Cloud ML Engine) experiment with three learning objectives (call them Task A, Task B, and Task C) inside a single model_fn(). The inputs for all three objectives are the same (a body of text) and I would like to produce three separate predictions. However, for Task C I need to mask some of the examples in each batch (roughly 20% per batch). Is the proper way to do this simply to weight the samples I want to mask by zero? Consider this loss function:
lossA = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(
    labels=labelsA, logits=logitsA))
lossB = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(
    labels=labelsB, logits=logitsB))
# 1.0 where x == y (keep the example), 0.0 where x != y (mask it out)
mask_weights = tf.to_float(tf.equal(x, y))
lossC = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(
    labels=labelsC, logits=logitsC, weights=mask_weights))
loss = lossA + lossB + lossC
Essentially, I am trying to mask any samples in the batch where x != y so that those examples contribute no gradient updates to the model for Task C. Does this achieve the desired effect? Is there a better way to implement this behavior?
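For reference, here is the kind of standalone snippet I have been using to convince myself that zero weights really zero out the per-example loss. The logits/labels values are made up purely for illustration, and I only use reduction=tf.losses.Reduction.NONE here so the per-example terms are visible; my understanding is that the default SUM_BY_NONZERO_WEIGHTS reduction also divides by the count of non-zero weights, so masked examples shouldn't dilute the mean either.

import tensorflow as tf  # TF 1.x, matching the tf.losses API above

# Toy batch of 4 examples, 3 classes (values are made up for illustration).
logits = tf.constant([[2.0, 0.5, 0.1],
                      [0.1, 1.5, 0.3],
                      [0.2, 0.2, 2.0],
                      [1.0, 1.0, 1.0]])
labels = tf.constant([0, 1, 2, 0])
# Pretend the last two examples fail the x == y condition and should be masked.
mask_weights = tf.constant([1.0, 1.0, 0.0, 0.0])

# Reduction.NONE exposes the per-example losses so the masking is visible.
per_example = tf.losses.sparse_softmax_cross_entropy(
    labels=labels, logits=logits, weights=mask_weights,
    reduction=tf.losses.Reduction.NONE)

with tf.Session() as sess:
    print(sess.run(per_example))  # masked entries come out as exactly 0.0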
I realize that I could split these up into separate experiments, but I would like to keep a shared embedding and a single graph that I can deploy to the GCMLE prediction service.
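In case the layout matters, this is roughly the shared-embedding / three-head structure I have in mind inside model_fn(). It is only a minimal sketch: the feature names (features['tokens'], features['x'], features['y']), vocabulary size, head sizes, pooling, and optimizer are all placeholders, not my real model.

import tensorflow as tf

VOCAB_SIZE = 10000   # placeholder
EMBED_DIM = 128      # placeholder

def model_fn(features, labels, mode):
    # One embedding shared by all three tasks.
    embedding = tf.get_variable('shared_embedding', [VOCAB_SIZE, EMBED_DIM])
    embedded = tf.nn.embedding_lookup(embedding, features['tokens'])
    encoded = tf.reduce_mean(embedded, axis=1)  # crude pooling, just for the sketch

    # Three task-specific heads on top of the shared representation.
    logitsA = tf.layers.dense(encoded, 5, name='task_a')
    logitsB = tf.layers.dense(encoded, 3, name='task_b')
    logitsC = tf.layers.dense(encoded, 2, name='task_c')

    predictions = {
        'taskA': tf.argmax(logitsA, axis=-1),
        'taskB': tf.argmax(logitsB, axis=-1),
        'taskC': tf.argmax(logitsC, axis=-1),
    }
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    lossA = tf.losses.sparse_softmax_cross_entropy(labels['taskA'], logitsA)
    lossB = tf.losses.sparse_softmax_cross_entropy(labels['taskB'], logitsB)
    mask_weights = tf.to_float(tf.equal(features['x'], features['y']))
    lossC = tf.losses.sparse_softmax_cross_entropy(labels['taskC'], logitsC,
                                                   weights=mask_weights)
    loss = lossA + lossB + lossC

    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op,
                                      predictions=predictions)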