How can I pre-compute a mask for each input and adjust the weights according to this mask?

Question

I want to provide a mask the same size as the input image and adjust the weights learned from the image according to this mask (similar to attention, but pre-computed for each input image). How can I do this with Keras (or TensorFlow)?

Can you include your model architecture? Which weights exactly do you want to adjust? — sdcbr, Mar 25 '19 at 10:23
I use the model here https://github.com/SunnerLi/RAM, and I have a separate program where I compute a spatial point (or multiple of them) in an image that shows the likelihood of an object being there - though it is not a probability map but I localize it using some features - — dusa, Mar 25 '19 at 14:27
I don't want to just mask the input image but I want to rather adjust weights of learned features in conv layers (for example, give a higher weight if it is around the spatial points where I think the object is likely there and lower or zero points on other parts - depending on the likeliness — dusa, Mar 25 '19 at 14:29
So are these masks fixed upfront? Or do you want to calculate them dynamically? — sdcbr, Mar 25 '19 at 14:32
I mean the initial points of course, the attention part for example, in the sample code, starts with a fixed point but then will figure out its way where to pay attention (like a gaze). — dusa, Mar 25 '19 at 14:44
I want to be able to just adjust the weights, so it will be another way of attention, not necessarily identical to the sample code. — dusa, Mar 25 '19 at 14:46
@dusa I wonder if you found a solution or a method to solve this - if so can you please elaborate? — Yuval, Jan 22 '20 at 23:11

Anton Codes · Answer 1 · 2019-04-11T04:04:32.090

Question

How can I add another feature layer to an image, such as a mask, and have the neural network take this new feature layer into account?

Answer

The short answer is to add it as another colour channel to the image. If your image already has 3 colour channels (red, green, blue), then adding a fourth channel holding the 1's and 0's of a mask gives the neural network that much more information to use when making decisions.
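
As a rough illustration of the short answer, here is a minimal NumPy sketch that appends a binary mask to an RGB image as a fourth channel. The image and mask are random placeholders, and the helper name `add_mask_channel` is made up for this example.

```python
import numpy as np


def add_mask_channel(image, mask):
    """Append a 2-D mask to an (H, W, C) image as one extra channel."""
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must have the same height and width as the image")
    # Give the mask a trailing channel axis so it can be concatenated.
    mask = mask.astype(image.dtype)[..., np.newaxis]
    return np.concatenate([image, mask], axis=-1)


rng = np.random.default_rng(0)
# Placeholder RGB image; a real image would be loaded from disk instead.
image = rng.random((32, 32, 3)).astype(np.float32)
# Placeholder mask: 1 where the object is assumed to be, 0 elsewhere.
mask = np.zeros((32, 32), dtype=np.float32)
mask[8:24, 8:24] = 1.0

stacked = add_mask_channel(image, mask)
print(stacked.shape)
```

The first layer of the network then has to accept 4 input channels instead of 3.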

Thought Experiment

As a thought experiment, let's tackle MNIST. MNIST images are 28x28. Take 1 image, the 'true' image, and 3 other images, the 'distractions', and tile the four 28x28 images into one 56x56 image. MNIST is greyscale, so it has only 1 colour channel, brightness. Now add a second channel that is a mask: 1's in the area of the 56x56 image where the 'true' image is and 0's elsewhere.

If we use the same architecture as usual for solving MNIST, convolution all the way down, we can imagine that it can use this new information to learn to only pay attention to the 'true' area and categorize the image correctly.
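
The composite input from the thought experiment could be built roughly like this. It is only a sketch: random arrays stand in for the MNIST digits, and the helper name `make_composite` is hypothetical.

```python
import numpy as np


def make_composite(true_digit, distractions, rng):
    """Tile four 28x28 images into a 56x56 grid and add a mask channel."""
    tiles = [true_digit] + list(distractions)
    order = rng.permutation(4)  # random placement of the four tiles
    canvas = np.zeros((56, 56, 2), dtype=np.float32)
    for slot, tile_idx in enumerate(order):
        row, col = divmod(slot, 2)
        r, c = row * 28, col * 28
        # Channel 0 holds the brightness values.
        canvas[r:r + 28, c:c + 28, 0] = tiles[tile_idx]
        # Channel 1 is the mask: 1 only over the 'true' image (tile 0).
        if tile_idx == 0:
            canvas[r:r + 28, c:c + 28, 1] = 1.0
    return canvas


rng = np.random.default_rng(0)
# Random stand-ins for MNIST digits; real data would be loaded separately.
true_digit = rng.random((28, 28)).astype(np.float32)
distractions = [rng.random((28, 28)).astype(np.float32) for _ in range(3)]

composite = make_composite(true_digit, distractions, rng)
print(composite.shape)
```

The label for each composite is the label of the 'true' digit, so the only clue about which quadrant matters is the mask channel.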

Code Example

In this example we try to solve the XOR problem. We take a classic XOR input, double its length by adding noise, and add a channel that is 1's for the non-noise and 0's for the noise.


# Adapted from https://github.com/panchishin/learn-to-tensorflow/blob/master/solutions/04-xor-2d.py
# Written for the TensorFlow 1.x API (tf.placeholder, tf.Session).
import numpy as np

# -- The xor problem --
x = np.array([[0., 0.], [1., 1.], [1., 0.], [0., 1.]])
y_ = [[1., 0.], [1., 0.], [0., 1.], [0., 1.]]


def makeBatch():
    # Build a (4, 4, 2) batch: 4 samples, 4 positions, 2 channels.
    # Channel 0 holds random 0/1 noise with the two real 'x' values
    # written at a random offset; channel 1 is the mask.
    rx = (np.random.rand(4, 4, 2) > 0.5).astype(np.float32)
    # set the mask to 0 for all items
    rx[:, :, 1] = 0
    # pick offset 0, 1 or 2 for the two real values
    index = np.random.randint(3)
    rx[:, index:index+2, 0] = x
    # set the mask to 1 for 'real' values
    rx[:, index:index+2, 1] = 1
    return rx

# -- imports --
import tensorflow as tf

# print numpy arrays with 2 decimal places and no scientific notation
np.set_printoptions(precision=2, suppress=True)


# -- induction --

# Layer 0
x0 = tf.placeholder(dtype=tf.float32, shape=[None, 4, 2])
y0 = tf.placeholder(dtype=tf.float32, shape=[None, 2])

# Layer 1
f1 = tf.reshape(x0,shape=[-1,8])
m1 = tf.Variable(tf.random_uniform([8, 9], minval=0.1, maxval=0.9, dtype=tf.float32))
b1 = tf.Variable(tf.random_uniform([9], minval=0.1, maxval=0.9, dtype=tf.float32))
h1 = tf.sigmoid(tf.matmul(f1, m1) + b1)

# Layer 2
m2 = tf.Variable(tf.random_uniform([9, 2], minval=0.1, maxval=0.9, dtype=tf.float32))
b2 = tf.Variable(tf.random_uniform([2], minval=0.1, maxval=0.9, dtype=tf.float32))
y_out = tf.nn.softmax(tf.matmul(h1, m2) + b2)


# -- loss --

# loss : sum of the squares of y0 - y_out
loss = tf.reduce_sum(tf.square(y0 - y_out))

# training step : gradient descent (1.0) to minimize loss
train = tf.train.GradientDescentOptimizer(1.0).minimize(loss)



# -- training --
# run 5000 training steps, each on a fresh noisy batch of all 4 XOR cases
# print out the loss every 1000 steps, then the final parameters and output
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    print("\nloss")
    for step in range(5000):
        sess.run(train, feed_dict={x0: makeBatch(), y0: y_})
        if (step + 1) % 1000 == 0:
            print(sess.run(loss, feed_dict={x0: makeBatch(), y0: y_}))

    results = sess.run([m1, b1, m2, b2, y_out, loss], feed_dict={x0: makeBatch(), y0: y_})
    labels = "m1,b1,m2,b2,y_out,loss".split(",")
    for label, result in zip(labels, results):
        print("")
        print(label)
        print(result)

print("")

Output

We can see that the network solves the problem and gives the correct output with high certainty.

y_ (truth) = [[1., 0.], [1., 0.], [0., 1.], [0., 1.]]

y_out
[[0.99 0.01]
 [0.99 0.01]
 [0.01 0.99]
 [0.01 0.99]]

loss
0.00056630466

Confirmation that the mask is doing something

Let's change the makeBatch function so that the mask channel is just random, by commenting out the lines that set 0's for noise and 1's for signal.

def makeBatch():
    rx = (np.random.rand(4, 4, 2) > 0.5).astype(np.float32)
    #rx[:,:,1] = 0
    index = int(np.random.random()*3)
    rx[:,index:index+2,0] = x
    #rx[:,index:index+2,1] = 1
    return rx

and then rerun the code. Indeed, we can see that the network fails to learn the task when the mask carries no information.

y_out
[[0.99 0.01]
 [0.76 0.24]
 [0.09 0.91]
 [0.58 0.42]]

loss
0.8080765

Conclusion

If you have some signal and noise in an image (or other data structure) and add another channel (a mask) that indicates where the signal is and where the noise is, a neural net can use that mask to focus on the signal while still having access to the noise.

It is not exactly what I am looking for, but thanks for a new interesting perspective and putting in the effort to justify. — dusa, Mar 30 '19 at 08:49
Please do! I'm interested in how it works for your specific case. — Anton Codes, Mar 30 '19 at 22:24