I want to provide a mask the same size as the input image and use it to adjust the weights learned from the image (similar to attention, but pre-computed for each input image). How can I do this with Keras (or TensorFlow)?
-
Can you include your model architecture? Which weights exactly do you want to adjust? – sdcbr Mar 25 '19 at 10:23
-
I use the model here https://github.com/SunnerLi/RAM, and I have a separate program that computes a spatial point (or several) in an image indicating where an object is likely to be. It is not a probability map; I localize it using some features. – dusa Mar 25 '19 at 14:27
-
I don't want to just mask the input image; rather, I want to adjust the weights of the learned features in the conv layers (for example, give a higher weight around the spatial points where I think the object is likely to be, and a lower or zero weight elsewhere, depending on the likelihood). – dusa Mar 25 '19 at 14:29
-
So are these masks fixed upfront? Or do you want to calculate them dynamically? – sdcbr Mar 25 '19 at 14:32
-
Currently, they are fixed, computed upfront – dusa Mar 25 '19 at 14:42
-
I mean the initial points of course, the attention part for example, in the sample code, starts with a fixed point but then will figure out its way where to pay attention (like a gaze). – dusa Mar 25 '19 at 14:44
-
I want to be able to just adjust the weights, so it will be another way of attention, not necessarily identical to the sample code. – dusa Mar 25 '19 at 14:46
-
@dusa I wonder if you found a solution or a method to solve this - if so can you please elaborate? – Yuval Jan 22 '20 at 23:11
1 Answer
Question
How can I add another feature layer to an image, like a Mask, and have the neural network take this new feature layer into account?
Answer
The short answer is to add it as another colour channel to the image. If your image already has 3 colour channels (red, green, blue), then adding a fourth channel holding the mask's 1's and 0's gives the neural network extra information to use when making decisions.
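As a minimal sketch of that idea (not tied to any particular model), the mask can be stacked onto the image with NumPy before it is fed to the network. The helper name `add_mask_channel`, the 32x32 image size and the mask region are illustrative assumptions, not something from the question.

```python
import numpy as np

def add_mask_channel(image, mask):
    """Append a (H, W) mask to a (H, W, C) image as one extra channel."""
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image's height and width")
    # Cast the mask to the image dtype so the stacked array stays homogeneous.
    mask_channel = mask.astype(image.dtype)[..., np.newaxis]
    return np.concatenate([image, mask_channel], axis=-1)

# Hypothetical 32x32 RGB image and a mask marking a 10x10 region of interest.
image = np.random.rand(32, 32, 3).astype(np.float32)
mask = np.zeros((32, 32), dtype=np.float32)
mask[8:18, 8:18] = 1.0

stacked = add_mask_channel(image, mask)
print(stacked.shape)  # (32, 32, 4)
```

The only change the network itself needs is that its first layer accepts 4 input channels instead of 3.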
Thought Experiment
As a thought experiment, let's tackle MNIST. MNIST images are 28x28. Let's take 1 image, the 'true' image, and 3 other images, the 'distractions', and tile the four 28x28 images into one 56x56 image. MNIST is black and white so it only has 1 colour channel, brightness. Let's now add another channel which is a mask: 1's in the area of the 56x56 image where the 'true' image is and 0's elsewhere.
If we use the same architecture as usual for solving MNIST, convolution all the way down, we can imagine that it can use this new information to learn to only pay attention to the 'true' area and categorize the image correctly.
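The input for this thought experiment could be built roughly as follows. This is only a sketch: random arrays stand in for real MNIST digits, and the function name `make_composite` and its `true_slot` parameter are made up for illustration.

```python
import numpy as np

def make_composite(true_digit, distractors, true_slot):
    """Tile one 'true' 28x28 digit and three distractors into a 56x56x2 array.

    Channel 0 holds brightness, channel 1 holds the mask
    (1 over the 'true' tile, 0 everywhere else).
    true_slot is 0..3, counted row by row over the 2x2 grid.
    """
    tiles = list(distractors)
    tiles.insert(true_slot, true_digit)
    canvas = np.zeros((56, 56, 2), dtype=np.float32)
    for slot, tile in enumerate(tiles):
        row, col = divmod(slot, 2)
        r0, c0 = row * 28, col * 28
        canvas[r0:r0 + 28, c0:c0 + 28, 0] = tile
        if slot == true_slot:
            canvas[r0:r0 + 28, c0:c0 + 28, 1] = 1.0
    return canvas

# Random arrays used as placeholders for four MNIST digits (an assumption).
rng = np.random.default_rng(0)
digits = rng.random((4, 28, 28)).astype(np.float32)

sample = make_composite(digits[0], digits[1:], true_slot=2)
print(sample.shape)  # (56, 56, 2)
```

The label for `sample` would be the class of the 'true' digit only, so the network has to use the mask channel to work out which tile the label refers to.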
Code Example
In this example we try to solve the XOR problem. We take a classic XOR, double the length of the input with noise, and add a channel that is 1's for the non-noise and 0's for the noise.
# Adapted from https://github.com/panchishin/learn-to-tensorflow/blob/master/solutions/04-xor-2d.py
# Written against the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
# Under TensorFlow 2.x, import tensorflow.compat.v1 as tf and call
# tf.disable_v2_behavior() instead.

# -- imports --
import numpy as np
import tensorflow as tf

# print numpy arrays with 2 decimal places and no scientific notation
np.set_printoptions(precision=2, suppress=True)

# -- The xor problem --
x = np.array([[0., 0.], [1., 1.], [1., 0.], [0., 1.]])
y_ = [[1., 0.], [1., 0.], [0., 1.], [0., 1.]]

def makeBatch():
    # Each of the 4 samples has 4 positions and 2 channels:
    # channel 0 is the value, channel 1 is the mask.
    # Start from random 0/1 noise everywhere.
    rx = (np.random.rand(4, 4, 2) > 0.5).astype(np.float32)
    # set the mask to 0 for all items
    rx[:, :, 1] = 0
    # place the two real 'x's at a random offset (0, 1 or 2),
    # leaving noise before and/or after them
    index = int(np.random.random() * 3)
    rx[:, index:index + 2, 0] = x
    # set the mask to 1 for 'real' values
    rx[:, index:index + 2, 1] = 1
    return rx

# -- induction --
# Layer 0
x0 = tf.placeholder(dtype=tf.float32, shape=[None, 4, 2])
y0 = tf.placeholder(dtype=tf.float32, shape=[None, 2])

# Layer 1 : flatten the 4 positions x 2 channels into 8 features
f1 = tf.reshape(x0, shape=[-1, 8])
m1 = tf.Variable(tf.random_uniform([8, 9], minval=0.1, maxval=0.9, dtype=tf.float32))
b1 = tf.Variable(tf.random_uniform([9], minval=0.1, maxval=0.9, dtype=tf.float32))
h1 = tf.sigmoid(tf.matmul(f1, m1) + b1)

# Layer 2
m2 = tf.Variable(tf.random_uniform([9, 2], minval=0.1, maxval=0.9, dtype=tf.float32))
b2 = tf.Variable(tf.random_uniform([2], minval=0.1, maxval=0.9, dtype=tf.float32))
y_out = tf.nn.softmax(tf.matmul(h1, m2) + b2)

# -- loss --
# loss : sum of the squares of y0 - y_out
loss = tf.reduce_sum(tf.square(y0 - y_out))

# training step : gradient descent (learning rate 1.0) to minimize loss
train = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

# -- training --
# run 5000 training steps, drawing a fresh noisy batch each step
# print out the loss every 1000 steps and the final parameters
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("\nloss")
    for step in range(5000):
        sess.run(train, feed_dict={x0: makeBatch(), y0: y_})
        if (step + 1) % 1000 == 0:
            print(sess.run(loss, feed_dict={x0: makeBatch(), y0: y_}))

    results = sess.run([m1, b1, m2, b2, y_out, loss], feed_dict={x0: makeBatch(), y0: y_})
    labels = "m1,b1,m2,b2,y_out,loss".split(",")
    for label, result in zip(labels, results):
        print("")
        print(label)
        print(result)
    print("")
Output
We can see that the network correctly solves the problem and gives the correct output with high certainty.
y_ (truth) = [[1., 0.], [1., 0.], [0., 1.], [0., 1.]]
y_out
[[0.99 0.01]
[0.99 0.01]
[0.01 0.99]
[0.01 0.99]]
loss
0.00056630466
Confirmation that the mask is doing something
Let's change the mask function so that it is just random by commenting out the lines that set 0's for noise and 1's for signal
def makeBatch():
    rx = (np.random.rand(4, 4, 2) > 0.5).astype(np.float32)
    #rx[:, :, 1] = 0
    index = int(np.random.random() * 3)
    rx[:, index:index + 2, 0] = x
    #rx[:, index:index + 2, 1] = 1
    return rx
and then rerun the code. The network no longer learns the task reliably without the mask: two of the four outputs are far from their targets.
y_out
[[0.99 0.01]
[0.76 0.24]
[0.09 0.91]
[0.58 0.42]]
loss
0.8080765
Conclusion
If you have some signal and noise in an image (or other data structure), and successfully add another channel (a mask) that indicates where the signal is and where the noise is, a neural net can leverage that mask to focus on the signal yet still have access to the noise.
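For the variant raised in the question's comments, where the mask rescales conv feature maps rather than being fed in as an input channel, one possible sketch in plain NumPy is shown below. It assumes mask values in [0, 1] and a feature map whose spatial size divides the mask's size evenly. The names `downsample_mask`, `weight_features` and the `floor` parameter are hypothetical, and this is not taken from the RAM code linked in the question.

```python
import numpy as np

def downsample_mask(mask, factor):
    """Average-pool a (H, W) mask by an integer factor."""
    h, w = mask.shape
    if h % factor or w % factor:
        raise ValueError("mask size must be divisible by factor")
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def weight_features(features, mask, floor=0.0):
    """Scale a (h, w, C) feature map by a mask resized to (h, w).

    floor is the weight given where the mask is 0, so masked-out
    regions can be damped instead of removed entirely.
    """
    factor = mask.shape[0] // features.shape[0]
    small = downsample_mask(mask, factor)
    if small.shape != features.shape[:2]:
        raise ValueError("mask does not reduce to the feature map size")
    weights = floor + (1.0 - floor) * small
    # Broadcast the (h, w) weights across all C feature channels.
    return features * weights[..., np.newaxis]

# Hypothetical 14x14 feature map with 8 channels from a 28x28 input.
features = np.ones((14, 14, 8), dtype=np.float32)
mask = np.zeros((28, 28), dtype=np.float32)
mask[:14, :14] = 1.0  # object assumed to be in the top-left quadrant

weighted = weight_features(features, mask)
damped = weight_features(features, mask, floor=0.1)
```

Inside a network the same elementwise multiplication would sit between one conv layer's output and the next layer's input. Whether a hard mask like this helps or hurts training is something to check empirically.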

-
It is not exactly what I am looking for, but thanks for a new interesting perspective and putting in the effort to justify. – dusa Mar 30 '19 at 08:49