
I would like to map a TensorFlow function on each vector corresponding to the depth channel of every pixel in a matrix with dimension [batch_size, H, W, n_channels].

In other words, for every image of size H x W that I have in the batch:

  1. I extract some feature maps F_k (whose number is n_channels) with the same size H x W (hence, the feature maps all together are a tensor of shape [H, W, n_channels]);
  2. then, I wish to apply a custom function to the vector v_ij associated with the i-th row and j-th column across all feature maps F_k, i.e. the vector that spans the depth channel in its entirety (so v_ij has dimension [1 x 1 x n_channels]). Ideally, all of this would happen in parallel.

A picture explaining the process can be found below. The only difference with respect to the picture is that both the input and output "receptive fields" have size 1x1 (the function is applied to each pixel independently).

[image illustrating the process described above]

This would be similar to applying a 1x1 convolution to the matrix; however, I need to apply a more general function over the depth channel, rather than a simple weighted sum.
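For concreteness, the 1x1-convolution special case would look like this (a minimal sketch in the same TF 1.x API used below, with made-up sizes, just to clarify what I mean by a weighted sum over the channels):

import tensorflow as tf

# a feature-map tensor like the one described above: [batch, H, W, n_channels]
features = tf.placeholder(tf.float32, [None, 128, 132, 8])

# 1x1 convolution: every output pixel is a learned linear combination of the
# 8 channel values at that same pixel; I want to replace this linear combination
# with an arbitrary function of the channel vector
pointwise = tf.layers.conv2d(features, filters=8, kernel_size=1, padding='same')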

I think tf.map_fn() could be an option, and I tried the following solution, where I recursively use tf.map_fn() to access the features associated with each pixel. However, this seems sub-optimal and, most importantly, it raises an error when trying to backpropagate the gradients.

Do you have any idea of the reason why this happens and how I should structure my code to avoid the error?

This is my current implementation of the function:

import tensorflow as tf
from tensorflow import layers


def apply_function_on_pixel_features(incoming):
    # at first the input is [None, H, W, n_channels]
    if len(incoming.get_shape()) > 1:
        return tf.map_fn(lambda x: apply_function_on_pixel_features(x), incoming)
    else:
        # here the input is [n_channels]
        # apply some function that applies a transformation and returns a vector of the same size
        output = my_custom_fun(incoming) # my_custom_fun() doesn't change the shape
        return output

and the body of my code:

H = 128
W = 132
n_channels = 8

x1 = tf.placeholder(tf.float32, [None, H, W, 1])
x2 = layers.conv2d(x1, filters=n_channels, kernel_size=3, padding='same')

# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)  
x4 = tf.nn.softmax(x3)

loss = cross_entropy(x4, labels)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.minimize(loss)  # <--- ERROR HERE!

Particularly, the error is the following:

File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2481, in AddOp
    self._AddOpInternal(op)

File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2509, in _AddOpInternal
    self._MaybeAddControlDependency(op)
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2547, in _MaybeAddControlDependency
    op._add_control_input(self.GetControlPivot().op)

AttributeError: 'NoneType' object has no attribute 'op'

The whole error stack and the code can be found here. Thanks for the help,

G.


Update:

Following @thushv89's suggestion, I added a possible solution to the problem. I still don't know why my previous code didn't work; any insight on this would still be very appreciated.

gab
  • See https://stackoverflow.com/questions/49977236/tensorflow-broadcasting – geometrikal Nov 29 '19 at 02:13
  • @geometrikal Thank you for your answer. I am afraid that I didn't explain the problem well enough. I updated the question, so maybe it is more clear. If you still think broadcasting is the best option, could you explain better how to use it in my case, please? (I didn't understand) – gab Nov 29 '19 at 10:40
  • I updated the question with my current code and problems – gab Nov 29 '19 at 19:12
  • I think the error has something to do with the if statement and the recursion in the apply function. Are you able to share the exact function you wish to apply? I think broadcasting can be used for basic maths, and some of the other TensorFlow functions take an axis argument. I'm not sure if just any function can be applied. – geometrikal Nov 29 '19 at 20:18
  • I agree with you, that is my guess too. I tried to put the code used for the experiment in one place, you can find it here: https://www.dropbox.com/sh/n73pmo5rr380mhi/AAC7vC_qkEieXkslcSuXaiy3a?dl=0 . Unfortunately, due to privacy issues, I cannot share the data used for the experiment, but it will be easy to substitute them with what you have. It is a segmentation task: input = grayscale image, output = segmentation mask. Check the main.py file, it's all there. Thank you again for your help :) – gab Nov 29 '19 at 21:40
  • 1
    @gabriele, looking at the image, you seem to be trying to apply a some custom function on each pixel in the feature maps? Is that correct, if so, why do you need recursion? Simply do a reshape, do map_fn and do another reshape back to the original shape? – thushv89 Dec 03 '19 at 22:31
  • @gabriele perhaps adding why you want to do this would give more context and help the readers come up with a solution. This is super math-intensive by the looks of it. – Chaitanya Bapat Dec 07 '19 at 01:55
  • @thushv89 I followed your suggestion and I could propagate the gradients. I still don't understand what was wrong in my implementation, but it seems that now things are working. Thanks a lot :) I added the current solution in the answers – gab Dec 07 '19 at 16:46

2 Answers


Following @thushv89's suggestion, I reshaped the array, applied the function, and then reshaped it back (to avoid the tf.map_fn recursion). I still don't know exactly why the previous code didn't work, but the current implementation allows the gradients to propagate back to the previous layers. I'll leave it below, for whoever might be interested:

def apply_function_on_pixel_features(incoming, batch_size):

    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[batch_size * W * H, C])

    # apply function on every vector of shape [1, C]
    out_matrix = my_custom_fun(incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_shape = tf.convert_to_tensor([batch_size, W, H, C])
    out_matrix = tf.reshape(out_matrix, shape=out_shape)

    return out_matrix

Notice that I now needed to pass the batch size explicitly in order to reshape the tensor correctly, because TensorFlow complained when I gave None or -1 as a dimension.
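A possible way around having to pass the batch size at all (a sketch, which I have not verified in my full setup) would be to read it dynamically with tf.shape(), which returns the runtime shape as a tensor and can be mixed with the static W, H, C values in the reshape:

def apply_function_on_pixel_features_dynamic(incoming):

    # spatial and channel dims are static, the batch dim is read at run time
    _, W, H, C = incoming.get_shape().as_list()
    batch_size = tf.shape(incoming)[0]  # scalar tensor, not a Python int

    # flatten to one row per pixel: [batch_size * W * H, C]
    incoming_flat = tf.reshape(incoming, shape=[batch_size * W * H, C])

    # my_custom_fun() is the same placeholder as above; it preserves the shape
    out_matrix = my_custom_fun(incoming_flat)

    # restore the original layout [batch_size, W, H, C]
    return tf.reshape(out_matrix, shape=[batch_size, W, H, C])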

Any comments and insight on the above code would still be very appreciated.

gab
  • Hey, thanks for the question, it is interesting. One thing: Should you not have written `incoming_flat = tf.reshape(incoming, shape=[-1, C])` and `out_shape = tf.convert_to_tensor([-1, W, H, C])`? – MPKenning Nov 17 '22 at 08:31

@gabriele regarding having to depend on batch_size, have you tried doing it the following way? This function does not depend on batch_size. You can replace the map_fn with anything you like.

def apply_function_on_pixel_features(incoming):

    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])

    # apply function on every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x+1, incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])

    return out_matrix

The full code of what I tested is as below.

import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import categorical_crossentropy

def apply_function_on_pixel_features(incoming):

    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])

    # apply function on every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x+1, incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])

    return out_matrix

H = 32
W = 32
x1 = tf.placeholder(tf.float32, [None, H, W, 1])
labels = tf.placeholder(tf.float32, [None, 10])
x2 = tf.layers.conv2d(x1, filters=1, kernel_size=3, padding='same')

# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)  
x4 = tf.layers.flatten(x3)
x4 = tf.layers.dense(x4, units=10, activation='softmax')

loss = categorical_crossentropy(labels, x4)
optimizer = tf.train.AdamOptimizer(0.001)
train_op = optimizer.minimize(loss)


x = np.zeros(shape=(10, H, W, 1))
y = np.random.choice([0,1], size=(10, 10))


with tf.Session() as sess:
  tf.global_variables_initializer().run()
  sess.run(train_op, feed_dict={x1: x, labels:y})
thushv89
  • Hi @thushv89, thanks for your suggestion. However, with what you propose I would reshape the tensor to have shape [-1] rather than [batch_size * W * H, C] (which is what I need to apply the function consistently to all the features of each pixel). Also, I think that reshaping to [-1, C] and then [-1, W, H, C] gave me an error. TensorFlow seems to complain because it cannot convert an object with unknown shape to a tensor. – gab Dec 08 '19 at 09:48
  • @gabriele, worked fine for me actually. Do you have the error? – thushv89 Dec 08 '19 at 09:56
  • Hi @thushv89, sorry for the late reply but I couldn't test this before. I tried again substituting batch_size with -1 and now it seems to work. Probably I had something off. Thank you for your help! :) You should update the line `incoming_flat = tf.reshape(incoming, shape=[-1])` using `shape=[-1, C]`, that is what I want to obtain, then I'll assign you the bounty – gab Dec 12 '19 at 09:40
  • @gabriele, great to hear that. Updated my answer. :) – thushv89 Dec 12 '19 at 09:59