Implementing im2col in TensorFlow

Question

I wish to implement an operation similar to 2D convolution in TensorFlow. As I understand it, the most common approach to implementing convolution is to first apply an im2col operation to the image (see here - subsection "Implementation as Matrix Multiplication"). This operation transforms an image into a 2D matrix whose columns are the flattened "chunks" of the image to which the kernel is applied.

This excerpt from the resource linked above explains nicely what im2col does:

[...] For example, if the input is [227x227x3] (in the format height x width x n_channels) and it is to be convolved with 11x11x3 filters at stride 4, then we would take [11x11x3] blocks of pixels in the input and stretch each block into a column vector of size 11*11*3 = 363. Iterating this process in the input at stride of 4 gives (227-11)/4+1 = 55 locations along both width and height, leading to an output matrix X_col of im2col of size [363 x 3025], where every column is a stretched out receptive field and there are 55*55 = 3025 of them in total. Note that since the receptive fields overlap, every number in the input volume may be duplicated in multiple distinct columns.
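
The arithmetic in this excerpt can be checked with a short plain-Python sketch. The helper names conv_output_size and im2col are illustrative only (they are not part of TensorFlow or the linked resource), and the nested-list image layout image[row][col][channel] is an assumption made to keep the example dependency-free.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Number of valid kernel positions along one axis (no padding)."""
    return (input_size - kernel_size) // stride + 1


def im2col(image, kernel_size, stride):
    """Return X_col as a list of rows; column j is the j-th flattened patch.

    `image` is a nested list indexed as image[row][col][channel]
    (hypothetical layout chosen for this sketch).
    """
    height, width = len(image), len(image[0])
    out_h = conv_output_size(height, kernel_size, stride)
    out_w = conv_output_size(width, kernel_size, stride)
    patches = []
    for i in range(out_h):
        for j in range(out_w):
            # Flatten one receptive field in (row, col, channel) order.
            patch = [
                value
                for di in range(kernel_size)
                for dj in range(kernel_size)
                for value in image[i * stride + di][j * stride + dj]
            ]
            patches.append(patch)
    # Transpose so that each flattened patch becomes a column.
    return [list(row) for row in zip(*patches)]


# Shape check for the quoted example: 227x227x3 input, 11x11x3 filters, stride 4.
locations = conv_output_size(227, 11, 4)
x_col_shape = (11 * 11 * 3, locations * locations)
print(locations, x_col_shape)

# Tiny 4x4 single-channel image, 2x2 patches, stride 2 (non-overlapping).
tiny = [[[r * 4 + c] for c in range(4)] for r in range(4)]
tiny_cols = im2col(tiny, kernel_size=2, stride=2)
print(tiny_cols)
```

With overlapping patches (stride smaller than the kernel size), the same input value appears in several columns, which is the duplication the excerpt mentions.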

As I understand from the TensorFlow docs, that is what's done internally with tf.nn.conv2d as well.

Now, I would like to implement said im2col operation in TensorFlow separately (as I wish to have access to this intermediary result). As this involves copying of values in a non-trivial way, how would I build a relatively efficient computational graph for this operation myself? Similarly, how would one implement the reverse operation?

Patwie · Accepted Answer · 2018-01-29T21:45:24.337

You can easily do this using extract_image_patches.

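This function puts each filter_size x filter_size patch of the image into the depth dimension, yielding a [batch_size, height, width, filter_size * filter_size * n_channels] tensor. In the example below, with filter_size = 3 and a single-channel image, that is [batch_size, height, width, 9].

To compare against tf.nn.conv2d, you can implement the Sobel operator for images: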

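import tensorflow as tf
import numpy as np

# A 10x10 single-channel test image in NHWC layout: [batch, height, width, channels].
image = np.arange(10 * 10 * 1).reshape(1, 10, 10, 1)

images = tf.convert_to_tensor(image.astype(np.float32))

filter_size = 3
sobel_x = tf.constant([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], tf.float32)
# conv2d expects filters shaped [height, width, in_channels, out_channels].
sobel_x_filter = tf.reshape(sobel_x, [3, 3, 1, 1])

# Each 3x3 patch is flattened into the last axis: shape [1, 10, 10, 9].
image_patches = tf.extract_image_patches(images,
                                         [1, filter_size, filter_size, 1],
                                         [1, 1, 1, 1], [1, 1, 1, 1],
                                         padding='SAME')

# Multiply each flattened patch by the flattened filter and sum over the patch axis.
# (keep_dims is spelled keepdims in newer TensorFlow releases.)
actual = tf.reduce_sum(tf.multiply(image_patches, tf.reshape(sobel_x_filter, [9])), 3, keep_dims=True)
expected = tf.nn.conv2d(images, sobel_x_filter, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    print(sess.run(tf.reduce_sum(expected - actual)))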

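This prints 0.0, because the two computations are equivalent. No reverse operation is needed here: the patch tensor keeps the spatial layout of the image, so summing over the patch axis already yields a tensor with the shape of the convolution output.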
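
The same check can be sketched in plain Python without a TensorFlow session. This is only an illustration: extract_patches_same and correlate_same are hypothetical helpers, and the sketch assumes zero padding, stride 1, an odd filter size, and a row-major patch order for the 'SAME' case. It is not how TensorFlow implements either op.

```python
def extract_patches_same(image, k):
    """Mimic the patch layout for one single-channel image.

    patches[i][j] is the flattened k x k neighbourhood centred on (i, j),
    with zeros outside the image (stride 1, odd k assumed).
    """
    h, w = len(image), len(image[0])
    pad = k // 2
    patches = []
    for i in range(h):
        row = []
        for j in range(w):
            patch = []
            for di in range(-pad, pad + 1):
                for dj in range(-pad, pad + 1):
                    y, x = i + di, j + dj
                    inside = 0 <= y < h and 0 <= x < w
                    patch.append(image[y][x] if inside else 0.0)
            row.append(patch)
        patches.append(row)
    return patches


def correlate_same(image, kernel):
    """Direct sliding-window cross-correlation with zero padding."""
    h, w, k = len(image), len(image[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += image[y][x] * kernel[di][dj]
    return out


# Same data as the TensorFlow example: a 10x10 ramp image and the Sobel-x filter.
image = [[float(r * 10 + c) for c in range(10)] for r in range(10)]
sobel_x = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
flat_filter = [v for row in sobel_x for v in row]

patches = extract_patches_same(image, 3)
# Dot product of every flattened patch with the flattened filter.
via_patches = [[sum(p * f for p, f in zip(patch, flat_filter)) for patch in row]
               for row in patches]
direct = correlate_same(image, sobel_x)

total_difference = sum(abs(a - b)
                       for row_a, row_b in zip(via_patches, direct)
                       for a, b in zip(row_a, row_b))
print(total_difference)
```

Each output pixel is the dot product of one flattened patch with the flattened filter, which is the per-column work that an im2col-based matrix multiplication performs.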

edit:

"As I understand from the TensorFlow docs, that is what's done internally with tf.nn.conv2d as well."

Nope, not really. TensorFlow on the GPU, for example, relies on cuDNN, which is a more complex beast (Winograd, PTX, ...). It uses the im2col approach only in some circumstances, such as here on the CPU and in the quantized version here.
