I want to implement an operation similar to 2D convolution in TensorFlow. As I understand it, the most common approach to implementing convolution is to first apply an `im2col` operation to the image (see here - subsection "Implementation as Matrix Multiplication"). This operation transforms the image into a 2D matrix whose columns are the flattened "chunks" of the image to which the kernel is applied.
In other words, this excerpt from the resource linked above explains nicely what `im2col` does:
[...] For example, if the input is [227x227x3] (in the format height x width x n_channels) and it is to be convolved with 11x11x3 filters at stride 4, then we would take [11x11x3] blocks of pixels in the input and stretch each block into a column vector of size 11*11*3 = 363. Iterating this process in the input at stride of 4 gives (227-11)/4+1 = 55 locations along both width and height, leading to an output matrix `X_col` of `im2col` of size [363 x 3025], where every column is a stretched out receptive field and there are 55*55 = 3025 of them in total. Note that since the receptive fields overlap, every number in the input volume may be duplicated in multiple distinct columns.
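To make the arithmetic in the excerpt concrete, here is a minimal NumPy sketch of the operation as I understand it. It is plain NumPy rather than TensorFlow, and it is not meant to reflect how any library implements it. The function name `im2col`, its parameters, the absence of padding, and the ordering of values within each column (height, then width, then channel) are all my own assumptions.

```python
import numpy as np

def im2col(image, kernel_h, kernel_w, stride):
    """Stack every kernel-sized block of `image` (H x W x C) as one column.

    Assumes no padding; values within a column are ordered height, width, channel.
    """
    height, width, channels = image.shape
    out_h = (height - kernel_h) // stride + 1
    out_w = (width - kernel_w) // stride + 1
    cols = np.empty((kernel_h * kernel_w * channels, out_h * out_w),
                    dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            block = image[top:top + kernel_h, left:left + kernel_w, :]
            # Flatten the block and store it as column number i * out_w + j.
            cols[:, i * out_w + j] = block.reshape(-1)
    return cols

# The example from the excerpt: 227x227x3 input, 11x11 kernel, stride 4.
image = np.arange(227 * 227 * 3, dtype=np.float32).reshape(227, 227, 3)
x_col = im2col(image, 11, 11, 4)
print(x_col.shape)  # (363, 3025)
```

The explicit Python loops are only for readability; they are exactly the kind of non-trivial copying I would like to express efficiently as a graph.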
As I understand from the TensorFlow docs, this is also what `tf.nn.conv2d` does internally.
Now, I would like to implement this `im2col` operation in TensorFlow separately (as I wish to have access to this intermediate result). As this involves copying values in a non-trivial way, how would I build a relatively efficient computational graph for this operation myself? Similarly, how would one implement the reverse operation?
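To clarify what I mean by the reverse operation, here is a rough NumPy sketch (again not TensorFlow). It assumes that values from overlapping blocks are summed when they are written back into the image, which is one possible definition and may not be the only sensible one. The name `col2im`, its parameters, and the column layout (no padding; values ordered height, then width, then channel) are my own assumptions.

```python
import numpy as np

def col2im(cols, image_shape, kernel_h, kernel_w, stride):
    """Scatter columns back into an H x W x C image, summing where blocks overlap."""
    height, width, channels = image_shape
    out_h = (height - kernel_h) // stride + 1
    out_w = (width - kernel_w) // stride + 1
    image = np.zeros(image_shape, dtype=cols.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            block = cols[:, i * out_w + j].reshape(kernel_h, kernel_w, channels)
            # Overlapping positions accumulate contributions from several columns.
            image[top:top + kernel_h, left:left + kernel_w, :] += block
    return image

# Feeding in all-ones columns shows how many blocks cover each pixel
# (4x4x1 image, 2x2 kernel, stride 1, so 9 columns of length 4).
cols = np.ones((2 * 2 * 1, 3 * 3))
coverage = col2im(cols, (4, 4, 1), 2, 2, 1)
print(coverage[:, :, 0])
```

With this definition, dividing the result by the coverage counts would average the overlapping values instead of summing them.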