
It turns out there is no deconvolution operation in MPS. The closest analogue in TensorFlow is conv2d_transpose.

Is it possible to plug custom operations in between the default MPS operations?

asked by s1ddok, edited by Ian Ollmann

3 Answers


You can write your own Metal compute kernels and execute them in between the MPS operations.

For example:

let commandBuffer = commandQueue.makeCommandBuffer()!

. . .

// Do something with an MPSCNN layer:
layer1.encode(commandBuffer: commandBuffer, sourceImage: img1, destinationImage: img2)

// Perform your own compute kernel:
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(yourOwnComputePipeline)
encoder.setTexture(img2.texture, index: 0)
encoder.setTexture(img3.texture, index: 1)
let threadGroupSize = MTLSizeMake(. . .)
// Round up so the grid covers the full texture even when its size
// is not a multiple of the threadgroup size:
let threadGroups = MTLSizeMake(
    (img2.texture.width  + threadGroupSize.width  - 1) / threadGroupSize.width,
    (img2.texture.height + threadGroupSize.height - 1) / threadGroupSize.height,
    1)
encoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupSize)
encoder.endEncoding()

// Do something with another MPSCNN layer:
layer2.encode(commandBuffer: commandBuffer, sourceImage: img3, destinationImage: img4)

. . .

commandBuffer.commit()

You have to write your own compute kernel in the Metal Shading Language and load it into the yourOwnComputePipeline object. Then you can encode it into the current command buffer whenever you want.
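
Creating the pipeline state could look something like this (a minimal sketch, assuming device is your MTLDevice; the function name myCustomKernel is a placeholder for whatever your .metal file actually defines, and error handling is omitted):

// Build a compute pipeline from a kernel function in the default Metal library.
// "myCustomKernel" is a hypothetical name for your own kernel function.
let library = device.makeDefaultLibrary()!
let kernelFunction = library.makeFunction(name: "myCustomKernel")!
let yourOwnComputePipeline = try device.makeComputePipelineState(function: kernelFunction)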

Matthijs Hollemans
  • Yeah, I know that. Do you have any examples on how to pass an `MPSImage` as a parameter and work with it inside a `kernel` function? – s1ddok Feb 20 '17 at 21:13
  • You don't pass the `MPSImage` directly. Instead you use `image.texture`, like I did in the code example. If you have <= 4 channels then the texture is just a single texture object; with > 4 channels it's actually an array of textures. – Matthijs Hollemans Feb 21 '17 at 09:58
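
To illustrate that comment, a kernel reading an MPSImage's underlying texture might look like the sketch below. It is written in the Metal Shading Language and assumes at most 4 feature channels, so the data is a single texture2d; with more than 4 channels you would take a texture2d_array instead and also iterate over its slices. The name myCustomKernel and the pass-through body are placeholders, not code from the answer:

#include <metal_stdlib>
using namespace metal;

// Hypothetical pass-through kernel: reads the texture bound at index 0
// (img2.texture above) and writes the texture bound at index 1 (img3.texture).
// Replace the body with your own per-pixel computation.
kernel void myCustomKernel(
    texture2d<half, access::read>  inTexture  [[texture(0)]],
    texture2d<half, access::write> outTexture [[texture(1)]],
    ushort2 gid [[thread_position_in_grid]])
{
    // Guard against the rounded-up grid overshooting the texture edges.
    if (gid.x >= outTexture.get_width() || gid.y >= outTexture.get_height()) {
        return;
    }
    half4 pixel = inTexture.read(uint2(gid));
    outTexture.write(pixel, uint2(gid));
}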

[I am adding this as a new answer because it's a different solution.]

Note that deconvolution in deep learning is also known as "transposed convolution", which means that it's the same as doing a regular convolution but with the kernels horizontally and vertically flipped.

So you should be able to use a regular MPSCNNConvolution layer that takes the MPSImage that you wish to deconvolve as input, and that uses the same kernels as the "forward" convolution step but flipped horizontally and vertically.

The advantage of doing this over writing your own compute kernel is that you can use the very fast kernels from MPS.

Edit: An example. Let's say your conv kernel weights look like this:

1, 2, 3
4, 5, 6
7, 8, 9

Then after flipping the kernel, the weights look like this:

9, 8, 7
6, 5, 4
3, 2, 1

In other words, you need to make a copy of your weights array and reverse it. In memory, the original weights look like this:

1, 2, 3, 4, 5, 6, 7, 8, 9

The flipped kernel looks like this in memory, so it's simply the original kernel but in reverse order:

9, 8, 7, 6, 5, 4, 3, 2, 1

Then you make a new convolution layer using that reversed array. This is now your deconv layer.

I don't have Metal sample code to show you, but it's really no different from making a regular MPSCNNConvolution layer. You just have to reverse the weights for the layer.
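
As a sketch of that weight flipping (assuming the weight layout MPSCNNConvolution expects, [outputChannels][kernelHeight][kernelWidth][inputChannels]; for the single-channel 3×3 case above this reduces to reversing the whole array):

// Flip every kernel 180 degrees (horizontally and vertically) while
// leaving the channel ordering untouched. Assumes the MPSCNNConvolution
// weight layout [outputChannels][kernelHeight][kernelWidth][inputChannels].
func flipKernels(_ weights: [Float], kernelWidth: Int, kernelHeight: Int,
                 inputChannels: Int, outputChannels: Int) -> [Float] {
    var flipped = [Float](repeating: 0, count: weights.count)
    for o in 0..<outputChannels {
        for y in 0..<kernelHeight {
            for x in 0..<kernelWidth {
                for i in 0..<inputChannels {
                    let src = ((o * kernelHeight + y) * kernelWidth + x) * inputChannels + i
                    let dst = ((o * kernelHeight + (kernelHeight - 1 - y)) * kernelWidth
                               + (kernelWidth - 1 - x)) * inputChannels + i
                    flipped[dst] = weights[src]
                }
            }
        }
    }
    return flipped
}

You would then pass the flipped array as the kernel weights when you create the MPSCNNConvolution layer.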

Matthijs Hollemans
  • I would kill for an example – s1ddok Mar 24 '17 at 19:54
  • Added an example. – Matthijs Hollemans Mar 26 '17 at 10:16
  • I guess it is not fully correct. First of all, we have 3-dimensional weights. And deconvolution is supposed to make the image's width and height bigger, isn't it? – s1ddok Apr 12 '17 at 15:46
  • Well, 3-dimensional weights are just an array of 2-dimensional weights. ;-) I'm not sure about the deconvolution making the image larger. Maybe you can link to the deconvolution operation you have in mind -- the deconvolution in math/signal processing is different than the deconvolution in neural networks, for example. – Matthijs Hollemans Apr 13 '17 at 12:03
  • This answer is misleading/incorrect. Convolution when performed with GEMM is GEMM(W, im2col(X)) whereas deconvolution/transposed convolution is the gradient, i.e. the reverse: col2im(GEMM(W^T, X)). The transpose comes from the full weight matrix transpose. – twerdster May 13 '17 at 07:28
  • It's easy to verify whether my answer is correct or not: run the following script: https://gist.github.com/hollance/c3762f3ae59238c74b98a0ff335cd30a It computes the forward and backward passes of convolution. It also computes the forward pass of convolution with the weights flipped. The answers to the backward pass and the weights-flipped forward pass are identical. – Matthijs Hollemans May 13 '17 at 20:51
  • Nice! We can get stride working by injecting 0's in between kernel values, and we can get padding working by carefully padding our input. – Teddy May 23 '17 at 03:44

MPS now provides MPSCNNConvolutionTranspose in macOS 10.13 and tvOS/iOS 11.
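
A minimal sketch of setting one up: MPSCNNConvolutionTranspose takes its weights through the MPSCNNConvolutionDataSource protocol. The DeconvWeights class name, the 3×3 kernel, the 64-in/32-out channel counts, and the zeroed weight buffer below are all placeholders for your own trained parameters:

import MetalPerformanceShaders

// Hypothetical data source: MPSCNNConvolutionTranspose pulls its weights
// from an MPSCNNConvolutionDataSource rather than a raw pointer argument.
class DeconvWeights: NSObject, MPSCNNConvolutionDataSource {
    let desc = MPSCNNConvolutionDescriptor(kernelWidth: 3, kernelHeight: 3,
                                           inputFeatureChannels: 64,
                                           outputFeatureChannels: 32)
    private let buffer: UnsafeMutablePointer<Float>

    override init() {
        let count = 3 * 3 * 64 * 32
        buffer = UnsafeMutablePointer<Float>.allocate(capacity: count)
        buffer.initialize(repeating: 0, count: count) // load your trained weights here
        super.init()
    }

    deinit { buffer.deallocate() }

    func dataType() -> MPSDataType { return .float32 }
    func descriptor() -> MPSCNNConvolutionDescriptor { return desc }
    func weights() -> UnsafeMutableRawPointer { return UnsafeMutableRawPointer(buffer) }
    func biasTerms() -> UnsafeMutablePointer<Float>? { return nil }
    func load() -> Bool { return true }
    func purge() { }
    func label() -> String? { return "deconv" }
    func copy(with zone: NSZone? = nil) -> Any { return self }
}

// device, commandBuffer, inputImage, and outputImage come from your
// existing Metal setup, as in the first answer.
let deconv = MPSCNNConvolutionTranspose(device: device, weights: DeconvWeights())
deconv.encode(commandBuffer: commandBuffer, sourceImage: inputImage, destinationImage: outputImage)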

Ian Ollmann
  • Is there some sample code on how to use it properly? I get incorrect results when I try and use it. – nnrales Mar 17 '19 at 19:05