What is the difference between SeparableConv2D and Conv2D layers?

Question

I couldn't find a clear answer to this question online (apologies if one exists). I would like to understand the difference between the two layers (SeparableConv2D and Conv2D) step by step, using for example an input of shape (3,3,3) (like an RGB image).

Running this script, based on Keras with TensorFlow:

import numpy as np
from keras.layers import Conv2D, SeparableConv2D
from keras.models import Model
from keras.layers import Input

red   = np.array([1]*9).reshape((3,3))
green = np.array([100]*9).reshape((3,3))
blue  = np.array([10000]*9).reshape((3,3))

img = np.stack([red, green, blue], axis=-1)
img = np.expand_dims(img, axis=0)

inputs = Input((3,3,3))
conv1 = SeparableConv2D(filters=1, 
              strides=1, 
              padding='valid', 
              activation='relu',
              kernel_size=2, 
              depth_multiplier=1,
              depthwise_initializer='ones',
              pointwise_initializer='ones',
              bias_initializer='zeros')(inputs)

conv2 = Conv2D(filters=1, 
              strides=1, 
              padding='valid', 
              activation='relu',
              kernel_size=2, 
              kernel_initializer='ones', 
              bias_initializer='zeros')(inputs)

model1 = Model(inputs,conv1)
model2 = Model(inputs,conv2)
print("Model 1 prediction: ")
print(model1.predict(img))
print("Model 2 prediction: ")
print(model2.predict(img))
print("Model 1 summary: ")
model1.summary()
print("Model 2 summary: ")
model2.summary()

I get the following output:

Model 1 prediction:
 [[[[40404.]
   [40404.]]
  [[40404.]
   [40404.]]]]
Model 2 prediction: 
[[[[40404.]
   [40404.]]
  [[40404.]
   [40404.]]]]
Model 1 summary: 
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 3, 3, 3)           0         
_________________________________________________________________
separable_conv2d_1 (Separabl (None, 2, 2, 1)           16        
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________
Model 2 summary: 
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 3, 3, 3)           0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 2, 2, 1)           13        
=================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0

I understand how Keras computes the Conv2D prediction of model 2 thanks to this post, but can someone please explain how the SeparableConv2D prediction of model 1 is computed, and where its number of parameters (16) comes from?

score 28 · Accepted Answer · edited Nov 14 '21 at 02:21

Since Keras runs on top of TensorFlow, you can check the difference in the TensorFlow API documentation.

Conv2D is the traditional convolution: a filter slides over the image (with or without padding) at a given stride, and at each position it sums over the kernel window and over all input channels at once.
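To make the traditional convolution concrete, here is a minimal NumPy sketch applied to the input from the question. The helper `conv2d_valid` is a hypothetical name written for illustration, not a Keras function; it assumes stride 1, 'valid' padding, and all-ones weights, and it skips the ReLU because every value here is positive.

```python
import numpy as np

# Same 3x3 RGB input as in the question: constant channels 1, 100 and 10000.
img = np.stack([np.full((3, 3), v, dtype=np.float64) for v in (1, 100, 10000)], axis=-1)

def conv2d_valid(x, kernel, bias=0.0):
    """Illustrative plain convolution, stride 1, 'valid' padding.

    x:      (H, W, C_in)
    kernel: (kh, kw, C_in, C_out)
    """
    kh, kw, _, c_out = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]  # (kh, kw, C_in)
            # One filter sums over height, width AND input channels in a single step.
            out[i, j, :] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out

kernel = np.ones((2, 2, 3, 1))   # kernel_initializer='ones'
conv_out = conv2d_valid(img, kernel)

# 2*2*3*1 weights plus 1 bias
conv_params = kernel.size + 1

print(conv_out[..., 0])   # every entry is 4*1 + 4*100 + 4*10000 = 40404
print(conv_params)        # 13, matching the Conv2D summary in the question
```

Each output value is the sum of a 2x2 window across all three channels, which is where 40404 and the 13 parameters in the question's output come from.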

SeparableConv2D, on the other hand, is a variation of the traditional convolution that was proposed to make it cheaper to compute. It performs a depthwise spatial convolution (each input channel is filtered separately), followed by a pointwise (1x1) convolution that mixes the resulting channels together. MobileNet, for example, uses this operation to speed up its convolutions.

I could explain both operations here; however, this post has a very good explanation with images and videos, which I strongly recommend reading.
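The two steps can be sketched in NumPy on the input from the question. The function `separable_conv2d_valid` is a hypothetical helper for illustration, not part of Keras; it assumes stride 1, 'valid' padding, `depth_multiplier=1`, and all-ones weights, and it omits the ReLU since all values are positive.

```python
import numpy as np

# Same 3x3 RGB input as in the question: constant channels 1, 100 and 10000.
img = np.stack([np.full((3, 3), v, dtype=np.float64) for v in (1, 100, 10000)], axis=-1)

def separable_conv2d_valid(x, depthwise, pointwise, bias=0.0):
    """Illustrative separable convolution, stride 1, 'valid' padding.

    x:         (H, W, C_in)
    depthwise: (kh, kw, C_in)   one spatial filter per input channel
    pointwise: (C_in, C_out)    1x1 convolution that mixes channels
    """
    kh, kw, c_in = depthwise.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1

    # Step 1: depthwise. Each channel is filtered on its own; channels are NOT summed.
    dw = np.zeros((out_h, out_w, c_in))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]
            dw[i, j, :] = (patch * depthwise).sum(axis=(0, 1))

    # Step 2: pointwise. A 1x1 convolution is a weighted sum over channels per pixel.
    out = dw @ pointwise + bias
    return dw, out

depthwise = np.ones((2, 2, 3))   # depthwise_initializer='ones'
pointwise = np.ones((3, 1))      # pointwise_initializer='ones'
dw, sep_out = separable_conv2d_valid(img, depthwise, pointwise)

# 2*2*3 depthwise weights + 3*1 pointwise weights + 1 bias
sep_params = depthwise.size + pointwise.size + 1

print(dw[0, 0])          # [4, 400, 40000]: one value per channel after step 1
print(sep_out[..., 0])   # every entry is 4 + 400 + 40000 = 40404
print(sep_params)        # 16, matching the SeparableConv2D summary in the question
```

With a single output filter and all-ones weights, both layers produce the same numbers, and the separable layer even has more parameters (16 vs. 13). The savings appear with many output filters: the pointwise step grows as C_in * C_out, while a regular convolution grows as kh * kw * C_in * C_out.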

answered Feb 15 '19 at 12:05 by André Pacheco · edited Nov 14 '21 at 02:21 by Nimantha


Thank you for sharing that, it helps in understanding SeparableConv2D. I realized I had not initialized the depthwise and pointwise parts of my separable convolution, which is why I got different predictions... Sorry – etiennedm Feb 15 '19 at 12:52
Hope you can enlighten me! I have read several guides, and they say SeparableConv2D is mostly used for models that have to work on small devices such as cameras and robots, since those have hardware limitations. So what if SeparableConv2D were used instead of the traditional Conv2D in all models? Does the traditional approach still have advantages? Thanks – I'mMotivated Jul 22 '21 at 16:30
