Honestly, the only way to know is to load the model and inspect it:

```python
from tensorflow.keras.applications import MobileNet

model = MobileNet()
model.summary()
```
Indeed, when you check the results, the only dedicated depthwise layer present is `DepthwiseConv2D`.

In fact, inspecting the output of `model.summary()` yields the following (note that this is one block of Depthwise + Pointwise):
```
conv_pad_6 (ZeroPadding2D)   (None, 29, 29, 256)  0
_________________________________________________________________
conv_dw_6 (DepthwiseConv2D)  (None, 14, 14, 256)  2304
_________________________________________________________________
conv_dw_6_bn (BatchNormaliza (None, 14, 14, 256)  1024
_________________________________________________________________
conv_dw_6_relu (ReLU)        (None, 14, 14, 256)  0
_________________________________________________________________
conv_pw_6 (Conv2D)           (None, 14, 14, 512)  131072
_________________________________________________________________
conv_pw_6_bn (BatchNormaliza (None, 14, 14, 512)  2048
_________________________________________________________________
conv_pw_6_relu (ReLU)        (None, 14, 14, 512)  0
```
The first three layers perform the depthwise convolution, while the pointwise convolution is performed by the last three. You can tell from the layer names which layers belong to the first operation (`dw`) and which to the second (`pw`).
By inspecting those layers we can also see the order of the operations, i.e. that batch normalization takes place before the ReLU activation. This holds for both the depthwise and the pointwise convolution, as you can see in the summary above.
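As a sanity check on the numbers above, the parameter counts of the two steps follow directly from the layer shapes. Here is a minimal sketch; the 256/512 channel counts and the 3x3 depthwise kernel are taken from the summary block, and MobileNet's conv layers carry no bias (the biases are absorbed by the batch-normalization layers):

```python
# Parameter counts for the conv_dw_6 / conv_pw_6 block shown above.
c_in, c_out, k = 256, 512, 3  # input/output channels and depthwise kernel size

# Depthwise step: one k x k filter per input channel.
dw_params = k * k * c_in
print(dw_params)  # 2304, matching conv_dw_6

# Pointwise step: an ordinary 1x1 convolution mixing channels.
pw_params = 1 * 1 * c_in * c_out
print(pw_params)  # 131072, matching conv_pw_6
```

The batch-normalization counts follow the same logic: 4 parameters per channel (gamma, beta, moving mean, moving variance), i.e. 4 * 256 = 1024 and 4 * 512 = 2048.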
However, your observation is indeed a good one: there is no layer explicitly labelled as a 1x1 convolution in the architecture, at least as per `model.summary()`; the pointwise step appears as an ordinary `Conv2D` (the `conv_pw_*` layers) with a 1x1 kernel.
From the Keras/TF documentation:

> Depthwise separable 2D convolution. Depthwise Separable convolutions consist of performing just the first step in a depthwise spatial convolution (which acts on each input channel separately).
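To make the two steps concrete, here is a minimal NumPy sketch of a depthwise convolution followed by a 1x1 pointwise convolution. The shapes are arbitrary illustration values (not taken from MobileNet), with "valid" padding and stride 1:

```python
import numpy as np

def depthwise_conv(x, dw):
    # x: (H, W, C), dw: (k, k, C) -- one k x k filter per input channel
    k = dw.shape[0]
    H, W, C = x.shape
    out = np.zeros((H - k + 1, W - k + 1, C))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each channel is filtered independently; no mixing across channels
            out[i, j] = np.sum(x[i:i + k, j:j + k] * dw, axis=(0, 1))
    return out

def pointwise_conv(x, pw):
    # pw: (C_in, C_out) -- a 1x1 convolution is a per-pixel linear mix of channels
    return x @ pw

x = np.random.rand(14, 14, 8)   # toy feature map
dw = np.random.rand(3, 3, 8)    # depthwise kernels
pw = np.random.rand(8, 16)      # pointwise (1x1) kernel

y = pointwise_conv(depthwise_conv(x, dw), pw)
print(y.shape)  # (12, 12, 16)
```

The depthwise step changes only the spatial dimensions, while the pointwise step changes only the channel count, mirroring what the `conv_dw_6` and `conv_pw_6` layers do in the summary above.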