
I'm trying to design a convolutional network in Keras that estimates the depth of images.

I have RGB input images of shape 3x120x160 and grayscale output depth maps of shape 1x120x160.

I tried a VGG-like architecture where the depth of each layer grows, but when I get to designing the final layers, I get stuck: using a Dense layer is too expensive, and upsampling proved inefficient.
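(For scale, a quick back-of-the-envelope check of why a Dense decoder blows up, assuming an encoder that bottoms out at the 512x18x28 feature map shown in the model summary below:)

```python
# Weight count for a single Dense layer mapping the flattened
# encoder output (512 x 18 x 28) to a 120 x 160 depth map.
encoder_units = 512 * 18 * 28    # 258,048 inputs
output_units = 120 * 160         # 19,200 outputs
weights = encoder_units * output_units
print(weights)  # 4954521600 -- about 5 billion parameters
```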

I want to use Deconvolution2D but I can't get it to work. The only architecture I end up with is something like this:

    from keras.models import Sequential
    from keras.layers import (Convolution2D, Cropping2D, Deconvolution2D,
                              Dropout, MaxPooling2D, ZeroPadding2D)

    model = Sequential()
    model.add(Convolution2D(64, 5, 5, activation='relu', input_shape=(3, 120, 160)))
    model.add(Convolution2D(64, 5, 5, activation='relu'))
    model.add(MaxPooling2D())
    model.add(Dropout(0.5))

    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D())
    model.add(Dropout(0.5))

    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(Dropout(0.5))

    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(Dropout(0.5))

    model.add(ZeroPadding2D())
    model.add(Deconvolution2D(512, 3, 3, (None, 512, 41, 61), subsample=(2, 2), activation='relu'))
    model.add(Deconvolution2D(512, 3, 3, (None, 512, 123, 183), subsample=(3, 3), activation='relu'))
    model.add(Cropping2D(cropping=((1, 2), (11, 12))))
    model.add(Convolution2D(1, 1, 1, activation='sigmoid', border_mode='same'))

The model summary looks like this:

Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution2d_1 (Convolution2D)  (None, 64, 116, 156)  4864        convolution2d_input_1[0][0]      
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 64, 112, 152)  102464      convolution2d_1[0][0]            
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)    (None, 64, 56, 76)    0           convolution2d_2[0][0]            
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 64, 56, 76)    0           maxpooling2d_1[0][0]             
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)  (None, 128, 54, 74)   73856       dropout_1[0][0]                  
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 128, 52, 72)   147584      convolution2d_3[0][0]            
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)    (None, 128, 26, 36)   0           convolution2d_4[0][0]            
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 128, 26, 36)   0           maxpooling2d_2[0][0]             
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D)  (None, 256, 24, 34)   295168      dropout_2[0][0]                  
____________________________________________________________________________________________________
convolution2d_6 (Convolution2D)  (None, 256, 22, 32)   590080      convolution2d_5[0][0]            
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 256, 22, 32)   0           convolution2d_6[0][0]            
____________________________________________________________________________________________________
convolution2d_7 (Convolution2D)  (None, 512, 20, 30)   1180160     dropout_3[0][0]                  
____________________________________________________________________________________________________
convolution2d_8 (Convolution2D)  (None, 512, 18, 28)   2359808     convolution2d_7[0][0]            
____________________________________________________________________________________________________
dropout_4 (Dropout)              (None, 512, 18, 28)   0           convolution2d_8[0][0]            
____________________________________________________________________________________________________
zeropadding2d_1 (ZeroPadding2D)  (None, 512, 20, 30)   0           dropout_4[0][0]                  
____________________________________________________________________________________________________
deconvolution2d_1 (Deconvolution2(None, 512, 41, 61)   2359808     zeropadding2d_1[0][0]            
____________________________________________________________________________________________________
deconvolution2d_2 (Deconvolution2(None, 512, 123, 183) 2359808     deconvolution2d_1[0][0]          
____________________________________________________________________________________________________
cropping2d_1 (Cropping2D)        (None, 512, 120, 160) 0           deconvolution2d_2[0][0]          
____________________________________________________________________________________________________
convolution2d_9 (Convolution2D)  (None, 1, 120, 160)   513         cropping2d_1[0][0]               
====================================================================================================
Total params: 9474113

I couldn't reduce the number of filters in the Deconvolution2D layers below 512, as doing so results in shape-related errors; it seems each Deconvolution2D layer has to have as many filters as the layer before it. I also had to add a final Convolution2D layer to be able to run the network.
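For what it's worth, the spatial sizes in the summary follow the usual 'valid' convolution arithmetic (conv: n - k + 1; pool: n // 2; transposed conv: (n - 1) * stride + k), so the `output_shape` arguments and crop amounts can be sanity-checked with a few lines of plain Python:

```python
# Shape arithmetic for the network above (border_mode='valid' throughout).
def conv(n, k):      # Convolution2D, 'valid' padding
    return n - k + 1

def pool(n, p=2):    # MaxPooling2D with the default pool size
    return n // p

def deconv(n, k, s): # Deconvolution2D (transposed conv), 'valid'
    return (n - 1) * s + k

h, w = 120, 160
for k in (5, 5):                          # two 5x5 convs
    h, w = conv(h, k), conv(w, k)
h, w = pool(h), pool(w)                   # -> 56 x 76
for k in (3, 3):                          # two 3x3 convs
    h, w = conv(h, k), conv(w, k)
h, w = pool(h), pool(w)                   # -> 26 x 36
for k in (3, 3, 3, 3):                    # four more 3x3 convs
    h, w = conv(h, k), conv(w, k)         # -> 18 x 28
h, w = h + 2, w + 2                       # ZeroPadding2D -> 20 x 30
h, w = deconv(h, 3, 2), deconv(w, 3, 2)   # -> 41 x 61
h, w = deconv(h, 3, 3), deconv(w, 3, 3)   # -> 123 x 183
h, w = h - (1 + 2), w - (11 + 12)         # Cropping2D -> 120 x 160
print(h, w)  # 120 160
```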

The above architecture learns, but really slowly and (I think) inefficiently. I'm sure I'm doing something wrong and the design shouldn't be like this. Can you help me design a better network?

I also tried to build a network like the one in this repository, but it seems Keras doesn't work the way this Lasagne example does. I'd really appreciate it if someone could show me how to design something like that network in Keras. Its architecture looks like this:

[architecture diagram from the linked repository]

Thanks

Cypher

1 Answer


I'd suggest a U-Net (see figure 1). In the first half of a U-Net, the spatial resolution is reduced as the number of channels increases (like VGG, as you mentioned). In the second half, the opposite happens: the number of channels decreases while the resolution increases. "Skip" connections between layers at matching resolutions allow the network to efficiently produce high-resolution output.

You should be able to find an appropriate Keras implementation (maybe this one).
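A minimal sketch of the idea in Keras's functional API (hypothetical layer sizes, 'th' dim ordering to match your input shape; `merge` provides the skip connections by concatenating along the channel axis):

    from keras.models import Model
    from keras.layers import Input, Convolution2D, MaxPooling2D, UpSampling2D, merge

    inputs = Input(shape=(3, 120, 160))

    # Contracting path: resolution halves, channels grow
    c1 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(inputs)
    p1 = MaxPooling2D()(c1)                                   # 60 x 80
    c2 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(p1)
    p2 = MaxPooling2D()(c2)                                   # 30 x 40
    c3 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(p2)

    # Expanding path: resolution doubles, channels shrink,
    # with skip connections concatenated channel-wise
    u2 = merge([UpSampling2D()(c3), c2], mode='concat', concat_axis=1)
    c4 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(u2)
    u1 = merge([UpSampling2D()(c4), c1], mode='concat', concat_axis=1)
    c5 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(u1)

    depth = Convolution2D(1, 1, 1, activation='sigmoid')(c5)  # 1 x 120 x 160
    model = Model(input=inputs, output=depth)

Because `border_mode='same'` preserves spatial size and 120 and 160 are both divisible by 4, each upsampled map lines up exactly with its skip tensor, so no Cropping2D or explicit `output_shape` bookkeeping is needed.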

Nic Dahlquist