
I'm fine-tuning VGG-16 for my task. The idea is to load the pretrained weights, remove the last layer (a softmax with 1000 outputs) and replace it with a softmax layer with only a few outputs. Then I freeze all the layers except the last one and train the model.

Here is the code that builds the original model and loads the weights.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D, Flatten, Dense, Dropout

def VGG_16(weights_path=None):
    model = Sequential()
    model.add(ZeroPadding2D((1,1),input_shape=(224,224,3)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(256, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(256, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(256, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        model.load_weights(weights_path)

    return model
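The replace-and-freeze step described above can be sketched on a small stand-in `Sequential` model; the same calls (`pop`, setting `trainable = False`, adding a new head) apply unchanged to the `VGG_16` model built above. `NUM_CLASSES` is a hypothetical class count.

```python
from keras.layers import Dense, Input
from keras.models import Sequential

NUM_CLASSES = 5  # hypothetical number of target classes

model = Sequential([
    Input(shape=(8,)),
    Dense(16, activation='relu'),       # stands in for the pretrained layers
    Dense(1000, activation='softmax'),  # the original 1000-way head
])

model.pop()                             # drop the old softmax layer
for layer in model.layers:
    layer.trainable = False             # freeze the remaining (pretrained) layers
model.add(Dense(NUM_CLASSES, activation='softmax'))  # new trainable head
model.compile(optimizer='sgd', loss='categorical_crossentropy')
```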

Keras uses TensorFlow as its backend in my case, and TensorFlow is built with GPU support (CUDA). I currently have a rather old card: a GTX 760 with 2 GB of memory.

On my card I cannot even load the whole model (the code above) because of an out-of-memory error.

Here the author says that 4 GB is not enough either.

Here a GTX 1070 is able even to train VGG-16 (not just load it into memory), but only with certain batch sizes and in other frameworks (not Keras). The GTX 1070 always has exactly 8 GB of memory.

So 4 GB is clearly not enough for fine-tuning VGG-16, while 8 GB may be enough.
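A back-of-the-envelope estimate supports this: the weights alone already take over half a gigabyte, before counting activations. This sketch assumes float32 storage and uses the published VGG-16 parameter count.

```python
# Memory for the VGG-16 weights alone (float32 = 4 bytes per parameter).
params = 138_357_544          # published VGG-16 parameter count
weights_mb = params * 4 / 1e6 # ~553 MB for the weights
grads_mb = weights_mb         # training also stores a gradient per parameter
print(round(weights_mb), round(weights_mb + grads_mb))  # → 553 1107
```

That is roughly 1.1 GB before any activations or optimizer state, so most of a 2 GB card is consumed before a single image is processed.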

So the question is: how much GPU memory is enough to fine-tune VGG-16 with Keras+TF? Is 6 GB enough, is 8 GB the practical minimum, or is something bigger needed?

Roman Puchkovskiy
  • Have you tried using the `VGG16` model available in [Keras applications](https://keras.io/applications/#vgg16)? My GPU is a 740M with 2 GB of memory, but I can load the model (of course, with `include_top=False`). If you don't need the last layer, this approach would be better since it does not load that layer at all, and hence there is no need to remove it later (as you know, the last layer is huge!). – today Jun 17 '18 at 19:12
  • Further, in the past I have been able to fine-tune it using a `Dense` layer of 1 unit as the last layer. – today Jun 17 '18 at 19:18
  • @today thank you very much, I've just tried this approach with predictions and it is able to predict on GTX 760, with all FC layers, of course! I will try to fine-tune without the top layers. – Roman Puchkovskiy Jun 18 '18 at 19:00
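The approach suggested in the comments can be sketched as follows: build VGG16 without its large fully-connected top and attach a small trainable head. Passing `weights='imagenet'` would load the pretrained weights (and download them on first use); `weights=None` is used here only so the sketch builds without a download. `NUM_CLASSES` is hypothetical.

```python
from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

NUM_CLASSES = 5  # hypothetical number of target classes

# include_top=False omits the three huge FC layers entirely.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False              # freeze the convolutional base

x = Flatten()(base.output)
x = Dense(NUM_CLASSES, activation='softmax')(x)  # small trainable head
model = Model(inputs=base.input, outputs=x)
```

Since the FC layers hold the bulk of VGG-16's ~138M parameters, dropping them is what makes the model fit on a 2 GB card.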

1 Answer


I have fine-tuned VGG-16 in TensorFlow with a batch size of 32 (GPU: 8 GB). I think this would be the same in your case, since Keras uses TensorFlow. However, if you want to train with a larger batch size, you might need a 12 or 16 GB GPU.
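The batch-size dependence can be made concrete: activation memory scales roughly linearly with batch size. This sketch assumes ~24 million float32 activations per 224×224 image (a commonly cited tally for VGG-16), roughly doubled to account for the backward pass.

```python
# Activation memory vs. batch size for VGG-16 (rough estimate).
acts_per_image = 24_000_000                    # assumed activation count per image
per_image_mb = acts_per_image * 4 / 1e6        # float32 → ~96 MB forward
for batch in (8, 16, 32):
    total_gb = batch * per_image_mb * 2 / 1e3  # forward + backward, in GB
    print(batch, round(total_gb, 1), 'GB')     # → 1.5 / 3.1 / 6.1 GB
```

At batch size 32 that is about 6 GB for activations, plus roughly 1 GB for weights and gradients, which is consistent with the 8 GB figure above.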

ravi teja