Why does my feature extraction script fail?

Question

I used the following script to extract feature vectors from my dataset with a pretrained AlexNet model.

I extract the feature vectors from fully connected layer 7 (fc7) and save them in hickle format.

The files used in the code below:

- train.txt contains the paths of 5000 images

- val.txt contains the paths of 1000 images

After a few iterations, I get this error: error == cudaSuccess (2 vs. 0) out of memory. I understand this means my GPU is out of memory. Could you help me fix this issue?

import numpy as np
import hickle as hkl
import caffe


caffe.set_mode_gpu()

def feature_extract(img):

    model_file='/home/jaba/caffe/data/diota_model/feature_extractor/bvlc_reference_caffenet.caffemodel'
    deploy_file='/home/jaba/caffe/data/diota_model/feature_extractor/alex.deployprototxt'

    net=caffe.Net(deploy_file,model_file,caffe.TEST)

    mean_values=np.array([103.939, 116.779, 123.68])

    #setting the transformer

    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_mean('data', mean_values ) #subtract by mean
    transformer.set_transpose('data', (2,0,1)) #(H,W,C) => (C,H,W)
    transformer.set_raw_scale('data', 255.0)   #[0.0, 1.0] => [0.0, 255.0]
    transformer.set_channel_swap("data", (2,1,0)) # RGB => BGR

    #img = caffe.io.load_image(img)

    net.blobs['data'].data[...] = transformer.preprocess('data', img)

    output=net.forward()

    feat=net.blobs['fc7'].data.copy()

    return feat





def create_dataset(datalist,db_prefix):
    with open(datalist) as fr:
        lines = fr.readlines()
    lines = [line.rstrip() for line in lines]

    feats = []
    labels = []

    for line_i, line in enumerate(lines):

        # the last character of each line is the label; the path is everything
        # before the separator character that precedes it
        a=len(line)
        label=line[a-1]
        img_path=line[0:a-2]
        img = caffe.io.load_image(img_path)
        feat = feature_extract(img)
        feats.append(feat)
        label = int(label)
        labels.append(label)
        if (line_i + 1) % 100 == 0:
            print "processed", line_i + 1


    feats = np.asarray(feats)
    labels = np.asarray(labels)


    hkl.dump(feats, db_prefix + "_features.hkl", mode="w")
    hkl.dump(labels, db_prefix + "_labels.hkl", mode="w")




create_dataset('train.txt','vgg_fc7_train')
create_dataset('val.txt','vgg_fc7_test')

The deploy file:

name: "AlexNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}

How familiar are you with hardware? As the error says, you're "running out of memory". In other words, you're attempting to load more data than your hardware (GPU, in this case, I think) can handle. – Paul H, May 31 '17 at 14:56
Without a reproducible example, all I can say is: buy beefier hardware or load less data. – Paul H, May 31 '17 at 15:09
Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. – Prune, May 31 '17 at 17:29
It would be helpful if you could share your deploy.prototxt. As a guess, you need to reduce your batch size by changing input_param { shape: { dim: 10 dim: 3 dim: 227 dim: 227 } } to input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }. The first dim specifies the number of images, so reduce it until it fits your GPU's capacity. – Eliethesaiyan, Jun 01 '17 at 03:09
@Eliethesaiyan I have executed the script using input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } } – user7417788, Jun 01 '17 at 09:21
No, the same problem. Would it help if I shared the output of the nvidia-smi command? – user7417788, Jun 01 '17 at 10:04
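
To put the first `dim` from the comments above in perspective, here is a rough sketch that computes the size of the input blob alone for batch sizes 10 and 1. It assumes 4-byte single-precision values, which is Caffe's default; the helper name `input_blob_bytes` is made up for this illustration and is not part of Caffe. It does not count the weights or the intermediate blobs, which take additional memory.

```python
BYTES_PER_FLOAT32 = 4  # assumption: blobs store single-precision floats


def input_blob_bytes(shape):
    """Bytes needed for the data of one float32 blob of the given (N, C, H, W) shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_FLOAT32


# Compare the two input shapes mentioned in the comments.
for batch in (10, 1):
    size = input_blob_bytes((batch, 3, 227, 227))
    print("batch %d: %d bytes (%.2f MiB)" % (batch, size, size / (1024.0 * 1024.0)))
```

The input blob scales linearly with the first dimension, so going from 10 to 1 divides its size by ten.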

score 0 · Accepted Answer · answered Jun 05 '17 at 13:29

Reduce your batch size. That is the simplest fix in this situation. The minimum `batch size` is 1; start there and keep increasing it to find the limit for your GPU. You will have to live with that figure on your current GPU.
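
The "start at 1 and keep increasing" search could look roughly like the sketch below. Everything in it is an assumption for illustration: `largest_batch_size` and the `try_batch` callback are hypothetical names, and the probe here is a stand-in that only simulates a memory limit. With real Caffe, a failed CUDA check may abort the whole process instead of raising a Python exception, so the probe would probably have to run in a separate process.

```python
def largest_batch_size(try_batch, start=1, limit=1024):
    """Double the batch size until try_batch fails; return the last size that worked.

    try_batch(n) is a hypothetical callback that runs one forward pass with
    batch size n and raises MemoryError when the device runs out of memory.
    Returns 0 if even the starting size does not fit.
    """
    best = 0
    n = start
    while n <= limit:
        try:
            try_batch(n)
        except MemoryError:
            break  # this size does not fit, keep the previous one
        best = n
        n *= 2
    return best


def fake_probe(n, capacity=100):
    """Stand-in for a real forward pass: pretend only `capacity` images fit."""
    if n > capacity:
        raise MemoryError("simulated out of memory at batch size %d" % n)


print(largest_batch_size(fake_probe))
```

Doubling only tests powers of two times the starting size, so the true limit may lie between the returned value and the next doubling.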

Harsh Wardhan


I have used 1 as the batch size in my deploy.prototxt (`input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }`), but I still have the same problem. – user7417788 Jun 06 '17 at 11:56
In that case either use the CPU mode or get another GPU. – Harsh Wardhan Jun 06 '17 at 12:46
I changed caffe.set_mode_gpu() to caffe.set_mode_cpu(), but now I get this error: F0607 15:33:23.451539 3637 cudnn_relu_layer.cpp:13] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR – user7417788 Jun 07 '17 at 13:41
