
I've been working on a single-label regression problem in Caffe. The input consists of 5 HDF5 files, which I generated independently from different sets of images. I first tested my network with a single HDF5 file and ran 10,000 iterations with about 800 training images (batch size 64). When I then ran prediction on the same training images, I got the following result:

[image: prediction results on the training images]

But on the testing images it was:

[image: prediction results on the testing images]

As far as I understand, this is due to the small amount of training data and the fact that the test data is not very similar to the training data.

So I tried increasing the training data to about 5,500 images, divided into 5 HDF5 files. The prediction output on the training data, using a model trained for 14,000 iterations, is:

[image: prediction results on the training data after 14,000 iterations]

I do not understand why the prediction got worse. How does Caffe select a batch (my batch size is 64)? Does it pick batches at random from the 5 HDF5 files? What might be the reason for my bad predictions, and what can I do to train my model effectively? Should I add more convolutional layers? Any suggestions would be a life-saver. This is my first attempt at neural networks and Caffe. My network is:

name: "Regression"
# Training data: batches of 64 are read from the HDF5 files listed in train_hdf5file.txt
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "train_hdf5file.txt"
    batch_size: 64
    shuffle: true
  }
  include: { phase: TRAIN }
}
# Test data
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "test_hdf5file.txt"
    batch_size: 30
  }
  include: { phase: TEST }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "dropout1"
  type: "Dropout"
  bottom: "pool1"
  top: "pool1"
  dropout_param {
    dropout_ratio: 0.1
  }
}

layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "fc1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "dropout2"
  type: "Dropout"
  bottom: "fc1"
  top: "fc1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "fc1"
  top: "fc2"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc2"
  bottom: "label"
  top: "loss"
}
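
For reference, the train_hdf5file.txt (and test_hdf5file.txt) passed as source above are plain text files listing one HDF5 file path per line, e.g. (the file names here are placeholders):

train_data_1.h5
train_data_2.h5
train_data_3.h5
train_data_4.h5
train_data_5.h5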
  • I'm not that familiar with Caffe and its network representation, but the only kind of regularization I see is from the dropout layers. Maybe add some L1/L2 regularization to your weights. I hope the concept of regularization is clear to you, as it's very important in ML. (Without any regularization, a network powerful/big enough will give you a near-perfect training score, but it's mostly memorizing the data, and there is no guarantee at all what will happen with other data, like your test data.) – sascha Oct 07 '16 at 15:42
  • @sascha Thank you for your reply. In my case the problem is not overfitting; the prediction on the training data itself is not good enough. I actually have doubts about the amount of training data I'm using, the way my data is being used, and whether the network structure is good enough with just a single convolutional layer. I also want to know how Caffe deals with the multiple HDF5 files and how it selects a batch from them, and whether I should increase the amount of data on the same network or improve my network first before adding more data. – magneto Oct 07 '16 at 16:09

1 Answer


Try adding more convolutional layers, and remove the dropout (you can bring it back later if you run into overfitting problems). Also check the loss that Caffe prints during training; based on that, you may need to adjust the learning rate and other settings in the solver file.
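For example, a second convolution block could go between pool1 and fc1. A minimal sketch (the layer names and num_output value here are illustrative placeholders, not tuned choices):

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 50   # placeholder; widen or narrow based on your data
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}

Note that fc1's bottom would then have to change from pool1 to pool2. As for the learning rate, that lives in the solver file; a rough sketch of one (the net path and all values are assumptions you would adapt, not recommendations):

net: "regression_train_test.prototxt"
test_iter: 10          # number of test batches per evaluation
test_interval: 500     # run the test phase every 500 training iterations
base_lr: 0.001
momentum: 0.9
weight_decay: 0.0005   # L2 regularization, along the lines suggested in the comments
lr_policy: "step"      # multiply the learning rate by gamma every stepsize iterations
gamma: 0.1
stepsize: 5000
max_iter: 14000
snapshot: 5000
snapshot_prefix: "regression"
solver_mode: GPU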

Roger Trullo