I am not an expert in caffe and Python, but I am trying to learn step by step. I am a little confused, so I would really appreciate it if experts could have a look at my questions.
I am working on image segmentation. I am trying to do on-the-fly data augmentation by adding Python layers. For my dataset I would like to apply translations of +10 and -10 pixels along both the x-axis and the y-axis (4 additional translations), add Gaussian noise, and apply horizontal flipping.
My questions are:
How does caffe synchronize the image with the label? For example, suppose I am sending an image through the data layer to the network and, on the side, the label is sent to SoftmaxWithLoss. I have drawn (manually) a schematic view of the augmentation and the normal flow of data, and I am not sure how correct my understanding is.
As can be seen in the figure, for translation we have to translate the image and the ground truth in a synchronized manner (and for flipping, we have to flip the label as well). For example, if I shift the image by -10 pixels along both the x-axis and the y-axis, the ground truth image also needs to be shifted correspondingly. How can this be done in a caffe Python layer? Is my understanding (based on the figure) correct? I have written the Python layer as follows:
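    import caffe
    import numpy as np
    from skimage import transform as tf
    from skimage.transform import AffineTransform

    class ShiftLayer(caffe.Layer):
        def setup(self, bottom, top):
            # requires two inputs: bottom[0] = image, bottom[1] = label
            assert len(bottom) == 2, 'requires two bottoms (image, label)'
            assert len(top) == 2, 'requires two tops'

        def reshape(self, bottom, top):
            # HOW CAN WE KNOW WHETHER LABEL OR DATA GOES TO "bottom[0]" OR "bottom[1]"?
            top[0].reshape(*bottom[0].data.shape)
            top[1].reshape(*bottom[1].data.shape)

        def forward(self, bottom, top):
            x_trans = -10
            y_trans = -10
            shift = AffineTransform(translation=(x_trans, y_trans))
            # blobs are N x C x H x W, so warp each 2-D plane separately
            for n in range(bottom[0].data.shape[0]):
                for c in range(bottom[0].data.shape[1]):
                    top[0].data[n, c] = tf.warp(bottom[0].data[n, c], shift, preserve_range=True)
                for c in range(bottom[1].data.shape[1]):
                    # order=0 (nearest neighbour) keeps label values as valid class ids
                    top[1].data[n, c] = tf.warp(bottom[1].data[n, c], shift, order=0, preserve_range=True)

        def backward(self, top, propagate_down, bottom):
            pass  # augmentation only, no gradient to propagate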
And this is the layer definition:
    layer {
      name: "shift_layer"
      type: "Python"
      bottom: "data"
      bottom: "label"
      top: "data"
      top: "label"
      include {
        phase: TRAIN
      }
      python_param {
        module: "myshift_layer"
        layer: "ShiftLayer"
      }
    }
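To make the synchronization requirement concrete, here is a minimal NumPy-only sketch of shifting an image and its label by the same offsets, independent of caffe. The helper names `shift_plane` and `shift_pair` are hypothetical, the image is assumed to be a (C, H, W) array and the label an (H, W) array, and filling uncovered label pixels with 255 is an assumption (it would only be skipped by the loss if the loss layer is configured with a matching `ignore_label`).

```python
import numpy as np

def shift_plane(plane, dx, dy, fill=0):
    """Shift a 2-D array by (dx, dy) pixels; uncovered pixels get `fill`."""
    h, w = plane.shape
    out = np.full_like(plane, fill)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # shifted completely out of view
    # Overlapping source and destination windows after the shift.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = plane[src_y, src_x]
    return out

def shift_pair(image, label, dx, dy, ignore_label=255):
    """Apply the same integer shift to an image (C, H, W) and its label (H, W)."""
    # Hypothetical helper: the same (dx, dy) is used for every channel and for the label.
    shifted_image = np.stack([shift_plane(ch, dx, dy, fill=0) for ch in image])
    # ignore_label=255 is an assumed fill value, not something caffe sets by default.
    shifted_label = shift_plane(label, dx, dy, fill=ignore_label)
    return shifted_image, shifted_label
```

Because the offsets are passed once and used for both arrays, the image and the ground truth cannot drift apart. Inside a Python layer, a function like this could be called per sample on `bottom[0].data[n]` and `bottom[1].data[n, 0]`, where the blob order follows the order of the `bottom:` lines in the layer definition.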
If I add other augmentation techniques to the network, should I write a separate module for each of them, or can I write one single Python layer with many bottoms and the corresponding tops? If yes, how can I know which top is related to which bottom?
In the case of Gaussian noise addition, the label stays the same as for the input image. What does the layer definition look like for this one?
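As a rough illustration of how several augmentations might live in one place, the sketch below combines horizontal flipping (applied to both image and label) with Gaussian noise (applied to the image only, so the label passes through unchanged). The function name `flip_and_noise` and the parameters `flip_prob` and `noise_std` are hypothetical, and the shapes (C, H, W) for the image and (H, W) for the label are assumptions.

```python
import numpy as np

def flip_and_noise(image, label, rng, flip_prob=0.5, noise_std=0.05):
    """image: (C, H, W) float array; label: (H, W) integer array."""
    if rng.random() < flip_prob:
        # Geometric change: mirror the width axis of both arrays.
        image = image[..., ::-1]
        label = label[..., ::-1]
    # Photometric change: noise touches the image only; the label is unchanged.
    noisy = image + rng.normal(0.0, noise_std, size=image.shape)
    return noisy.astype(image.dtype), label.copy()

# Example call with a seeded generator (values chosen only for illustration).
rng = np.random.default_rng(0)
example_image = np.zeros((3, 4, 5), dtype=np.float32)
example_label = np.tile(np.arange(5), (4, 1))
aug_image, aug_label = flip_and_noise(example_image, example_label, rng)
```

The design point is that geometric transforms need both bottoms, while a noise-only step could in principle take just the image as its single bottom and top and leave the label blob untouched in the network definition.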