I am using TensorFlow as the backend for Keras, and I am trying to understand how to bring in my labels for image segmentation training.
I am using the LFW Parts Dataset, which has both the ground truth image and the ground truth mask (1500 training images).
As I understand the process, during training I load both:
- (X) Image
- (Y) Mask Image
I do this in batches to meet my needs. My question is: is it sufficient to load both the image and the mask image as NumPy arrays of shape (N, N, 3), or do I need to process or reshape the mask image in some way? The mask labels are represented as [R, G, B] pixels, where:
- [255, 0, 0] Hair
- [0, 255, 0] Face
- [0, 0, 255] Background
I could do something like this to normalize it to the 0-1 range, though I don't know if I should:
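import numpy as np
from PIL import Image

im = Image.open(path)  # path to one mask image
label = np.array(im, dtype=np.uint8)
label = np.multiply(label, 1.0 / 255)  # scale 0-255 down to 0-1 (float result)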
so I end up with:
- [1, 0, 0] Hair
- [0, 1, 0] Face
- [0, 0, 1] Background
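As a minimal sketch of what this scaling produces: with a palette where every class color has exactly one channel at 255, dividing by 255 leaves each pixel as a one-hot vector over the three channels. The 2x2 `mask_rgb` array below is a made-up stand-in for a real mask, and the check assumes the mask files are stored losslessly, so that no in-between color values appear at region borders.

```python
import numpy as np

# Hypothetical 2x2 mask using the question's palette:
# red = hair, green = face, blue = background.
mask_rgb = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [0, 0, 255]]],
    dtype=np.uint8,
)

# Scale 0-255 down to 0-1; the shape stays (H, W, 3).
mask_onehot = mask_rgb.astype(np.float32) / 255.0

# One-hot check: every pixel has exactly one channel set to 1.
is_one_hot = bool(np.all(mask_onehot.sum(axis=-1) == 1.0))
print(mask_onehot.shape, is_one_hot)
```

Note that the channel order that falls out of this is hair, face, background, simply following R, G, B.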
Everything I found online uses existing datasets built into TensorFlow or Keras. Nothing is really clear on how to pull this off if you have what could be considered a custom dataset.
I found this related to Caffe: https://groups.google.com/forum/#!topic/caffe-users/9qNggEa8EaQ
They advocate converting the mask images to (H, W, 1) (HWC?), where my classes would be 0, 1, 2 for Background, Hair, and Face respectively.
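To make the (H, W, 1) layout concrete, here is a small sketch that turns an RGB mask into class IDs 0, 1, 2. It relies on an assumption specific to this palette: each class color lights up exactly one channel, so `argmax` over the channel axis identifies the class, and a lookup table (the hypothetical `channel_to_class`) reorders the result to Background, Hair, Face. The toy `mask_rgb` array is illustrative only.

```python
import numpy as np

# Toy 2x2 mask: red = hair, green = face, blue = background.
mask_rgb = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [0, 0, 255]]],
    dtype=np.uint8,
)

# Index of the brightest channel per pixel: 0 = red, 1 = green, 2 = blue.
channel_idx = mask_rgb.argmax(axis=-1)

# Reorder to the class IDs 0 = background, 1 = hair, 2 = face.
channel_to_class = np.array([1, 2, 0], dtype=np.uint8)

# Fancy indexing applies the lookup; the new axis gives shape (H, W, 1).
labels = channel_to_class[channel_idx][..., np.newaxis]
print(labels.shape)
```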
This may be a duplicate (a combination of similar questions/answers):
How to implement multi-class semantic segmentation?
Tensorflow: How to create a Pascal VOC style image
I found one example that processes PascalVOC into (N, N, 1) that I adapted:
LFW_PARTS_PALETTE = {
    (0, 0, 255): 0,  # background (blue)
    (255, 0, 0): 1,  # hair (red)
    (0, 255, 0): 2,  # face (green)
}
def convert_from_color_segmentation(arr_3d):
    arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)
    palette = LFW_PARTS_PALETTE
    for i in range(0, arr_3d.shape[0]):
        for j in range(0, arr_3d.shape[1]):
            key = (arr_3d[i, j, 0], arr_3d[i, j, 1], arr_3d[i, j, 2])
            arr_2d[i, j] = palette.get(key, 0)  # default value if key was not found is 0
    return arr_2d
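The per-pixel Python loop gets slow on larger images. A vectorized sketch of the same lookup might look like the following; `rgb_to_class_ids` and `PALETTE` are hypothetical names, and the palette assumes red is hair, green is face, and blue is background, as listed earlier in the question. Colors not in the palette fall back to class 0, mirroring `palette.get(key, 0)`.

```python
import numpy as np

# Assumed palette: blue = background, red = hair, green = face.
PALETTE = {
    (0, 0, 255): 0,
    (255, 0, 0): 1,
    (0, 255, 0): 2,
}

def rgb_to_class_ids(arr_3d, palette=PALETTE, default=0):
    """Map an (H, W, 3) color mask to an (H, W) array of class IDs."""
    arr_2d = np.full(arr_3d.shape[:2], default, dtype=np.uint8)
    for color, class_id in palette.items():
        # Boolean (H, W) mask of pixels whose three channels all match this color.
        matches = np.all(arr_3d[..., :3] == np.array(color), axis=-1)
        arr_2d[matches] = class_id
    return arr_2d

# Toy input, including one color that is not in the palette.
toy_mask = np.array(
    [[[0, 0, 255], [255, 0, 0]],
     [[0, 255, 0], [10, 10, 10]]],
    dtype=np.uint8,
)
class_ids = rgb_to_class_ids(toy_mask)
print(class_ids)
```

This loops over the three palette entries rather than over every pixel, so the cost of the Python-level loop no longer grows with image size.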
I think this is close to what I want, but not spot on. I think I need it to be (N, N, 3), since I have 3 classes? The version above, and another one I found, originated from these two locations:
https://github.com/martinkersner/train-CRF-RNN/blob/master/utils.py#L50
https://github.com/DrSleep/tensorflow-deeplab-resnet/blob/ce75c97fc1337a676e32214ba74865e55adc362c/deeplab_resnet/utils.py#L41 (this one one-hot encodes the values)
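If an (N, N, 3) target is what the loss expects, one way to get there from an (N, N) array of class IDs is to index an identity matrix, sketched below on a toy array. In Keras, `categorical_crossentropy` expects one-hot targets like this, while `sparse_categorical_crossentropy` takes the integer class IDs directly; which one fits depends on how the model's output layer is set up, so treat this as an illustration rather than the required format.

```python
import numpy as np

NUM_CLASSES = 3  # background, hair, face

# Toy (H, W) array of class IDs, e.g. the output of a palette lookup.
labels = np.array([[1, 2],
                   [0, 0]], dtype=np.uint8)

# Row k of the identity matrix is the one-hot vector for class k,
# so indexing with the label array yields shape (H, W, NUM_CLASSES).
one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[labels]
print(one_hot.shape)

# argmax over the last axis recovers the original class IDs.
recovered = one_hot.argmax(axis=-1)
```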