
I am using Tensorflow as a backend to Keras and I am trying to understand how to bring in my labels for image segmentation training.

I am using the LFW Parts Dataset, which has 1500 training images, each with a ground-truth image and a ground-truth mask that look like this:

[images: Aaron_Peirsol_0002_Image, Aaron_Peirsol_0002_Mask]

As I understand the process, during training, I load both the

  • (X) Image
  • (Y) Mask Image

Doing this in batches to meet my needs. Now my question is: is it sufficient to just load them both (image and mask image) as NumPy arrays of shape (N, N, 3), or do I need to process/reshape the mask image in some way? Effectively, the mask/labels are represented as [R, G, B] pixels where:

  • [255, 0, 0] Hair
  • [0, 255, 0] Face
  • [0, 0, 255] Background

I could do something like this to normalize it to 0-1, though I don't know if I should:

from PIL import Image
import numpy as np

im = Image.open(path)
label = np.array(im, dtype=np.uint8)
label = np.multiply(label, 1.0 / 255)  # scale 0-255 channel values to 0.0-1.0

so I end up with:

  • [1, 0, 0] Hair
  • [0, 1, 0] Face
  • [0, 0, 1] Background
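Because the mask colors are pure channel values (exactly one channel is 255 per pixel), dividing by 255 really does give a per-pixel one-hot vector. A quick NumPy sanity check, using a made-up 2x2 mask:

```python
import numpy as np

# hypothetical 2x2 RGB mask: hair, face / background, face
mask = np.array([[[255, 0, 0], [0, 255, 0]],
                 [[0, 0, 255], [0, 255, 0]]], dtype=np.uint8)

one_hot = mask.astype(np.float32) / 255.0  # pure colors -> exact 0.0/1.0

# every pixel now sums to exactly 1 across the channel axis
print(np.all(one_hot.sum(axis=-1) == 1.0))  # True for pure-color masks
```

Note that the channel order of this implicit one-hot is the RGB order (hair, face, background), which may not match whatever class-index order you use elsewhere.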

Everything I found online uses existing datasets in TensorFlow or Keras. Nothing is really all that clear on how to pull things off if you have what could be considered a custom dataset.

I found this related to Caffe: https://groups.google.com/forum/#!topic/caffe-users/9qNggEa8EaQ

And they advocate for converting the mask images to single-channel (H, W, 1) images, where my classes would be 0, 1, 2 for Background, Hair, and Face respectively.

It may be that this is a duplicate of these (a combination of similar questions/answers):

How to implement multi-class semantic segmentation?

Tensorflow: How to create a Pascal VOC style image

I found one example that processes PascalVOC into (N, N, 1) that I adapted:

LFW_PARTS_PALETTE = {
    (0, 0, 255) : 0, # background (blue)
    (255, 0, 0) : 1, # hair (red)
    (0, 255, 0) : 2, # face (green)
}

def convert_from_color_segmentation(arr_3d):
    arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)
    palette = LFW_PARTS_PALETTE

    for i in range(0, arr_3d.shape[0]):
        for j in range(0, arr_3d.shape[1]):
            key = (arr_3d[i, j, 0], arr_3d[i, j, 1], arr_3d[i, j, 2])
            arr_2d[i, j] = palette.get(key, 0) # default value if key was not found is 0

    return arr_2d

I think this might be close to what I want, but not spot on. I think I need it to be (N, N, 3) since I have 3 classes? The above version, and another variant, originated from these two locations:

https://github.com/martinkersner/train-CRF-RNN/blob/master/utils.py#L50

https://github.com/DrSleep/tensorflow-deeplab-resnet/blob/ce75c97fc1337a676e32214ba74865e55adc362c/deeplab_resnet/utils.py#L41 (this link one-hot's the values)
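If you do end up wanting the (N, N, 3) one-hot form after producing the (N, N) class-index map above, NumPy can do the conversion with an identity-matrix lookup. A minimal sketch (function name is mine):

```python
import numpy as np

def one_hot_from_class_map(arr_2d, num_classes=3):
    # Index into the identity matrix: (H, W) int labels -> (H, W, num_classes)
    return np.eye(num_classes, dtype=np.uint8)[arr_2d]

labels = np.array([[0, 1],
                   [2, 1]], dtype=np.uint8)
encoded = one_hot_from_class_map(labels)
print(encoded.shape)  # (2, 2, 3)
```

Taking `argmax` over the last axis of `encoded` recovers `labels` exactly, so the two representations are interchangeable.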

— AJ Venturella

3 Answers


Since this is semantic segmentation, you are classifying each pixel in the image, so you would most likely use a cross-entropy loss. Keras, as well as TensorFlow, requires your mask to be one-hot encoded, and the output of your network should have shape [batch, height, width, num_classes]. Before computing the cross-entropy loss, you reshape both the logits and the mask the same way, to the tensor shape [-1, num_classes], where -1 denotes 'as many as required'.
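To make the flattening step concrete, here is a NumPy sketch; the shapes and the naive softmax cross-entropy are illustrative only (in practice the Keras/TensorFlow loss functions handle this):

```python
import numpy as np

batch, height, width, num_classes = 2, 4, 4, 3
rng = np.random.default_rng(0)

# hypothetical one-hot mask and raw network outputs (logits)
mask = np.eye(num_classes)[rng.integers(0, num_classes, (batch, height, width))]
logits = rng.standard_normal((batch, height, width, num_classes))

# flatten to [-1, num_classes]: one row per pixel
flat_mask = mask.reshape(-1, num_classes)      # (32, 3)
flat_logits = logits.reshape(-1, num_classes)  # (32, 3)

# naive softmax cross-entropy over the flattened tensors
probs = np.exp(flat_logits) / np.exp(flat_logits).sum(axis=1, keepdims=True)
loss = -np.mean(np.sum(flat_mask * np.log(probs), axis=1))
```

Because both tensors are reshaped the same way, row i of `flat_mask` still corresponds to row i of `flat_logits`, so the per-pixel pairing is preserved.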

Have a look here at the end

Since your question is about loading your own images: I just finished building an input pipeline for segmentation myself. It is in TensorFlow, though, so I don't know if it helps you; have a look if you are interested: Tensorflow input pipeline for segmentation

— Hasnain Raza

Keras requires the label to be one-hot encoded. So your label will have to be of shape (N, N, n_classes).

— shubhamgoel27

I had the same problem and I came up with a pure TensorFlow solution, which converts the RGB values of a loaded mask image, a (128, 128, 3) Tensor for a 128x128 RGB image, into a (128, 128) Tensor where each value encodes the class as an integer in [0, number_of_classes). Please see my blog post: https://www.spacefish.biz/2020/11/rgb-segmentation-masks-to-classes-in-tensorflow/

You can also get a one-hot encoded Tensor that way, shaped (128, 128, number_of_classes), by just leaving out the last tf.argmax step.
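The core idea, broadcasting a comparison of each pixel against a palette of class colors and then taking argmax over the match axis, looks roughly like this in NumPy (the palette order below is a hypothetical class ordering; the blog post does the same with the equivalent tf ops):

```python
import numpy as np

# hypothetical palette in class order: 0=background, 1=hair, 2=face
palette = np.array([[0, 0, 255],
                    [255, 0, 0],
                    [0, 255, 0]], dtype=np.uint8)

def rgb_mask_to_classes(mask):
    # (H, W, 1, 3) vs (1, 1, C, 3): compare every pixel to every palette color
    matches = (mask[:, :, None, :] == palette[None, None, :, :]).all(axis=-1)
    return matches.argmax(axis=-1)  # (H, W) class indices

mask = np.array([[[255, 0, 0], [0, 255, 0]],
                 [[0, 0, 255], [0, 0, 255]]], dtype=np.uint8)
print(rgb_mask_to_classes(mask).tolist())  # [[1, 2], [0, 0]]
```

Dropping the final argmax leaves the boolean `matches` array of shape (H, W, number_of_classes), i.e. the one-hot form.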

— Spacefish