
I use the Segmentation Models library for multi-class (in my case, 4-class) semantic segmentation. The model (a U-Net with a 'resnet34' backbone) is trained on 3000 RGB (224x224x3) images. The accuracy is around 92.80%.

1) Why does the model.predict() function require a (1,224,224,3)-shaped array as input? I didn't find the answer even in the Keras documentation. The code below actually works, and I have no problem with it, but I want to understand the reason.

predictions = model.predict(test_image.reshape(-1, 224, 224, 3))

2) predictions is a (1,224,224,3)-shaped NumPy array. Its data type is float32 and it contains floating-point numbers. What is the meaning of the numbers inside this array? How can I visualize them? I assumed the result array would contain one of the 4 class labels (from 0 to 3) for every pixel, and that I would then apply a color map for each class. In other words, the result should have been a prediction map, but I didn't get one. To understand better what I mean by a prediction map, please visit Jeremy Jordan's blog about semantic segmentation.

result = predictions[0]
plt.imshow(result)  # import matplotlib.pyplot as plt

3) What I ultimately want to do is what Github: mrgloom - Semantic Segmentation Categorical Crossentropy Example does in its visualy_inspect_result function.

saki

1 Answer


1) The image input shape in your deep neural network architecture is (224,224,3), so width = height = 224 with 3 color channels. The additional leading dimension is the batch dimension: it lets you feed more than one image at a time to your model. So the input shape is (1,224,224,3), or more generally (batch_size,224,224,3).
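For example, adding the batch dimension to a single image can be done in several equivalent ways (a sketch using random data in place of your real test image):

```python
import numpy as np

# Stand-in for a single (224, 224, 3) RGB test image.
test_image = np.random.rand(224, 224, 3).astype("float32")

# Add the batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
batch = np.expand_dims(test_image, axis=0)

# Equivalent alternatives:
#   test_image.reshape(-1, 224, 224, 3)
#   test_image[np.newaxis, ...]
print(batch.shape)  # (1, 224, 224, 3)
```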

2) According to the documentation of the Segmentation Models repo, you can specify the number of classes you want as output: model = Unet('resnet34', classes=4, activation='softmax'). Your labelled images should then be one-hot encoded to shape (1,224,224,4): the last dimension holds one mask channel per class, with a 0 or 1 indicating whether pixel (i,j) belongs to class k. You can then predict and access each output mask:

masked = model.predict(np.array([im]))[0]
mask_class0 = masked[:,:,0]
mask_class1 = masked[:,:,1]
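If you instead want a single prediction map with one class label per pixel, you can take the argmax over the class dimension (a sketch, with random data standing in for the model's softmax output):

```python
import numpy as np

# Stand-in for the model output: one softmax channel per class, shape (224, 224, 4).
masked = np.random.rand(224, 224, 4).astype("float32")

# Each pixel gets the index of its most probable class (0..3).
label_map = np.argmax(masked, axis=-1)
print(label_map.shape)  # (224, 224)
```

The resulting label_map can then be displayed with plt.imshow(label_map) and a colormap of your choice.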

3) Then, using matplotlib, you will be able to plot the semantic segmentation, or you can use scikit-image's color.label2rgb function.
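As a minimal alternative without scikit-image, you can map each class label to a fixed RGB color with plain NumPy fancy indexing (a sketch; the palette colors are arbitrary choices, not part of any library):

```python
import numpy as np

# One RGB color per class (arbitrary example palette).
palette = np.array([
    [0, 0, 0],      # class 0: black
    [255, 0, 0],    # class 1: red
    [0, 255, 0],    # class 2: green
    [0, 0, 255],    # class 3: blue
], dtype=np.uint8)

# Stand-in for a per-pixel class-label map with values 0..3.
label_map = np.random.randint(0, 4, size=(224, 224))

# Fancy indexing maps (224, 224) labels to a (224, 224, 3) RGB image.
rgb = palette[label_map]
print(rgb.shape)  # (224, 224, 3)
```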

Simon Delecourt
  • Thank you for the first answer. But I couldn't understand the second one. I uploaded the result image(224x224x3) of `model.predict()` [here](https://pasteboard.co/InDIukn.png). As you see there are two classes in the image, the circle ones and the road. But there is no clear distinction between these classes in terms of floating point numbers. I would like to understand which pixel belongs to which class. – saki Jul 12 '19 at 13:47
  • I edited my answer accordingly. Please upvote and accept the answer and post a new question if you still encounter difficulties. – Simon Delecourt Jul 12 '19 at 14:01
  • I am sorry but I can not upvote, here the stackoverflow's warning: "Votes cast by those with less than 15 reputation are recorded, but do not change the publicly displayed post score.". I am still confused and did not solve the problem yet. What do you mean "Thus if you reshape **your labelled image** to have a shape (1,224,224,4)."? If you mean that the **labelled image** is `result` in my code which is the output of the `model.predict()` function, it can not be converted to (1,224,224,4) since its shape is (224x224x3). – saki Jul 12 '19 at 15:51
  • As I said, if you have 4 classes, you should have a model that outputs shape (1,224,224,4). The outputs are 0-1 masks. Mask 0 corresponds to class 0, so if pixel i,j equals 1 it belongs to class 0, and so on for every mask of the output. – Simon Delecourt Jul 12 '19 at 15:55
  • The topic does not properly match the asked question. Please modify that. – Max Dec 08 '20 at 19:47