I have just started working with the Pascal VOC segmentation dataset, but I have trouble understanding the colour coding used in the ground-truth labels. I assumed pixels would be annotated 1 through 20, one value per class, but what I get are 8-bit PNG images with pixel values in the range 0-255.

For a pixel belonging to the aeroplane class in 2007_000033.png, I get the values (128, 0, 0), while a pixel belonging to the train class in 2007_000123.png gives (128, 0, 192), and so on.

How do I separate them into classes and do a one-hot encoding? Do I need to specify the pixel values for each class (e.g. search for pixels with (128, 0, 0) and encode them as 1 for the aeroplane class)?

Sorry, I have seen a few similar questions on SO, but none of them helped me. Thanks.

Anakin
  • You should probably check the class of each pixel and map it to a certain colour. Check this out https://gist.github.com/wllhf/a4533e0adebe57e3ed06d4b50c8419ae. – MattSt May 09 '18 at 15:14
  • Thanks @MattSt. This should be okay. – Anakin May 09 '18 at 15:16

1 Answer

I asked myself a similar question, which confused me for quite some time. I think I found a possible explanation:

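If you look at the file download_and_convert_voc2012.sh (from the DeepLab dataset scripts), there are lines marked by "# Remove the colormap in the ground truth annotations". This part processes the original SegmentationClass files and produces the raw segmentation label images, in which each pixel value is a class index between 0 and 20. (If you wonder why this is needed, see this post: Python: Use PIL to load png file gives strange results. In short, the label PNGs are palette-indexed, so the colours you see come from a colormap, not from the stored pixel values.)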
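If you have already loaded the labels as RGB, the colours can be mapped back to class indices by rebuilding the palette. Below is a minimal sketch in plain Python, assuming the standard VOC colormap rule (the bits of the class index are spread over the R, G and B channels) and the usual class order, where 0 is background and 1 is aeroplane. The helper names `voc_colormap`, `rgb_to_class` and `one_hot` are made up for this illustration and are not part of any library.

```python
def voc_colormap(n=256):
    """Rebuild the VOC palette as a list: class index -> (R, G, B)."""
    cmap = []
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            # Take three bits of the index at a time and place one in each
            # channel, filling from the most significant bit downwards.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap


NUM_CLASSES = 21  # assumption: background + 20 object classes
CMAP = voc_colormap()
RGB_TO_INDEX = {rgb: idx for idx, rgb in enumerate(CMAP)}


def rgb_to_class(pixel):
    """Look up the class index of one (R, G, B) pixel."""
    return RGB_TO_INDEX[tuple(pixel)]


def one_hot(index, num_classes=NUM_CLASSES):
    """One-hot vector for a class index; out-of-range labels (e.g. 255) give all zeros."""
    return [1 if k == index else 0 for k in range(num_classes)]


if __name__ == "__main__":
    idx = rgb_to_class((128, 0, 0))  # the aeroplane pixel from the question
    print(idx, one_hot(idx))
```

With this rule, (128, 0, 0) comes out as index 1, which matches the aeroplane pixel in the question. Index 255 is commonly used for the "void" boundary around objects, so in this sketch it is left as an all-zero vector rather than given its own class.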

captainst