Pascal VOC Class Segmentation: Ground-truth pixel labeling for training

Question

I have just started working on the Pascal VOC segmentation dataset, but I have trouble understanding the colour coding used in the ground-truth labels. I assumed pixels would be annotated 1 through 20 according to their class, but what I get are 8-bit PNG images with pixel values in the range 0-255.

For a pixel belonging to the aeroplane class in 2007_000033.png, I get the values (128, 0, 0), while a pixel belonging to the train class in 2007_000123.png gives (128, 0, 192), and so on.
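These triples are not arbitrary. VOC labels are stored as palette indices, and the palette is generated by spreading the bits of each index over the R, G and B channels. The sketch below rebuilds that palette with NumPy, assuming the standard bit-interleaving scheme of the VOC devkit's label colormap; `voc_colormap` is a hypothetical helper name, not a library function.

```python
import numpy as np


def voc_colormap(n=256):
    """Return an (n, 3) uint8 array whose row i is the RGB colour of label index i."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        # Take the index three bits at a time (one bit each for R, G, B) and
        # place them from the most significant bit of each channel downwards.
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap


cmap = voc_colormap()
print(cmap[1].tolist())    # label 1 (aeroplane) -> [128, 0, 0]
print(cmap[255].tolist())  # label 255 (void)    -> [224, 224, 192]
```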

How do I map these values to classes and do a one-hot encoding? Do I need to specify the pixel values for each class (e.g. search for pixels with value (128, 0, 0) and encode them as 1 for the aeroplane class)?
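The lookup described above does work if the labels have already been decoded to RGB. Below is a minimal NumPy sketch, assuming an (H, W, 3) uint8 array in RGB channel order (OpenCV's `cv2.imread` returns BGR, so reverse the last axis first). `VOC_COLORS`, `rgb_to_index` and `one_hot` are illustrative names, not part of any library. Colours outside the 21 class colours, such as the (224, 224, 192) object borders, are mapped to 255 and get an all-zero one-hot vector.

```python
import numpy as np

VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
# RGB colour of each class index 0..20 in the standard VOC palette.
VOC_COLORS = [
    (0, 0, 0), (128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128),
    (128, 0, 128), (0, 128, 128), (128, 128, 128), (64, 0, 0), (192, 0, 0),
    (64, 128, 0), (192, 128, 0), (64, 0, 128), (192, 0, 128), (64, 128, 128),
    (192, 128, 128), (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0),
    (0, 64, 128),
]
VOID_INDEX = 255  # object borders / "don't care" pixels, drawn as (224, 224, 192)


def rgb_to_index(label_rgb):
    """Map an (H, W, 3) uint8 RGB label image to an (H, W) uint8 array of class indices.

    Any colour that is not one of the 21 class colours becomes VOID_INDEX.
    """
    index = np.full(label_rgb.shape[:2], VOID_INDEX, dtype=np.uint8)
    for cls, color in enumerate(VOC_COLORS):
        mask = np.all(label_rgb == np.array(color, dtype=np.uint8), axis=-1)
        index[mask] = cls
    return index


def one_hot(index, num_classes=len(VOC_COLORS)):
    """Turn an (H, W) index map into an (H, W, num_classes) one-hot array.

    Void pixels (index >= num_classes) get an all-zero vector.
    """
    encoded = np.zeros(index.shape + (num_classes,), dtype=np.uint8)
    rows, cols = np.nonzero(index < num_classes)
    encoded[rows, cols, index[rows, cols]] = 1
    return encoded


# Tiny synthetic 2x2 label image: background, aeroplane, void border, background.
label = np.zeros((2, 2, 3), dtype=np.uint8)
label[0, 1] = (128, 0, 0)
label[1, 0] = (224, 224, 192)
idx = rgb_to_index(label)
print(idx.tolist())            # [[0, 1], [255, 0]]
print(VOC_CLASSES[idx[0, 1]])  # aeroplane
print(one_hot(idx).shape)      # (2, 2, 21)
```

Giving void pixels an all-zero vector (or masking them out) is one common way to keep them out of the loss; treat it as a design choice rather than a requirement.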

I have seen a few similar questions on SO, but none of them helped. Thanks.

You should probably check the class of each pixel and map it to a certain colour. Check this out https://gist.github.com/wllhf/a4533e0adebe57e3ed06d4b50c8419ae. — MattSt, May 09 '18 at 15:14

score 2 · Answer 1 · answered Aug 04 '18 at 01:57

I had a similar question myself, and it confused me for quite some time. I think I found a possible explanation:

If you look at the file download_and_convert_voc2012.sh, there are lines marked "# Remove the colormap in the ground truth annotations". This part processes the original SegmentationClass files and produces raw segmentation label images in which each pixel value is a class index between 0 and 20 (border pixels keep the value 255, which VOC uses for "void"). (If you are wondering why, check this post: Python: Use PIL to load png file gives strange results.)
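If the PNGs are read with PIL without converting to RGB, no colour lookup is needed at all: the files are palette images whose pixel values already are the class indices. Here is a rough sketch of that route and of the "remove the colormap" idea, assuming Pillow and NumPy are installed. It is not the deeplab script itself, and `load_label_indices` and `save_without_colormap` are hypothetical names.

```python
import io

import numpy as np
from PIL import Image


def load_label_indices(fp):
    """Return the (H, W) uint8 class-index array stored in a VOC SegmentationClass PNG.

    The file is a palette ("P" mode) image: each pixel holds an index (0-20, or
    255 for void) and the embedded palette only decides how it is displayed.
    np.array() on a "P" image returns the indices, as long as you do not call
    .convert("RGB") first.
    """
    img = Image.open(fp)
    if img.mode != "P":
        raise ValueError(f"expected a palette ('P') image, got mode {img.mode!r}")
    return np.array(img)


def save_without_colormap(indices, fp):
    """Write an (H, W) index array as a plain 8-bit greyscale PNG (no palette)."""
    Image.fromarray(indices.astype(np.uint8)).save(fp, format="PNG")


# Demo with a synthetic label map written to an in-memory palette PNG.
indices = np.array([[0, 1], [255, 15]], dtype=np.uint8)  # background, aeroplane, void, person
demo = Image.fromarray(indices)  # 2-D uint8 array -> "L" image
# Attaching a palette turns it into a "P" image; a grey palette is enough here,
# real VOC files carry the coloured VOC palette instead.
demo.putpalette([v for i in range(256) for v in (i, i, i)])
buf = io.BytesIO()
demo.save(buf, format="PNG")
buf.seek(0)
print(load_label_indices(buf).tolist())  # [[0, 1], [255, 15]]
```

After re-saving with `save_without_colormap`, other image readers see the class indices as plain grey levels instead of palette colours, which is the point of stripping the colormap.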
