How do masks and images work with each other in U-Net?

Question

Let's say we have 1000 images with their corresponding masks. Correct me if I am wrong: if we use U-Net, each image passes through a number of convolutional layers, ReLU, pooling, etc. The network learns the features of each image according to its corresponding mask. It assigns labels to objects and learns the features of the images we pass in during training. It matches the object in the image with its corresponding mask so that it learns only the object's features, not the features of unnecessary objects. For example, if we pass an image of a cat whose background is filled with unnecessary obstacles (bins, table, chair, etc.), then according to the cat's mask it will learn the features of the cat only. Kindly elaborate in your answer if I am wrong.

Yep. U-Net is a learning-based neural segmentation algorithm, so you need a labeled training set to make it work. It predicts a class label for each pixel in the input image, and it segments based on what you teach it! Also, you may not even need 1000 images. U-Net uses augmentation during training, which allows it to learn from very few examples, since many variations of these examples are presented to the net during training — cosa__, Feb 09 '21 at 12:49
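To illustrate the augmentation point in the comment above: the key detail is that whatever geometric transform is applied to the image must also be applied to its mask, so the labels stay aligned with the pixels. The following is only a minimal sketch using NumPy with simple flips and rotations. The function name `augment_pair` and the toy data are hypothetical, and real pipelines often use richer transforms (elastic deformations, for example).

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply the same random flip and rotation to an image and its mask.

    image: (H, W, C) array, mask: (H, W) array of integer class ids.
    """
    # Horizontal flip, applied to both arrays or to neither
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    # Rotate both by the same multiple of 90 degrees in the (H, W) plane
    k = int(rng.integers(0, 4))
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    return image, mask

rng = np.random.default_rng(0)

# Toy data: a 4x4 mask with a small "object", and an image whose
# bright pixels are exactly the masked pixels
mask = np.zeros((4, 4), dtype=np.int64)
mask[1:3, 0:2] = 1
image = np.stack([mask * 255] * 3, axis=-1)

aug_image, aug_mask = augment_pair(image, mask, rng)
# The object has moved, but image and mask still agree pixel for pixel
print(np.array_equal(aug_image[..., 0] > 0, aug_mask == 1))
```

Each call produces a new variation of the same labeled example, which is how a small dataset can be stretched during training.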
I’m voting to close this question because I don't see a single line of code here. — nbro, Feb 11 '21 at 20:15

score 0 · Answer 1 · answered Feb 09 '21 at 12:41

Yes, you are right.

However, this is not specific to U-Net: every segmentation algorithm works the same way, learning to detect the features that are masked and ignoring unnecessary objects (as you mentioned).

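By the way, for real-world objects (chairs, tables, cats, cars, etc.), people often choose Fast R-CNN or YOLO rather than U-Net. Note that these are object detectors that output bounding boxes rather than per-pixel masks; Mask R-CNN is the related model that also predicts a mask for each detected object.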

score 0 · Answer 2 · answered Feb 09 '21 at 12:52

Here is a short (and not exhaustive) explanation. 1. Every segmentation network, or more generally every segmentation task, uses the actual image and the ground truth (your masks) to learn a classification task.

Is it really a classification task, like logistic regression or a decision tree? (Then why the hell does it have such a complex name?)

Ans: Intrinsically, yes. Your network is learning to classify, but it is a bit different from your decision tree or logistic regression.

So a network like U-Net tries to learn how to classify each pixel in the image. This learning is completely supervised: the ground truth (masks) tells the network which class each pixel belongs to. Hence, during training the network weights (the weights of all your conv layers and so on) are adjusted so that the network learns to assign each pixel to its corresponding class.
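The per-pixel classification described above can be made concrete with a small sketch of the loss that compares the network output to the mask. This is only an illustration in NumPy, not U-Net itself: the function name `pixelwise_cross_entropy` is hypothetical, and the "logits" are hand-made arrays standing in for the network's output, assumed to have shape (H, W, C) with one score per class per pixel.

```python
import numpy as np

def pixelwise_cross_entropy(logits, mask):
    """Mean cross-entropy over all pixels.

    logits: (H, W, C) raw class scores, one vector per pixel.
    mask:   (H, W) integer class id per pixel (the ground truth).
    """
    # Numerically stable log-softmax over the class axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # For each pixel, pick the log-probability of the class the mask names
    rows, cols = np.indices(mask.shape)
    return -log_probs[rows, cols, mask].mean()

# Toy 2x3 mask: 1 = cat, 0 = background (bins, table, chair, ...)
mask = np.array([[0, 1, 1],
                 [0, 0, 1]])

one_hot = np.eye(2)[mask]             # shape (2, 3, 2)
good_logits = 5.0 * one_hot           # scores that agree with the mask
bad_logits = 5.0 * (1.0 - one_hot)    # scores that contradict the mask

loss_good = pixelwise_cross_entropy(good_logits, mask)
loss_bad = pixelwise_cross_entropy(bad_logits, mask)
print(loss_good < loss_bad)
```

Training adjusts the weights to reduce this loss, so predictions that agree with the mask at every pixel are rewarded. Background pixels are not ignored; they are simply taught as the background class, which is why the cat's features end up separated from the clutter around it.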
