Computer Vision: Split dataset into several classes and combine them during inference

Question

Situation: We are training a neural network that distinguishes between opened and closed eyes. While collecting the data, we didn't only collect opened and closed eyes; we also collected edge cases, such as eyes covered by a hand or arm, or eyes looking down so strongly that most of the eye is hidden. However, we merged all of this data into two groups, opened and closed, and trained a binary classification network, because that's all we care about during inference.

Question: We were wondering whether it would make sense to train a multiclass network instead. For example, we would have the classes opened, closed, covered by hand, no eye at all, and looking down, and at inference we would combine all classes except closed into a single class. We think the model may understand the real world better if we distinguish between all these cases during training. I want to emphasize, though, that we don't need to know whether the eye is covered by a hand; we only need to know whether the eye is closed or not.
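To make the idea concrete, here is a minimal sketch of what "combining classes during inference" could look like. It assumes the multiclass network outputs one logit per class; the class names, their order, the example logits, and the 0.5 threshold are all hypothetical and only serve as illustration.

```python
import numpy as np

# Hypothetical class order; names and indices are assumptions for illustration.
CLASSES = ["opened", "closed", "covered_by_hand", "no_eye", "looking_down"]
CLOSED_IDX = CLASSES.index("closed")


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def collapse_to_binary(logits):
    """Turn multiclass logits into (p_closed, p_not_closed) per sample.

    The probability of "not closed" is the sum of the probabilities of
    all other classes, which equals 1 - p_closed.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    p_closed = probs[..., CLOSED_IDX]
    p_not_closed = 1.0 - p_closed
    return p_closed, p_not_closed


# Made-up logits for two images, one row per image.
logits = np.array([
    [2.0, 0.1, 0.3, -1.0, 0.5],   # mostly "opened"
    [0.2, 3.0, 0.1, -0.5, 0.0],   # mostly "closed"
])
p_closed, p_not_closed = collapse_to_binary(logits)
is_closed = p_closed > 0.5  # the threshold is an assumption and can be tuned
```

Note that thresholding the summed probability is not the same as taking the argmax over all five classes and then mapping the winner to closed or not closed; the two can disagree when the probability mass is spread over several non-closed classes.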

I tried to find research done on this topic but without success.

My intuition agrees with yours, but I am not practiced enough to give a proper answer. With two classes, it's like a lawyer asking you a yes/no question that is really more complicated than a simple yes/no. The network will likely form those distinctions on its own and learn to collapse/group them as requested, but it may struggle to classify those special cases as a simple "either/or". I would give it the "out" of multiple classes, so it can confidently label those (expectedly common!) situations. You could try both ways (just retrain for the other way) and compare. – Christoph Rackwitz, Dec 12 '20 at 21:57
Your end goal is to build the best model, isn't it? How do you decide whether an ML model is good or bad? By evaluating it. So I recommend preparing a robust test set (covering all possible real-time scenarios) and comparing your models and approaches by metrics (like accuracy). The model or approach that attains the best result is your best model. I hope this makes sense. – Devashish Prasad, Dec 22 '20 at 17:32
