Does an ML model classify between desired image classes or by datasets?

Question

If I have Dataset 1 with 90% cat images and 10% dog images, and I combine it with Dataset 2, which contains only dogs, to even out the class imbalance, will my model learn to classify cats versus dogs, or Dataset 1 images versus Dataset 2 images?

If it's the latter, how do I get the model to classify between cats and dogs?

I’m voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in the `machine-learning` [tag info](https://stackoverflow.com/tags/machine-learning/info). — desertnaut, Apr 22 '21 at 09:04

score 0 · Answer 1 · answered Apr 22 '21 at 08:53

Your model will only do what it is trained to do, regardless of what your dataset(s) are named.

The name of a dataset is just an organizational detail. It does not go into training and does not affect the loss produced during a training step. What does affect your model's responses, however, is the properties of the data.
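To make this concrete, here is a minimal sketch of the setup from the question. The sample IDs and counts are hypothetical stand-ins (100 images in Dataset 1, 80 in Dataset 2); the point is only that each training example is an (image, class label) pair, so the dataset of origin never appears as a target.

```python
from collections import Counter

# Hypothetical stand-ins for real images: each sample is (image_id, class_label).
# Dataset 1: 90% cats, 10% dogs. Dataset 2: dogs only.
dataset_1 = [(f"d1_img_{i}", "cat") for i in range(90)] + [
    (f"d1_img_{i}", "dog") for i in range(90, 100)
]
dataset_2 = [(f"d2_img_{i}", "dog") for i in range(80)]

# Combining the datasets just concatenates the (image, label) pairs.
combined = dataset_1 + dataset_2

# The only targets the model would ever be trained on are the class labels.
targets = {label for _, label in combined}
label_counts = Counter(label for _, label in combined)

print(targets)
print(label_counts)
```

After merging, the classes are balanced and the target set contains only `cat` and `dog`, so a classifier trained on `combined` has no "Dataset 1 vs. Dataset 2" output to learn. In a framework such as PyTorch, `torch.utils.data.ConcatDataset` would play the role of the `+` here.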

Sometimes data from different datasets have different properties even though the datasets serve the same purpose, such as images with different illumination, background, or resolution. That certainly has an effect on model performance. This is why datasets should be mixed with caution. You might find it useful to have a look at this paper.
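One quick way to see why caution is warranted is to check how strongly each class is tied to one source. The sketch below uses hypothetical `(source, label)` pairs that mirror the question's numbers; `source_share_per_class` is an illustrative helper, not part of any library.

```python
from collections import Counter

# Hypothetical samples as (source, label) pairs, mirroring the question's setup.
samples = (
    [("dataset_1", "cat")] * 90
    + [("dataset_1", "dog")] * 10
    + [("dataset_2", "dog")] * 80
)


def source_share_per_class(samples):
    """Return {label: {source: fraction of that label's samples from source}}."""
    pair_counts = Counter(samples)
    label_counts = Counter(label for _, label in samples)
    shares = {}
    for (source, label), n in pair_counts.items():
        shares.setdefault(label, {})[source] = n / label_counts[label]
    return shares


shares = source_share_per_class(samples)
for label, by_source in sorted(shares.items()):
    print(label, by_source)
```

Here every cat comes from Dataset 1 while roughly 89% of dogs come from Dataset 2. If Dataset 2 images differ systematically (say, in illumination or resolution), those properties become correlated with the `dog` label, and the model may pick up on them instead of the animal. Evaluating separately on the Dataset 1 dogs is one simple way to check for this.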
