Dealing with a multi-class problem. Can Random Forest Classifier handle >100,000 classes?

Question

I need to build a recommender system that has to classify items into more than 100,000 unique classes.

Can anyone tell me if Random Forest Classifier can handle this problem?

As far as I understand from the numerous articles I have read on this topic, the largest number of classes people report being able to classify with RFC is around 100-200.

Is there a way to work around this limitation with RFC, and how would it affect accuracy?

If not, which ML algorithm would you suggest I use instead?

Thank you in advance!

score 2 · Accepted Answer · answered May 05 '23 at 07:00

Beyond the problem you mention, it is not a good idea to have a single model that classifies 100k classes. It's like having one translator who knows every language; it is preferable to have one translator per language pair. The same applies to your case: start with a first model that classifies large groups.

Consider the tree of life and a model capable of classifying all living species.

Do you think it makes sense to create this kind of model? Perhaps it is better to have a model which classifies by major branches, then sub-models specialized in the classification of minor branches and finally models which define the final species (the leaves of the tree).

The development work will probably take longer but the results will be better. You would not ask an ornithologist to identify a species of fish; you would ask an ichthyologist :-)

As you can see, you can use several random forest classifiers, each specialized in one part of the job. I hope my explanations have been clear even though my answer does not provide usable code.
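For readers who want a starting point, here is a minimal sketch of the two-level idea using scikit-learn. It is an illustration under assumptions, not a tested recipe for 100k classes: the class name `TwoLevelForest` and the `class_to_group` mapping are hypothetical, the data is synthetic, and grouping labels by integer division merely stands in for a real taxonomy that you would have to define from your own domain.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


class TwoLevelForest:
    """A coarse forest picks a group; one forest per group picks the final class."""

    def __init__(self, class_to_group, n_estimators=50, random_state=0):
        # Hypothetical mapping: fine label -> coarse group (assumed to be known upfront).
        self.class_to_group = dict(class_to_group)
        self.n_estimators = n_estimators
        self.random_state = random_state

    def _new_forest(self):
        return RandomForestClassifier(
            n_estimators=self.n_estimators, random_state=self.random_state
        )

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.label_dtype_ = y.dtype
        groups = np.array([self.class_to_group[label] for label in y])
        # Level 1: predict the major branch.
        self.coarse_ = self._new_forest().fit(X, groups)
        # Level 2: one specialised model per branch, trained only on its own samples.
        self.fine_ = {}
        for g in np.unique(groups):
            mask = groups == g
            self.fine_[g] = self._new_forest().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X)
        groups = self.coarse_.predict(X)
        out = np.empty(len(X), dtype=self.label_dtype_)
        # Route each sample to the specialist of its predicted branch.
        for g in np.unique(groups):
            mask = groups == g
            out[mask] = self.fine_[g].predict(X[mask])
        return out


# Synthetic stand-in data: 12 fine classes, arbitrarily bundled into 3 groups of 4.
X, y = make_blobs(n_samples=1200, centers=12, n_features=10, random_state=0)
class_to_group = {label: label // 4 for label in range(12)}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = TwoLevelForest(class_to_group).fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = float(np.mean(y_pred == y_test))
print(f"two-level accuracy on the toy data: {accuracy:.3f}")
```

Keep in mind that errors compound in this design: if the first-level model sends a sample to the wrong branch, the specialist for that branch cannot recover the correct class, so the quality of the grouping matters a lot.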

Corralien, thanks a lot for your help! Your answer really clarified my next steps. So I should think about how to group the data, how to label each group, and how to minimize the number of labels used at each classification step: general groups at the top, each of which is then broken down into smaller ones. — Fitzpatrick, May 05 '23 at 07:48
Yes, that's exactly it. Another point: if you have a lot of features, try to reduce the dimensions with a PCA, because for the major categories not all the features will be useful to separate the groups. I think you can also use clustering for the major categories and only use RFC at the last step. — Corralien, May 05 '23 at 08:00
Corralien, you made my day! :) Honestly. Thank you for your immense help. — Fitzpatrick, May 05 '23 at 11:12
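
The PCA-plus-clustering variant suggested in the comments could look roughly like the sketch below. It is only one possible reading of that suggestion: the choice of `KMeans`, the number of components and clusters, and the synthetic data are all assumptions made for illustration, and the variable names (`router`, `forests`) are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data: 12 classes described by 20 features.
X, y = make_blobs(n_samples=1200, centers=12, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: reduce dimensions with PCA, then cluster to obtain the major categories.
router = make_pipeline(
    PCA(n_components=5, random_state=0),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
router.fit(X_train)
train_clusters = router.predict(X_train)

# Step 2: RFC only at the last step, one forest per cluster, on the original features.
forests = {}
for c in np.unique(train_clusters):
    mask = train_clusters == c
    forests[int(c)] = RandomForestClassifier(n_estimators=50, random_state=0).fit(
        X_train[mask], y_train[mask]
    )

# Prediction: assign each sample to a cluster, then ask that cluster's forest.
test_clusters = router.predict(X_test)
y_pred_clustered = np.empty(len(X_test), dtype=y_train.dtype)
for c in np.unique(test_clusters):
    mask = test_clusters == c
    y_pred_clustered[mask] = forests[int(c)].predict(X_test[mask])

accuracy_clustered = float(np.mean(y_pred_clustered == y_test))
print(f"clustered accuracy on the toy data: {accuracy_clustered:.3f}")
```

Unlike a hand-made label hierarchy, the clusters here are found without looking at the labels, so a single class may end up spread over several clusters; each forest simply learns whichever classes appear in its own cluster.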
