How can I resolve imbalanced datasets for AutoML classification on GCP?

Question

I am planning to use AutoML to classify my tabular data, but the target variable is moderately imbalanced. When training my own model, I would upsample, downsample, or generate synthetic samples to resolve the imbalance. Is there a similar option in AutoML on GCP? If not, how can such cases be handled?

Auto ML Tabular Data Classification

Just to clarify: by **"When running my own model, I would either upsample, downsample or build synthetic samples to resolve the imbalance."** do you mean that you adjust your data (add or remove samples) to resolve the imbalance and then start training the model? — Ricco D, Jan 21 '22 at 06:35

score 0 · Answer 1 · answered Jan 24 '22 at 02:14

AutoML Tables is a supervised learning service, which means that you train a machine learning model with example data. In general, the more training examples you have, the better your outcome. The amount of example data required also scales with the complexity of the problem you're trying to solve. See the guide on how much data to use.

So, with regard to the imbalance in your dataset, the only way to resolve it is to adjust the data yourself (add or remove samples) before training, so that you get the best results.
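As a rough illustration of adjusting the data before importing it, here is a minimal sketch in plain Python. The `rebalance` helper, the `label` column name, and the toy rows are all hypothetical and are not part of AutoML Tables; the sketch only shows random upsampling (duplicating minority rows) and downsampling (dropping majority rows), not synthetic sample generation.

```python
import random
from collections import Counter, defaultdict


def rebalance(rows, label_key, mode="upsample", seed=0):
    """Return a new list of rows in which every class has the same count.

    mode="upsample":   duplicate randomly chosen rows of the smaller classes
                       (sampling with replacement) up to the largest class size.
    mode="downsample": randomly drop rows of the larger classes down to the
                       smallest class size.
    """
    if mode not in ("upsample", "downsample"):
        raise ValueError("mode must be 'upsample' or 'downsample'")
    if not rows:
        return []

    rng = random.Random(seed)

    # Group the rows by their target value.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if mode == "upsample" else min(sizes)

    balanced = []
    for group in by_class.values():
        if len(group) >= target:
            # Keep exactly `target` distinct rows (no replacement).
            balanced.extend(rng.sample(group, target))
        else:
            # Keep every original row, then add random duplicates.
            balanced.extend(group)
            balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 10 "fraud" rows and 90 "ok" rows.
rows = [{"amount": i, "label": "fraud" if i < 10 else "ok"} for i in range(100)]

upsampled = rebalance(rows, "label", mode="upsample")
downsampled = rebalance(rows, "label", mode="downsample")

print(Counter(r["label"] for r in upsampled))
print(Counter(r["label"] for r in downsampled))
```

The balanced rows could then be written back to CSV (for example with `csv.DictWriter`) and imported as the training table. One caveat with upsampling in general: if the data is split into training and evaluation sets after duplication, copies of the same row can land on both sides and make the evaluation metrics look better than they are.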

For more information, you can refer to the AutoML Tables guide.
