I have the following problem, there is a classification problem. On the track 50,000 lines, on Y 60 labels. But the data is unbalanced (in one class, 35000 values, in the other 59 classes 15000 values, of which in some 30 values). If for example, that is, X (column_1, column_2, column_3) and Y:
colum_1 colum_2 colum_3 Y
0.5 1 2 1
0.5 1.1 2 1
0.55 0.95 3 1
0.1 1 2 2
2 0.9 3 3
And need to add "noisy" data, so that there is no imbalance, conditionally, that all values become the same:
colum_1 colum_2 colum_3 Y
0.5 1 2 1
0.5 1.1 2 1
0.55 0.95 3 1
0.1 1 2 2
0.15 0.99 2 2
0.05 1.01 2 2
2 0.9 3 3
1.95 0.95 3 3
2.05 0.85 3 3
Only this is a toy example, but I have many meanings.