How to select by index id in dataframe based on multiple conditions

Asked Dec 23 '21 at 21:33

Active Dec 24 '21 at 08:04


I have a Pandas DataFrame with the following columns: [ID | Group | Account]. The dataset contains about 5-6 million rows, and I'm trying to prepare that data for some ML later.

Because the distribution of the data is exponential (the number of accounts per group, with about 500 account classes and about 80 group classes, becomes linear once a logarithm is applied), I'd like to even out the distribution of the data.
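One way to check the skew described above is to count rows per account, sort the counts, and look at them on a log scale. This is a minimal sketch and not taken from the thread. It assumes the column names from the question, and `rank_log_counts` and `log_slope` are hypothetical helper names:

```python
import numpy as np
import pandas as pd

def rank_log_counts(df: pd.DataFrame, key="Account") -> pd.DataFrame:
    """Rows per `key` (a column name or a list of names), most frequent first,
    with the rank and the log10 of each count."""
    out = df.groupby(key).size().sort_values(ascending=False).to_frame("n")
    out["rank"] = np.arange(len(out))
    out["log10_n"] = np.log10(out["n"])
    return out

def log_slope(counts: pd.DataFrame) -> float:
    """Least-squares slope of log10(count) against rank. If the points lie close
    to a straight line, the counts decay roughly exponentially with rank."""
    slope, _intercept = np.polyfit(counts["rank"], counts["log10_n"], deg=1)
    return float(slope)
```

Passing `key=["Group", "Account"]` gives the same view per (group, account) pair instead of per account.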

How can I randomly select a number of accounts, capped at a ceiling value, while making sure that every group has been taken at least a few times for every account?

I have tried various techniques, but I can't find one that suits my problem.
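A two-stage stratified sample is one possible approach. First, reserve a few random rows for every (Account, Group) pair. Then fill each account up to the ceiling from the remaining rows. This is a minimal sketch under stated assumptions, not a confirmed answer from this thread. `balanced_sample`, `max_per_account` and `min_per_pair` are hypothetical names, the DataFrame is assumed to have a unique index, and the per-pair floor takes priority over the ceiling when an account has many groups:

```python
import pandas as pd

def balanced_sample(df: pd.DataFrame, max_per_account: int = 1000,
                    min_per_pair: int = 5, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: keep at most max_per_account rows per Account, after
    first reserving up to min_per_pair random rows for every (Account, Group)
    pair. Assumes df has a unique index."""
    # Shuffle once so that every later "first n rows" selection is random.
    shuffled = df.sample(frac=1, random_state=seed)

    # Stage 1: reserve up to min_per_pair rows per (Account, Group) pair
    # (all of them if the pair has fewer). This floor wins over the ceiling.
    reserved = shuffled.groupby(["Account", "Group"]).head(min_per_pair)

    # Stage 2: spend each account's remaining budget on the leftover rows.
    leftover = shuffled.drop(index=reserved.index)
    budget = (max_per_account - reserved.groupby("Account").size()).clip(lower=0)
    position = leftover.groupby("Account").cumcount()  # 0, 1, 2, ... per account
    extra = leftover[position < leftover["Account"].map(budget)]

    return pd.concat([reserved, extra]).sort_index()
```

Because the shuffle happens only once, both stages draw rows uniformly at random within each stratum. Changing `seed` produces a different balanced subset.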

edited Dec 24 '21 at 08:04

desertnaut


asked Dec 23 '21 at 21:33

Ileocho


Does this answer your question? https://stackoverflow.com/a/15315507/7375347 – tax evader Dec 23 '21 at 21:44

Hi, and welcome to SO. It would be great if you could have a look at [ask] and then try to produce a [mcve]. – rpanai Dec 23 '21 at 21:44

Yes, it does, @taxevader! Thank you so much! – Ileocho Dec 23 '21 at 21:53


0 Answers