Is it possible to shuffle a dataframe while grouping by index in pandas or sklearn?

Question

I have a dataframe `df` containing patient data, as shown below:

| patient_id    | x     | y     | path  | target    |
|------------   |-----  |-----  |------ |--------   |
| 4423          | 234   | 53    | ....  | 1         |
| 4423          | 259   | 68    | ....  | 0         |
| ...           | ...   | ...   | ...   | ...       |
| 3351          | 100   | 34    | ....  | 1         |
| 3351          | 150   | 78    | ....  | 1         |

What I would like to do is shuffle the data while keeping the rows of each `patient_id` together. In other words, I want to call `df.groupby('patient_id')` and then shuffle my data.

Is there a way to achieve this using pandas or sklearn?
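One way to do this in pandas, sketched under the assumption that the goal is to randomize the order of patients while keeping each patient's rows together (and in their original relative order): permute the unique `patient_id` values, then stable-sort the rows by each patient's new position. The toy data and the helper name `shuffle_by_patient` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the shape of the table above (path column omitted)
df = pd.DataFrame({
    "patient_id": [4423, 4423, 3351, 3351, 1207],
    "x": [234, 259, 100, 150, 42],
    "y": [53, 68, 34, 78, 11],
    "target": [1, 0, 1, 1, 0],
})


def shuffle_by_patient(frame, col="patient_id", seed=None):
    """Shuffle the order of the groups in `col`, keeping each group's rows contiguous."""
    rng = np.random.default_rng(seed)
    # Random order of the distinct patient ids
    shuffled_ids = rng.permutation(frame[col].unique())
    # Map each patient id to its new position
    rank = {pid: pos for pos, pid in enumerate(shuffled_ids)}
    # A stable sort on the new position keeps rows within a patient in their original order
    return (
        frame.sort_values(col, key=lambda s: s.map(rank), kind="mergesort")
        .reset_index(drop=True)
    )


shuffled = shuffle_by_patient(df, seed=0)
print(shuffled)
```

If the rows inside each patient should be shuffled as well, one option is to shuffle the whole frame first (for example with `df.sample(frac=1)`) and then apply the helper, since the stable sort keeps whatever within-patient order it receives.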

Does https://stackoverflow.com/questions/45585860/shuffle-a-pandas-dataframe-by-groups help? — Pygirl, Feb 24 '20 at 13:00
Yes, I'm reading it right now, trying to wrap my mind around it. :D — A Merii, Feb 24 '20 at 13:01
@Pygirl I implemented the solution and it worked fine, but I don't understand what happened here: `[df for _, df in df.groupby('sampleID')]` , could you perhaps break it down for me? I understand list comprehension but I have never seen it used in this way before. — A Merii, Feb 24 '20 at 13:16
We are shuffling the subsets of the dataframe created by groupby. https://ctxt.io/2/AABANXVKEg — Pygirl, Feb 24 '20 at 13:23
Yes, I understand that, but I don't understand the syntax that is used. — A Merii, Feb 24 '20 at 13:26
Is it the same as using `[df for df in df.groupby('sampleID')]`? — A Merii, Feb 24 '20 at 13:27
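Nope, it will give you the group key too: iterating over a groupby yields `(key, sub-dataframe)` pairs, so if your groupby creates 3 chunks you get 3 tuples, each holding that chunk's `sampleID` value and the chunk itself. The `_` just throws the key away and keeps the chunk. — Pygirl, Feb 24 '20 at 13:32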
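A minimal sketch of the pattern discussed in these comments, on made-up data with the `sampleID` column name from the thread. It assumes the groups are shuffled with `random.shuffle` and reassembled with `pd.concat`; the linked answer may differ in details. The loop variable is named `group` here for readability: in `[df for _, df in df.groupby('sampleID')]` the inner `df` is only a loop variable that shadows the outer name inside the comprehension, while `df.groupby(...)` is evaluated against the original dataframe.

```python
import random

import pandas as pd

# Made-up data using the sampleID column name from the comments
df = pd.DataFrame({
    "sampleID": ["a", "a", "b", "c", "c", "c"],
    "value": [1, 2, 3, 4, 5, 6],
})

# Iterating a groupby yields (key, sub-DataFrame) tuples
pairs = list(df.groupby("sampleID"))
for key, group in pairs:
    print(key, len(group))

# Unpacking with `_, group` discards the key and keeps only the sub-DataFrames
groups = [group for _, group in df.groupby("sampleID")]

random.seed(42)         # fixed seed only for reproducibility
random.shuffle(groups)  # reorders the list of groups in place

# Glue the groups back together; the rows of each sampleID stay contiguous
shuffled = pd.concat(groups).reset_index(drop=True)
print(shuffled)
```

Writing `[df for df in df.groupby('sampleID')]` instead would give the same list as `pairs`: tuples rather than dataframes, which `pd.concat` cannot combine.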