I have a DataFrame `df` containing patient data, as shown below:

| patient_id    | x     | y     | path  | target    |
|------------   |-----  |-----  |------ |--------   |
| 4423          | 234   | 53    | ....  | 1         |
| 4423          | 259   | 68    | ....  | 0         |
| ...           | ...   | ...   | ...   | ...       |
| 3351          | 100   | 34    | ....  | 1         |
| 3351          | 150   | 78    | ....  | 1         |

What I would like to do is shuffle the data while keeping each patient's rows together. In other words, I want to `df.groupby('patient_id')` and then shuffle the resulting groups.

Is there a way to achieve this using pandas or sklearn?

A Merii
  • Does https://stackoverflow.com/questions/45585860/shuffle-a-pandas-dataframe-by-groups help? – Pygirl Feb 24 '20 at 13:00
  • Yes, I'm reading it right now, trying to wrap my mind around it. :D – A Merii Feb 24 '20 at 13:01
  • @Pygirl I implemented the solution and it worked fine, but I don't understand what happened here: `[df for _, df in df.groupby('sampleID')]` , could you perhaps break it down for me? I understand list comprehension but I have never seen it used in this way before. – A Merii Feb 24 '20 at 13:16
  • We are shuffling the sub-DataFrames created by groupby. https://ctxt.io/2/AABANXVKEg – Pygirl Feb 24 '20 at 13:23
  • Yes, I understand that, but I don't understand the syntax that is used. – A Merii Feb 24 '20 at 13:26
  • Is it the same as using `[df for df in df.groupby('sampleID')]`? – A Merii Feb 24 '20 at 13:27
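  • Nope, that version gives you the group key too. Iterating over a groupby yields (key, sub-DataFrame) tuples, so each element would be a tuple instead of just the chunk; the key is how groupby keeps note of which chunk is which. `for _, df in ...` unpacks the tuple and throws the key away. – Pygirl Feb 24 '20 at 13:32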
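
For anyone stuck on the same syntax, here is a minimal sketch of what the comprehension discussed above does and how it leads to a group-wise shuffle. The toy values, the extra patient `1207`, and the names `pairs`, `groups` and `shuffled` are made up for illustration, and the column is `patient_id` as in the question (the linked answer uses `sampleID`). It follows the approach from the linked answer; it is not the only way to do this.

```python
import random

import pandas as pd

# Toy data shaped like the table in the question (values are made up).
df = pd.DataFrame({
    "patient_id": [4423, 4423, 3351, 3351, 1207],
    "x": [234, 259, 100, 150, 80],
    "y": [53, 68, 34, 78, 12],
    "target": [1, 0, 1, 1, 0],
})

# Iterating over a groupby yields (key, sub-DataFrame) tuples,
# so without unpacking you get a list of tuples.
pairs = [item for item in df.groupby("patient_id")]

# `for _, g in ...` unpacks each tuple and discards the key,
# leaving a plain list of sub-DataFrames, one per patient.
groups = [g for _, g in df.groupby("patient_id")]

# Shuffle the order of the groups, not the rows inside them.
random.seed(0)  # fixed seed only so the sketch is repeatable
random.shuffle(groups)

# Stitch the groups back together; each patient's rows stay adjacent.
shuffled = pd.concat(groups).reset_index(drop=True)
print(shuffled)
```

Rows inside each group keep their original relative order here. If they should be shuffled as well, one option is to apply `g.sample(frac=1)` to each group before concatenating.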
