I have a 300x300 df. Each row represents the data of a different patient, named with an id. The first 2 columns are the patients id´s. Column 'id_all' contains more participants than needed, and than 'id'. My goal is to keep only the patient information that corresponds with the first column, 'id'.
The first 9x9 looks like this:
id | id_all | MMSE BL | MMSE 12 | MMSE 24 | MMSE 36 | MMSE 48 | MMSE 60 | MMSE 72 |
---|---|---|---|---|---|---|---|---|
aaa002 | aaa000 | 22 | 18 | NA | NA | NA | NA | NA |
aaa003 | aaa002 | 29 | 28 | 28 | 29 | 30 | 29 | NA |
aaa005 | aaa003 | 30 | 29 | 30 | 30 | 30 | NA | NA |
aaa024 | aaa005 | 29 | 28 | 25 | NA | NA | 25 | NA |
aaa026 | aaa024 | 30 | 29 | 29 | 29 | NA | NA | NA |
aaa048 | aaa026 | 28 | 30 | 28 | 27 | 30 | 30 | NA |
aaa095 | aaa038 | 29 | 29 | 29 | 26 | NA | NA | NA |
aaa222 | aaa048 | 30 | 29 | 29 | 28 | 28 | 29 | NA |
So, based on the first column, I would like to somehow iterate through the second column, and delete all the information of the participants who do not have a match with the first column, 'id'. This means, in the first row, deleting columns 2:300, and keeping the first column intact.
At the end, I want my df to look like this:
id | id_all | MMSE BL | MMSE 12 | MMSE 24 | MMSE 36 | MMSE 48 | MMSE 60 | MMSE 72 |
---|---|---|---|---|---|---|---|---|
aaa002 | aaa002 | 29 | 28 | 28 | 29 | 30 | 29 | NA |
aaa003 | aaa003 | 30 | 29 | 30 | 30 | 30 | NA | NA |
aaa005 | aaa005 | 29 | 28 | 25 | NA | NA | 25 | NA |
aaa024 | aaa024 | 30 | 29 | 29 | 29 | NA | NA | NA |
aaa026 | aaa026 | 28 | 30 | 28 | 27 | 30 | 30 | NA |
aaa048 | aaa048 | 30 | 29 | 29 | 28 | 28 | 29 | NA |
aaa095 | ... | |||||||
aaa222 | ... |
In this example, id_all aaa000, and aaa038 dont have a match in the id column, so I would like the delete all the information about those participants. I can´t figure it out how to delete almost all the row (almost because only the first two cells remain), and to move all the cells up.
Another possibility will be, to move the first column down until id matches id_all, and then delete the whole row that has an empty cell in the id column.
Thanks!