I am relatively new to R. I have a data frame, df, that looks like this, where PMID is an ID:
PMID  Variable  Value
1     MH        Humans
1     MH        Male
1     MH        Middle Aged
1     RN        Aldosterone
1     RN        Renin
2     MH        Accidents, Traffic
2     MH        Male
2     RN        Antivenins
3     MH        Humans
3     MH        Crotulus
3     MH        Young Adult
and so on. As you can see, some IDs have multiple MH and/or RN entries, while others have only one or none. For each PMID, I want to collapse all entries of each variable into a single cell, separated by commas. Before collapsing, I want to replace the spaces inside each value with _ so that each value stays intact (for example, Middle Aged becomes Middle_Aged). My final data frame should look like this:
PMID  MH                             RN
1     Humans, Male, Middle_Aged      Aldosterone, Renin
2     Accidents,_Traffic, Male       Antivenins
3     Humans, Crotulus, Young_Adult
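A minimal base R sketch of the three steps described above (replace spaces, collapse per PMID and variable, spread into one column per variable). It assumes the real data uses the column names PMID, Variable and Value from the sample, and it rebuilds only the sample rows shown; `collapsed` and `result` are illustrative names. Unlike the desired output, a PMID with no entry for a variable (PMID 3 / RN) gets NA rather than an empty cell.

```r
# Sketch only: rebuilds the sample rows from the question; the column
# names PMID, Variable and Value are assumed to match the real data.
df <- data.frame(
  PMID     = rep(1:3, times = c(5, 3, 3)),
  Variable = c("MH", "MH", "MH", "RN", "RN", "MH", "MH", "RN", "MH", "MH", "MH"),
  Value    = c("Humans", "Male", "Middle Aged", "Aldosterone", "Renin",
               "Accidents, Traffic", "Male", "Antivenins",
               "Humans", "Crotulus", "Young Adult"),
  stringsAsFactors = FALSE
)

# Step 1: replace every space inside a value with "_"
# (fixed = TRUE treats " " as a literal string, skipping regex parsing)
df$Value <- gsub(" ", "_", df$Value, fixed = TRUE)

# Step 2: paste the values of each PMID x Variable pair into one string.
# tapply() returns a character matrix (rows = PMID, columns = Variable);
# pairs with no rows, such as PMID 3 / RN, are filled with NA.
collapsed <- tapply(df$Value, list(df$PMID, df$Variable),
                    function(v) paste(v, collapse = ", "))

# Step 3: turn the matrix into a data frame with one row per PMID
result <- data.frame(PMID = as.integer(rownames(collapsed)), collapsed,
                     row.names = NULL, stringsAsFactors = FALSE)
print(result)
```

`tapply()` calls `paste()` once per PMID/variable pair, so on 5 million rows the run time grows with the number of such pairs. The data.table package (`dcast()` with its `fun.aggregate` argument) is a commonly used alternative at that scale, but neither approach has been benchmarked here.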
I have over 5 million rows, so please help me make the code computationally efficient. Thanks for your help.