I'm working with a large dataset of drugs and reactions using R. For now, I have the data structured as a very tall data frame that lists the report ID number, the Drug name, and the reported reactions. As you can tell, there is a one-to-many relationship between both IDs vs. drugs and drugs vs. reactions.
Keeping in mind that this dataset is MUCH larger than what I can duplicate here, I'd like to know how to find what pairs of drugs lead to what reactions and in what frequency.
Most importantly, I am interested in how to approach a problem like this. Is the data structured correctly? What concepts or libraries should I read about?
Here's a link to some real data: https://www.dropbox.com/s/kzx4mpyytbo9zil/query_result.csv
ID DRUG REACTION
1 1827 ASPIRIN CHEST PAIN
2 1827 CLARINEX CHEST PAIN
3 1827 ASPIRIN COUGH
4 1827 CLARINEX COUGH
5 1827 ASPIRIN HAEMOGLOBIN DECREASED
6 1827 CLARINEX HAEMOGLOBIN DECREASED
7 1827 ASPIRIN NEUTROPHIL COUNT INCREASED
8 1827 CLARINEX NEUTROPHIL COUNT INCREASED
9 1827 ASPIRIN PHARYNGOLARYNGEAL PAIN
10 1827 CLARINEX PHARYNGOLARYNGEAL PAIN
...
In my teeny little brain, the end result looks something like this...
Drug1 Drug2 Reaction Frequency
1 tylenol alcohol hepatic failure 298
2 advil aleve bleeding 201
3 aspirin advil renal failure 199
4 docusate senna diarrhea 146
5 senna sudafed palpitations 121
6 xanax alcohol sedation 111
7 clarinex benadryl dry mouth 96
...
569 ASPIRIN CLARINEX CHEST PAIN 2
Drug1 and Drug2 are the drug pairs with the highest frequency from the entire dataset. A "drug pair" is defined as any combination of two drugs with the same report ID. The example output above would be interpreted as, "row 1 had 298 unique report IDs for which hepatic failure was the reaction."