I often work with data frames in R where different parameters have multiple measurements recorded at various time points for each individual. So far, I have repeated the "constant" parameters such as age and gender on each row for the same individual, but it seems redundant to repeat the same information again and again.

Basically, I would like to be able to fetch and "merge" information from two data frames, for instance when fitting a model such as:

nlme::lme(hormone_level ~ time_point + age + gender, random = ~ 1 | patient_id)

hormone_level and time_point should then be fetched from data frame 1, while age and gender should be fetched from data frame 2 (see below).

I'm not sure whether I am looking for information on lists, or if it is better to use functions to merge the relevant information from the two data frames to make a third. Do you know a place where I can find more information on this topic, preferably with some useful examples?

Data frame 1:

patient_id  time_point  hormone_level
001         1           55
001         2           85
001         3           105
002         1           48
...

Data frame 2:

patient_id  age  gender
001         30   M
002         45   F
003         32   F
...
Joe
    The time to use lists of data frames is when they are similar in structure (mostly the same columns) - especially if you are tempted to name them sequentially `df1`, `df2`, `df3`, etc., it is a good indication you should be using a list. If your data frames have very different shapes and have different purposes, like the two you show in this question, there is nothing to be gained by combining them in the same list. – Gregor Thomas Nov 07 '16 at 22:24
    In a case like this, probably you will write some sort of join (see [How to join data frames?](http://stackoverflow.com/q/1299871/903061) for examples) to create a single data frame that will then be used for some plots/models/reporting. Often these source data frames would be flat files or tables in an external data base - you write some code to assemble a single table for analysis, then you do the analysis. – Gregor Thomas Nov 07 '16 at 22:25

1 Answer

In your example, data frame 1 is the experimental data and data frame 2 is the subject metadata. Data frame 2 is effectively a lookup table of subjects, with patient_id as its primary key (in database terminology). You need to look up the values in data frame 2 using this key and add them to data frame 1, or, to put it more properly, do a "join". There are numerous ways to do this, but I recommend the join functions from dplyr. For example:

library(dplyr)
left_join(df1, df2, by="patient_id")

will return a new data frame: data frame 1 with age and gender added as columns. Assign the result to a name and do your analysis on that.
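To make that concrete, here is a minimal sketch using the sample rows from the question. The object names df1, df2 and combined are only illustrative, and patient_id is assumed to be stored as character so that the leading zeros in "001" survive.

```r
library(dplyr)

# Sample data copied from the question (only the rows shown there).
# patient_id is kept as character so "001" does not become 1.
df1 <- data.frame(
  patient_id    = c("001", "001", "001", "002"),
  time_point    = c(1, 2, 3, 1),
  hormone_level = c(55, 85, 105, 48),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  patient_id = c("001", "002", "003"),
  age        = c(30, 45, 32),
  gender     = c("M", "F", "F"),
  stringsAsFactors = FALSE
)

# Keep every measurement row of df1 and look up age and gender
# in df2 by patient_id. The result is a new data frame.
combined <- left_join(df1, df2, by = "patient_id")
print(combined)
```

Patient 003 does not appear in the result because a left join only keeps the rows of the first data frame, and there are no measurements for that patient in the rows shown. You would then pass data = combined to whichever model function you use.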

Here's a very good blog entry on this: https://blog.exploratory.io/joining-two-data-sets-to-supplement-or-filter-172bbb6804e3#.v8mqhlsdl
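If you would rather not load an extra package, base R's merge() is one of the other ways to do the same lookup. The sketch below reuses the sample rows from the question. The object names are illustrative, and the setdiff() check for patients without metadata is my own addition rather than something the question asked for.

```r
# Sample data copied from the question (only the rows shown there).
df1 <- data.frame(
  patient_id    = c("001", "001", "001", "002"),
  time_point    = c(1, 2, 3, 1),
  hormone_level = c(55, 85, 105, 48),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  patient_id = c("001", "002", "003"),
  age        = c(30, 45, 32),
  gender     = c("M", "F", "F"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of df1 even if a patient has no entry
# in df2 (the base R equivalent of a left join).
combined_base <- merge(df1, df2, by = "patient_id", all.x = TRUE)

# Optional sanity check: measured patients with no metadata row.
missing_meta <- setdiff(df1$patient_id, df2$patient_id)

print(combined_base)
print(missing_meta)
```

Note that merge() sorts the result by the key column by default, so the row order can differ from the original measurement table.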

Joe