I am working with a very large dataset. Consider the following example for illustration:
df1<-{data.frame(MyID=c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5),v1=c(0.1, 0.2, NA, 0.4, 0.2, 0.1, 0.8, 0.3, 0.1, 0.4, 0.3), v2=c(NA, 0.4, 0.2, 0.1, 0.8, 0.3, 0.1, 0.4, 0.3, 0.1, 0.2))}
df2<-{data.frame(MyID=c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5),v1=c(10, 8, 0, 6, 10, 5, 3, 1, 10, 8, 3), v2=c(0, 10, 5, 1, 8, 5,10, 3, 3, 1, 5))}
I would like to extract information from df1 but based on maximum values per MyID in df2. The final result should be a dataframe with:
- one row per a unique MyID
- each column would have the value in df1 corresponding the maximum of MyID group of df2.
The result should be
ExpectedResult<-{data.frame(MyID=c(1, 2, 3, 4, 5),v1=c(0.1,0.2,0.1,0.4,0.3), v2=c(0.1,0.4,0.2,0.1,0.2))}
What I have tried already but solved only a part of the problem:
- using groups and finding max per group, e.g.
df2Max<- df2 %>% group_by(MyID) %>% slice_max(1,)
- splitting the data using e.g.
df2.split <- split(df2, list(df2$MyID))
But, I am still not sure how to link the two dataframes to extract what I need.