I'm sure the answer to this will be VERY similar to this question but I just can't quite put it together.

I have two data frames. One is the data frame I'm working on:

df<-structure(list(Username = c("hmaens", "pgcmann", "gsamse", "gsamse", 
"gsamse", "gamse"), Title = c("Pharmacy Resident PGY2", "Associate Professor of Pediatrics", 
"Regulatory Coordinator", "Regulatory Coordinator", "Regulatory Coordinator", 
"Regulatory Coordinator"), `User Role` = c("Investigational Pharmacist", 
"Principal Investigator", "Calendar Build", "Protocol Management", 
"Subject Management", "Regulatory")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

and one is the key:

key<-structure(list(username = c("hmaens", "pgcmann", "gsamse", "gsamse", 
"gsamse", "gsamse"), training = c(0, 0, 1, 
1, 1, 1)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

I want to split my "df" data frame based on the "training" column in key. That is, the result would be a data frame called dfZero, with the exact same columns as df, containing everyone who has a 0 in key$training, and a separate data frame called dfOne containing everyone who has a 1 in key$training.

Joe Crozier
  • You could use `df %>% left_join(key, by=c("Username"="username")) %>% split(~training)`. That will give you a list with the two separate data.frames. – MrFlick Apr 07 '22 at 15:12
  • What of the other usernames? `gsame` is not present in `key`, so that `training` is `NA`. – r2evans Apr 07 '22 at 15:25
  • FYI @MrFlick, that method mostly works but drops the `NA` values. An alternative is to use `dplyr::nest_by(training)` which will preserve them. – r2evans Apr 07 '22 at 15:30
  • `dfZero <- df[df$username %in% key[key$training == 0, "username"],]` – Skaqqs Apr 07 '22 at 15:30
  • oops. gsame is my typo. They're all supposed to be gsamse – Joe Crozier Apr 07 '22 at 15:47
  • @Skaqqs Honestly I like this approach better than the other one that creates a list. Do you want to make it an answer I can accept? – Joe Crozier Apr 07 '22 at 16:12

2 Answers

Using %in%

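# Index key$username directly so this also works when key is a tibble
# (key[rows, "username"] on a tibble returns a one-column tibble, not a vector)
dfZero <- df[df$Username %in% key$username[key$training == 0], ]
dfOne <- df[df$Username %in% key$username[key$training == 1], ]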

Using merge()

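# unique() collapses repeated key rows so the merge does not multiply the rows of df
# Note: the merged result also carries the training column
dfZero <- merge(df, unique(key[key$training == 0, ]), by.x = "Username", by.y = "username")
dfOne <- merge(df, unique(key[key$training == 1, ]), by.x = "Username", by.y = "username")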
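
Using a filtering join (a sketch, not one of the approaches tested above)

If dplyr is available, `semi_join()` may be worth a look. It keeps only the rows of df that have a match in the filtered key, adds no columns from key, and does not duplicate rows when key repeats a username. The sketch below assumes the question's data with the "gamse" typo corrected to "gsamse", as the asker says was intended, and rebuilds it as plain data frames for brevity.

```r
library(dplyr)

# Data from the question, with the "gamse" typo corrected (assumption based on the asker's comment)
df <- data.frame(
  Username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gsamse"),
  Title = c("Pharmacy Resident PGY2", "Associate Professor of Pediatrics",
            "Regulatory Coordinator", "Regulatory Coordinator",
            "Regulatory Coordinator", "Regulatory Coordinator"),
  `User Role` = c("Investigational Pharmacist", "Principal Investigator",
                  "Calendar Build", "Protocol Management",
                  "Subject Management", "Regulatory"),
  check.names = FALSE
)
key <- data.frame(
  username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gsamse"),
  training = c(0, 0, 1, 1, 1, 1)
)

# Keep the rows of df whose Username appears in key with the given training value
dfZero <- semi_join(df, filter(key, training == 0), by = c("Username" = "username"))
dfOne <- semi_join(df, filter(key, training == 1), by = c("Username" = "username"))
```

Both results have exactly the columns of df, which is what the question asks for.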
Skaqqs

Using dplyr:

library(dplyr)

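# Join on the username columns, then drop the duplicate rows created by repeated key entries
dflist <- merge(df, key, by.x = "Username", by.y = "username") %>%
  unique() %>%
  group_by(training) %>%
  group_split()  # list of tibbles, one per training value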

edit: You can extract the individual list elements like so:

dfzero <- dflist[[1]]
dfone <- dflist[[2]]
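
If the end goal is one CSV per group, pulling elements out of a named list by name may be less error-prone than relying on position. The following is only a sketch: `joined` is a toy stand-in for the merged data, and the output directory and file names are made up for illustration. Base `write.csv()` is used to avoid an extra dependency; `readr::write_csv(x, file)` could be swapped in.

```r
# Toy stand-in for the merged data (hypothetical, not the question's full data)
joined <- data.frame(
  Username = c("hmaens", "pgcmann", "gsamse"),
  training = c(0, 0, 1)
)

# split() returns a named list; the names are the training values as strings
parts <- split(joined, joined$training)

# Write one CSV per group; out_dir and the file name pattern are assumptions
out_dir <- tempdir()
for (value in names(parts)) {
  path <- file.path(out_dir, paste0("df_training_", value, ".csv"))
  write.csv(parts[[value]], path, row.names = FALSE)
}

# Individual data frames, column names intact
dfZero <- parts[["0"]]
dfOne <- parts[["1"]]
```

Indexing with `[[` returns the data frame itself, so the column names are kept.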
nogbad
  • I'm sorry for the silly question, because I'm sure this pretty much works, but I can't figure out how to then get these data frames out of the list. I've found other answers, like this one: https://stackoverflow.com/questions/59169631/split-a-list-into-separate-data-frame-in-r, that explain it, but when I use those solutions my data frames don't have column names anymore. Would you mind taking your answer one step further, to actually having separate data frames? I ultimately need to use write_csv to export them. – Joe Crozier Apr 07 '22 at 16:03