I have a data frame with a grouping column "id". I want to split it into separate data frames depending on whether a substring occurs in column "alteration" anywhere within a group. The substrings I am interested in are "intermediate" and "high"; groups in which neither string occurs should go into a third data frame.
Here is a sample data frame:
df <- data.frame(id= c(1, 1, 2, 2, 3, 3),
disease = c("brain", "brain", "neck", "neck","breast", "breast"),
status = c("yes", "yes","no","no","yes","yes"),
gene = c("P53","TMB","ATM","TMB","RAF","NFKB"),
alteration = c("TP53Y","TMB-intermediate","TATMY","TMB-high","TRAFY","TNFKBY"))
which gives this data frame:
id disease status gene alteration
1 brain yes P53 TP53Y
1 brain yes TMB TMB-intermediate
2 neck no ATM TATMY
2 neck no TMB TMB-high
3 breast yes RAF TRAFY
3 breast yes NFKB TNFKBY
Expected output should be three data frames:
dfIntermed
id disease status gene alteration
1 brain yes P53 TP53Y
1 brain yes TMB TMB-intermediate
dfHigh
id disease status gene alteration
2 neck no ATM TATMY
2 neck no TMB TMB-high
dfNo (this data frame contains no information about TMB within group)
id disease status gene alteration
3 breast yes RAF TRAFY
3 breast yes NFKB TNFKBY
EDIT
Another post suggests the use of split(). When I split the data frame using the code:
out <- split(df, f = df$alteration )
out[[1]]
I get back six data frames, one per distinct value of "alteration", but I can't work out how to grep for the strings in the f = argument. Is it possible to grep for "high" or "intermediate" within split()?
EDIT II
I can combine split() with grepl(), but this splits row by row, so I only get the single matching rows and not the whole group:
outB <- split(df, list(df$id, grepl("high", df$alteration)))
outB[[2]]
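For reference, here is a minimal sketch of one way to keep whole groups together, using base R only. The idea is to turn the row-level grepl() result into a group-level flag with ave(), so that every row of an id carries the same label before split() is called. The names in_high, in_intermed and group_label are made up for illustration, and the rule that "high" wins when a group contains both strings is an assumption, since the sample data never has both in one group.

```r
# Sample data from the question
df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 disease = c("brain", "brain", "neck", "neck", "breast", "breast"),
                 status = c("yes", "yes", "no", "no", "yes", "yes"),
                 gene = c("P53", "TMB", "ATM", "TMB", "RAF", "NFKB"),
                 alteration = c("TP53Y", "TMB-intermediate", "TATMY",
                                "TMB-high", "TRAFY", "TNFKBY"))

# Group-level flags: TRUE for every row of an id if any row of that id matches
in_high     <- ave(grepl("high", df$alteration), df$id, FUN = any)
in_intermed <- ave(grepl("intermediate", df$alteration), df$id, FUN = any)

# One label per row, identical within each id.
# Assumption: "high" takes precedence if a group contains both strings.
group_label <- ifelse(in_high, "high",
                      ifelse(in_intermed, "intermediate", "none"))

# Fixing the factor levels means a label that never occurs still yields
# an (empty) data frame instead of a missing list element
group_label <- factor(group_label, levels = c("intermediate", "high", "none"))

out <- split(df, group_label)
dfIntermed <- out[["intermediate"]]
dfHigh     <- out[["high"]]
dfNo       <- out[["none"]]
```

With the sample data, dfIntermed holds both rows of id 1, dfHigh both rows of id 2, and dfNo both rows of id 3.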
EDIT III
Issue resolved in another post