In R, how to delete a series of cells based on the value from another cell?

Question

I have a 300x300 df. Each row represents the data of a different patient, named with an id. The first 2 columns are the patients id´s. Column 'id_all' contains more participants than needed, and than 'id'. My goal is to keep only the patient information that corresponds with the first column, 'id'.

The first 9x9 looks like this:

id	id_all	MMSE BL	MMSE 12	MMSE 24	MMSE 36	MMSE 48	MMSE 60	MMSE 72
aaa002	aaa000	22	18	NA	NA	NA	NA	NA
aaa003	aaa002	29	28	28	29	30	29	NA
aaa005	aaa003	30	29	30	30	30	NA	NA
aaa024	aaa005	29	28	25	NA	NA	25	NA
aaa026	aaa024	30	29	29	29	NA	NA	NA
aaa048	aaa026	28	30	28	27	30	30	NA
aaa095	aaa038	29	29	29	26	NA	NA	NA
aaa222	aaa048	30	29	29	28	28	29	NA

So, based on the first column, I would like to somehow iterate through the second column, and delete all the information of the participants who do not have a match with the first column, 'id'. This means, in the first row, deleting columns 2:300, and keeping the first column intact.

At the end, I want my df to look like this:

id	id_all	MMSE BL	MMSE 12	MMSE 24	MMSE 36	MMSE 48	MMSE 60	MMSE 72
aaa002	aaa002	29	28	28	29	30	29	NA
aaa003	aaa003	30	29	30	30	30	NA	NA
aaa005	aaa005	29	28	25	NA	NA	25	NA
aaa024	aaa024	30	29	29	29	NA	NA	NA
aaa026	aaa026	28	30	28	27	30	30	NA
aaa048	aaa048	30	29	29	28	28	29	NA
aaa095	...
aaa222	...

In this example, id_all aaa000, and aaa038 dont have a match in the id column, so I would like the delete all the information about those participants. I can´t figure it out how to delete almost all the row (almost because only the first two cells remain), and to move all the cells up.

Another possibility will be, to move the first column down until id matches id_all, and then delete the whole row that has an empty cell in the id column.

Thanks!

score 1 · Answer 1 · answered Jan 14 '21 at 10:03

1

In tidyverse:

library(dplyr)
df %>% 
  filter(id == id_all)

In base:

df[df$id == df$id_all, ]

answered Jan 14 '21 at 10:03

JBGruber

11,727
1
23
45

score 1 · Accepted Answer · answered Jan 14 '21 at 10:07

I think this will serve the purpose

library(tidyverse)

df %>% filter(!is.na(match(id_all, df$id))) %>% select(-id)

  id_all MMSE.BL MMSE.12 MMSE.24 MMSE.36 MMSE.48 MMSE.60 MMSE.72
1 aaa002      29      28      28      29      30      29      NA
2 aaa003      30      29      30      30      30      NA      NA
3 aaa005      29      28      25      NA      NA      25      NA
4 aaa024      30      29      29      29      NA      NA      NA
5 aaa026      28      30      28      27      30      30      NA
6 aaa048      30      29      29      28      28      29      NA

Since id == id_all in the final data, there is no need for a duplicate column. If it is needed, it can be easily created by adding mutate(id = id_all) at the end of the above syntax

df %>% filter(!is.na(match(id_all, df$id))) %>% select(-id) %>% mutate(id = id_all) %>%
  select(id, everything())

      id id_all MMSE.BL MMSE.12 MMSE.24 MMSE.36 MMSE.48 MMSE.60 MMSE.72
1 aaa002 aaa002      29      28      28      29      30      29      NA
2 aaa003 aaa003      30      29      30      30      30      NA      NA
3 aaa005 aaa005      29      28      25      NA      NA      25      NA
4 aaa024 aaa024      30      29      29      29      NA      NA      NA
5 aaa026 aaa026      28      30      28      27      30      30      NA
6 aaa048 aaa048      30      29      29      28      28      29      NA

dput of df used

df <- structure(list(id = c("aaa002", "aaa003", "aaa005", "aaa024", 
"aaa026", "aaa048", "aaa095", "aaa222"), id_all = c("aaa000", 
"aaa002", "aaa003", "aaa005", "aaa024", "aaa026", "aaa038", "aaa048"
), MMSE.BL = c(22L, 29L, 30L, 29L, 30L, 28L, 29L, 30L), MMSE.12 = c(18L, 
28L, 29L, 28L, 29L, 30L, 29L, 29L), MMSE.24 = c(NA, 28L, 30L, 
25L, 29L, 28L, 29L, 29L), MMSE.36 = c(NA, 29L, 30L, NA, 29L, 
27L, 26L, 28L), MMSE.48 = c(NA, 30L, 30L, NA, NA, 30L, NA, 28L
), MMSE.60 = c(NA, 29L, NA, 25L, NA, 30L, NA, 29L), MMSE.72 = c(NA, 
NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-8L))

> df
      id id_all MMSE.BL MMSE.12 MMSE.24 MMSE.36 MMSE.48 MMSE.60 MMSE.72
1 aaa002 aaa000      22      18      NA      NA      NA      NA      NA
2 aaa003 aaa002      29      28      28      29      30      29      NA
3 aaa005 aaa003      30      29      30      30      30      NA      NA
4 aaa024 aaa005      29      28      25      NA      NA      25      NA
5 aaa026 aaa024      30      29      29      29      NA      NA      NA
6 aaa048 aaa026      28      30      28      27      30      30      NA
7 aaa095 aaa038      29      29      29      26      NA      NA      NA
8 aaa222 aaa048      30      29      29      28      28      29      NA

In R, how to delete a series of cells based on the value from another cell?

2 Answers2