How to (efficiently) retrieve the value of a column in a different row based on matching IDs?

Question

I'm trying to do what I did in the example below, but on a large scale, so ideally the solution is as efficient as possible. Thanks in advance!

ID1 <- c("a", "d", "c", "d")
ID2 <- c("d", "e", "f", "g")

df <- data.frame(ID1, ID2)

df

  ID1 ID2
1   a   d
2   d   e
3   c   f
4   d   g

I need a function that takes the ID2 value of the first row ("d", where ID1 = "a"), finds it in column "ID1", and returns the ID2 value of the matching row ("e") in the first row.

If run again, or if told to return the second match, the function should find the next "d" in column "ID1" and return "g" in the first row (where ID1 = "a").

ID3 <- c("e", "", "", "")
ID4 <- c("g", "", "", "")


desired <- data.frame(ID1, ID2, ID3, ID4)

desired 

  ID1 ID2 ID3 ID4
1   a   d   e   g
2   d   e        
3   c   f        
4   d   g        

akrun · Answer 1 · 2022-07-29T19:23:26.153

We could use

vals <- with(df, ID2[ID1 %in% ID2])
df[1, paste0("ID", seq_along(vals)+2)] <- vals

-output

> df
  ID1 ID2  ID3  ID4
1   a   d    e    g
2   d   e <NA> <NA>
3   c   f <NA> <NA>
4   d   g <NA> <NA>
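
The snippet above fills only row 1. A minimal base R sketch of one way to do the same lookup for every row is shown below. It assumes the goal is: for each row, collect the ID2 values of all rows whose ID1 equals that row's ID2. The names `lookup`, `matches`, `out` and `result` are made up for illustration and are not part of the original answer.

```r
# Example data from the question
df <- data.frame(ID1 = c("a", "d", "c", "d"),
                 ID2 = c("d", "e", "f", "g"),
                 stringsAsFactors = FALSE)

# Lookup table: for each distinct ID1, the ID2 values of its rows (in row order)
lookup <- split(df$ID2, df$ID1)

# For every row, the ID2 values of rows whose ID1 equals this row's ID2
# (entries are NULL where nothing matches)
matches <- unname(lookup[df$ID2])

# Widest row decides how many new columns are needed
n_max <- max(lengths(matches))

# Fill a character matrix, leaving NA where a row has fewer matches
out <- matrix(NA_character_, nrow = nrow(df), ncol = n_max)
for (i in seq_along(matches)) {
  m <- matches[[i]]
  if (length(m) > 0) out[i, seq_along(m)] <- m
}
colnames(out) <- paste0("ID", seq_len(n_max) + 2)

result <- cbind(df, as.data.frame(out, stringsAsFactors = FALSE))
result
```

Building the lookup once with `split` avoids rescanning ID1 for every row, which should matter more as the data grows. Unmatched cells are NA rather than the empty strings shown in the question.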

edited Jul 29 '22 at 19:23

answered Jul 29 '22 at 18:25


This is what I'm looking for, thank you very much! Is there any way to get the nth match instead of always the first? – D_Jakob Jul 29 '22 at 18:35
@D_Jakob `match` returns only the first. If you want the `nth`, there is a `nth` function in dplyr that can be used on the position index. Can you update your post with a more general example showing that case? – akrun Jul 29 '22 at 18:36
I think I got it! Since there should be only two matches, I can sort the columns to change the order of the match. My approach doesn't work for >2 matches however, so I'd still be interested in learning how to do that. – D_Jakob Jul 29 '22 at 19:02
@D_Jakob I was looking at your expected output now. Do you want to create a column for every single match? I.e., if there are 1000 rows and there are 100 matches, you will have 100 new columns? – akrun Jul 29 '22 at 19:03
That's correct! – D_Jakob Jul 29 '22 at 19:15
@D_Jakob Isn't that inefficient storage (as your question is about efficiency)? I.e., you can get the column matches as `df %>% mutate(ID = case_when(ID1 %in% ID2 ~ ID2))` – akrun Jul 29 '22 at 19:18
You're absolutely right! Thank you for all your help. – D_Jakob Jul 29 '22 at 19:21
@D_Jakob I think the base R update should be efficient for you – akrun Jul 29 '22 at 19:26
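
For the nth-match question raised in the comments, a minimal base R sketch could look like the following. The helper name `nth_match` and its arguments are hypothetical, chosen only for this illustration. It relies on `which()` returning every matching position, so indexing that result with `n` picks the nth match and yields NA when there are fewer than `n` matches.

```r
# Example data from the question
df <- data.frame(ID1 = c("a", "d", "c", "d"),
                 ID2 = c("d", "e", "f", "g"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: ID2 value of the n-th row whose ID1 equals `key`
nth_match <- function(df, key, n = 1) {
  idx <- which(df$ID1 == key)[n]  # NA when there are fewer than n matches
  df$ID2[idx]
}

first  <- nth_match(df, "d", 1)  # first row with ID1 == "d"
second <- nth_match(df, "d", 2)  # second row with ID1 == "d"
third  <- nth_match(df, "d", 3)  # no third match, so NA
```

Unlike the column-sorting workaround mentioned above, this is not limited to two matches.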
