I'm trying to do what I did in the example below, but on a large scale, so ideally the solution is as efficient as possible. Thanks in advance!

ID1 <- c("a", "d", "c", "d")
ID2 <- c("d", "e", "f", "g")

df <- data.frame(ID1, ID2)

df

  ID1 ID2
1   a   d
2   d   e
3   c   f
4   d   g

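I'd like a function that takes "d" (the ID2 value of the first row), finds it in column ID1, and writes the corresponding ID2 value, "e", into a new column of the first row (where ID1 = "a").

If the function is run again, or if we specify that we want the second match, it should find "d" in column ID1 again and write the next corresponding value, "g", into another new column of the first row (where ID1 = "a").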

ID3 <- c("e", "", "", "")
ID4 <- c("g", "", "", "")


desired <- data.frame(ID1, ID2, ID3, ID4)

desired 

  ID1 ID2 ID3 ID4
1   a   d   e   g
2   d   e
3   c   f
4   d   g
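A minimal base R sketch of the lookup described above, assuming you pass in a single row number and the match number you want. `nth_match` is a hypothetical helper, not an existing function; it scans ID1 once per call, so it suits occasional lookups rather than every row of a large table.

```r
ID1 <- c("a", "d", "c", "d")
ID2 <- c("d", "e", "f", "g")
df <- data.frame(ID1, ID2)

# Hypothetical helper: take ID2 in row `row`, find that value in column ID1,
# and return the ID2 value of the n-th matching row (NA if there are fewer than n).
nth_match <- function(df, row, n = 1) {
  key  <- as.character(df$ID2[row])            # "d" for row 1
  hits <- which(as.character(df$ID1) == key)   # positions of "d" in ID1: 2 and 4
  if (n > length(hits)) return(NA_character_)
  as.character(df$ID2)[hits[n]]
}

nth_match(df, 1, 1)  # "e"
nth_match(df, 1, 2)  # "g"
```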
akrun
D_Jakob

1 Answer

We could use

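# ID2 values of the rows whose ID1 also occurs somewhere in ID2 ("e" and "g" here)
vals <- with(df, ID2[ID1 %in% ID2])
# write them into new columns ID3, ID4, ... of the first row; other rows get NA
df[1, paste0("ID", seq_along(vals)+2)] <- vals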

-output

> df
  ID1 ID2  ID3  ID4
1   a   d    e    g
2   d   e <NA> <NA>
3   c   f <NA> <NA>
4   d   g <NA> <NA>
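Note that `ID1 %in% ID2` selects every row whose ID1 occurs anywhere in ID2, so `vals` coincides with the matches for row 1 only because row 1 is the only row with matches in this example. A sketch that generalises the idea to every row, assuming character columns and using a hypothetical helper `add_match_columns`, groups ID2 by ID1 once with `split()` and then looks up each row's ID2 by name:

```r
ID1 <- c("a", "d", "c", "d")
ID2 <- c("d", "e", "f", "g")
df <- data.frame(ID1, ID2)

# Hypothetical helper: for each row i, collect ID2[j] for every row j with
# ID1[j] == ID2[i] and spread those values across new columns ID3, ID4, ...
add_match_columns <- function(df) {
  key <- as.character(df$ID1)
  val <- as.character(df$ID2)
  by_key  <- split(val, key)           # ID2 values grouped by ID1, built once
  matches <- unname(by_key[val])       # NULL where ID2[i] never occurs in ID1
  k <- max(lengths(matches))           # widest match set = number of new columns
  if (k == 0) return(df)
  # pad every match set with NA up to length k, one matrix row per data row
  padded <- lapply(matches, function(v) as.character(v)[seq_len(k)])
  wide <- matrix(unlist(padded), ncol = k, byrow = TRUE,
                 dimnames = list(NULL, paste0("ID", seq_len(k) + 2)))
  cbind(df, as.data.frame(wide, stringsAsFactors = FALSE))
}

add_match_columns(df)
```

Looking values up by name in the list returned by `split()` avoids rescanning ID1 for each row; the n-th match for row i then sits in column `ID<n + 2>` of that row.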
akrun
  • This is what I'm looking for, thank you very much! Is there any way to get the nth match instead of always the first? – D_Jakob Jul 29 '22 at 18:35
  • @D_Jakob `match` returns only the first match. If you want the nth one, there is an `nth` function in dplyr that can be used on the position index. Can you update your post with a more general example showing that case? – akrun Jul 29 '22 at 18:36
  • I think I got it! Since there should be only two matches, I can sort the columns to change the order of the match. My approach doesn't work for >2 matches however, so I'd still be interested in learning how to do that. – D_Jakob Jul 29 '22 at 19:02
  • @D_Jakob I was just looking at your expected output. Do you want to create a column for every single match? I.e., if there are 1000 rows and 100 matches, you will have 100 new columns? – akrun Jul 29 '22 at 19:03
  • That's correct! – D_Jakob Jul 29 '22 at 19:15
  • @D_Jakob Isn't that inefficient storage, given that your question is about efficiency? You could get the matches in a single column instead, i.e. `df %>% mutate(ID = case_when(ID1 %in% ID2 ~ ID2))` – akrun Jul 29 '22 at 19:18
  • You're absolutely right! Thank you for all your help. – D_Jakob Jul 29 '22 at 19:21
  • @D_Jakob I think the base R update should be efficient for you – akrun Jul 29 '22 at 19:26
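Following the storage concern raised in these comments, one alternative (a swapped-in layout, not code from this thread) keeps one row per (source row, match) pair with an explicit match counter `n`, so getting the n-th match becomes a filter rather than a choice of column. The column names `row`, `n` and `match` are illustrative assumptions:

```r
ID1 <- c("a", "d", "c", "d")
ID2 <- c("d", "e", "f", "g")
df <- data.frame(ID1, ID2)

# Hypothetical long layout: one row per (source row, match) pair instead of
# one column per match, so no NA padding is stored.
key <- as.character(df$ID1)
val <- as.character(df$ID2)
rows_by_key <- split(seq_along(key), key)    # row numbers grouped by ID1
hits <- unname(rows_by_key[val])             # matching rows for each ID2 (NULL if none)

long <- data.frame(
  row   = rep(seq_along(val), lengths(hits)),  # source row
  n     = sequence(lengths(hits)),             # 1st, 2nd, ... match for that row
  match = val[unlist(hits)]                    # ID2 value of the matching row
)

# n-th match for a given row, e.g. the 2nd match for row 1
long$match[long$row == 1 & long$n == 2]
```

Rows without any match simply do not appear in `long`, so its size grows with the number of matches rather than with rows times the largest match count.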