Check if value of column A is present in the same row or previous rows of column B

Question

I have this dataframe:

df <- structure(list(A = 1:5, B = c(1L, 5L, 2L, 3L, 3L)), 
                class = "data.frame", row.names = c(NA, -5L))

  A B
1 1 1
2 2 5
3 3 2
4 4 3
5 5 3

I would like to get this result:

  A B Result
1 1 1      B
2 2 5   <NA>
3 3 2   <NA>
4 4 3   <NA>
5 5 3      B

Strategy:

Check whether A == B; if so, assign "B" to a new column Result, otherwise NA.
But also run this check against all PREVIOUS rows of B.

AIM:

I want to learn how to check whether a given value of column A, say in row 5, occurs in the previous rows of column B (e.g. rows 1-4).
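To make the aim concrete, here is a minimal base R sketch of the literal reading of the strategy: for each row i, test whether A[i] occurs in B[1..i]. The helper name `in_b_so_far` is made up for illustration and is not part of the question.

```r
# Example data from the question
df <- data.frame(A = 1:5, B = c(1L, 5L, 2L, 3L, 3L))

# For each row i, test whether A[i] occurs anywhere in B[1:i],
# i.e. in the same row or any previous row of B.
# `in_b_so_far` is a hypothetical name used only in this sketch.
in_b_so_far <- vapply(
  seq_len(nrow(df)),
  function(i) df$A[i] %in% df$B[seq_len(i)],
  logical(1)
)

# "B" where the value was found, NA otherwise
df$Result <- ifelse(in_b_so_far, "B", NA_character_)
df
```

This does one `%in%` lookup per row, so it is easy to read but may be slow on large data frames; the vectorised answers below avoid the explicit per-row scan.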

score 4 · Accepted Answer · answered Aug 26 '21 at 18:32

I hope the following code fits your general case:

transform(
  df,
  Result = replace(rep(NA, length(B)), match(A, B) <= seq_along(A), "B")
)

which gives

  A B Result
1 1 1      B
2 2 5   <NA>
3 3 2   <NA>
4 4 3   <NA>
5 5 3      B
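For readers who want to see why this works, the sketch below splits the one-liner into its intermediate vectors. The names `first_pos`, `row_idx` and `hit` are introduced here for illustration only, and `which()` is used to drop the NA explicitly, which differs slightly from the original call.

```r
df <- data.frame(A = 1:5, B = c(1L, 5L, 2L, 3L, 3L))

# Position of the FIRST occurrence of each A value in B (NA if absent)
first_pos <- match(df$A, df$B)
first_pos
#> [1]  1  3  4 NA  2

# Row numbers 1..n
row_idx <- seq_along(df$A)

# TRUE when the first occurrence in B is at or above the current row
hit <- first_pos <= row_idx
hit
#> [1]  TRUE FALSE FALSE    NA  TRUE

# Start from all-NA and write "B" only at the TRUE positions;
# which() discards the NA produced by the value 4, which never appears in B.
result <- replace(rep(NA_character_, nrow(df)), which(hit), "B")
result
#> [1] "B" NA  "B"[0] 
```

Because `match()` returns only the first position of each value, comparing it with the row number is enough: if the first occurrence is not at or before row i, no occurrence is.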

answered Aug 26 '21 at 18:32

ThomasIsCoding

does this solution assume A is always 1,2,3 ... ? – Arthur Yip Aug 26 '21 at 18:39
@ArthurYip No, it can be any numeric vector. – ThomasIsCoding Aug 26 '21 at 18:40
Oh I see, if a match is in a row <= the seq_along result (the row number), then, it allows "B" to replace NA. – Arthur Yip Aug 26 '21 at 18:42
@ThomasIsCoding This looks very good. I am struggling to implement it to my use case. give me 5 more minutes. please. – TarJae Aug 26 '21 at 18:48
@ThomasIsCoding. Thank you very much. With your help I was able to solve this task – TarJae Aug 26 '21 at 18:59
Can this be golfed to `"B"[NA^(!match(df$A, df$B) <= seq_along(df$A))]`? – Henrik Aug 26 '21 at 20:18
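As a quick check of the point raised in the comments, that A does not have to be 1, 2, 3, ..., the accepted answer's expression can be run on a small made-up data frame. The data `df2` is hypothetical and not from the question.

```r
# Hypothetical data: A is neither sorted nor sequential and contains a repeat
df2 <- data.frame(A = c(10, 7, 7, 3), B = c(7, 10, 4, 3))

# Same expression as in the accepted answer, applied to df2
out <- transform(
  df2,
  Result = replace(rep(NA, length(B)), match(A, B) <= seq_along(A), "B")
)
out
```

Row 1 stays NA because 10 first appears in B at row 2, which is below it; rows 2 to 4 get "B" because their A values first appear in B at rows 1, 1 and 4 respectively.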

score 3 · Answer 2 · answered Aug 26 '21 at 19:26

Here is a dplyr::rowwise approach:

library(dplyr)

df %>%
  rowwise() %>%
  # cur_group_rows() is the current row number; `.` is the whole piped data frame
  mutate(result = ifelse(A %in% .[seq(cur_group_rows()), ]$B, "B", NA))

#> # A tibble: 5 x 3
#> # Rowwise: 
#>       A     B result
#>   <int> <int> <chr> 
#> 1     1     1 B     
#> 2     2     5 <NA>  
#> 3     3     2 <NA>  
#> 4     4     3 <NA>  
#> 5     5     3 B

Created on 2021-08-26 by the reprex package (v0.3.0)

answered Aug 26 '21 at 19:26

TimTeaFan

Thank you TimTeaFan. This is also very good! – TarJae Aug 26 '21 at 19:39
Maybe you should think about a verb for this procedure to implement in your package? – TarJae Aug 26 '21 at 21:17

Arthur Yip · Answer 3 · 2021-08-26T19:30:04.360

2

Just some minor changes to @ThomasIsCoding's answer to make it dplyr-style. It is laid out a bit more explicitly to be easier to read, in my opinion.

library(tidyverse)
df <- structure(list(A = 1:5, B = c(1L, 5L, 2L, 3L, 3L)), 
                class = "data.frame", row.names = c(NA, -5L))
match(df$A, df$B)
#> [1]  1  3  4 NA  2
df %>% mutate(Result = if_else(match(A, B) <= row_number(), 
                               "B", 
                               NA_character_))
#>   A B Result
#> 1 1 1      B
#> 2 2 5   <NA>
#> 3 3 2   <NA>
#> 4 4 3   <NA>
#> 5 5 3      B

Created on 2021-08-26 by the reprex package (v1.0.0)

edited Aug 26 '21 at 19:30

answered Aug 26 '21 at 18:57

Arthur Yip

This looks very fine. I will try it on my use case. A similar approach failed in my case. But I will see now. – TarJae Aug 26 '21 at 19:01
Thank you this is very helpful. – TarJae Aug 26 '21 at 19:05

akrun · Answer 4 · 2021-08-26T19:37:56.553

2

We can use

library(dplyr)
library(purrr)
df %>%
  mutate(Result = map_chr(row_number(), ~ case_when(A[.x] %in% B[seq(.x)] ~ "B")))

-output

  A B Result
1 1 1      B
2 2 5   <NA>
3 3 2   <NA>
4 4 3   <NA>
5 5 3      B

edited Aug 26 '21 at 19:37

answered Aug 26 '21 at 19:23

akrun
