
Here is my data set (linked image: Data in).

I'd like to check whether the gender of each "Potential Original" record matches the gender of its "Potential Duplicate" record. There is no explicit group ID, but each "Potential Duplicate" row together with the one or more "Potential Original" rows that follow it acts as a group.

Here is the output I want (linked image: Data out). For the duplicate rows the result is NA, because a duplicate would only be compared with itself.

Appreciate your help. Thanks.

Mira Shen

1 Answer


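Thanks, Rahul, for looking into this. This is what I tried, and I think it works. The logic is to first create a sequence number within each block made up of one duplicate and its originals, and then pull the lagged Gender value from the corresponding distance back (counter - 1 rows), which is always the duplicate at the top of the block.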

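    library(data.table)

    # Each "Potential Duplicate" row starts a new block; number the rows within a block
    # (the duplicate gets counter 1, the originals that follow it get 2, 3, ...)
    setDT(df)[, counter := seq_len(.N), by = list(cumsum(Status == "Potential Duplicate"))]

    # For every row, look back counter - 1 rows to the duplicate at the top of its block
    # and copy that row's Gender (for the duplicate itself this is its own Gender)
    df[, Gender_LAG := NA_character_]
    for (i in seq_len(nrow(df))) {
      dup_row <- i - df$counter[i] + 1L
      set(df, i = i, j = "Gender_LAG", value = as.character(df$Gender[dup_row]))
    }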
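The same block logic can also be expressed without a loop. The sketch below is only one possible alternative: it uses made-up toy data, assumes the rows are already ordered so that every block starts with its "Potential Duplicate" row, and the column names `grp` and `Gender_Match` are hypothetical. Instead of copying a lagged value, it compares each row's Gender with the first Gender in its block and returns NA on the duplicate rows, which matches the desired output described in the question.

```r
library(data.table)

# Hypothetical toy data: each "Potential Duplicate" row is followed by its originals
df <- data.table(
  Status = c("Potential Duplicate", "Potential Original", "Potential Original",
             "Potential Duplicate", "Potential Original"),
  Gender = c("F", "F", "M", "M", "M")
)

# Block id: increments at every "Potential Duplicate" row
df[, grp := cumsum(Status == "Potential Duplicate")]

# Within each block, Gender[1L] is the duplicate's Gender (assumes the duplicate is the
# first row of its block); duplicate rows get NA because they would match themselves
df[, Gender_Match := ifelse(Status == "Potential Duplicate", NA, Gender == Gender[1L]),
   by = grp]

print(df)
```

If the real data can contain original rows before the first duplicate, those rows fall into block 0 and would be compared with the first row of that block, so they would need separate handling.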

Thanks. Looking forward to seeing other options.

Mira Shen