1

I have a table that consists of only columns of type Date. The data is about shopping behavior of customers on a website. The columns correspond to the first time an event is triggered by a customer (NULL if no occurrence of the event). One of the columns is the purchase motion.

Here's a MRE for the starting state of the Database:

structure(list(purchase = structure(c(NA, NA, 10729, NA, 10737
), class = "Date"), action_A = structure(c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), class = "Date"), action_B = structure(c(NA, 
NA, 10713, NA, 10613), class = "Date"), action_C = structure(c(10707, 
10729, 10739, NA, NA), class = "Date")), row.names = c(NA, -5L
), class = c("tbl_df", "tbl", "data.frame"))

I want to update the table so that all the columns of a particular row, all the cells that did not occur within 30 days prior to the purchase are replaced with NULL. However, if the purchase motion is NULL, I'd like to keep the dates of the other events. So after my envisioned transformation, the above table should look as the following:

structure(list(purchase = structure(c(NA, NA, 10729, NA, 10737
), class = "Date"), action_A = structure(c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), class = "Date"), action_B = structure(c(NA, 
NA, 10713, NA, NA), class = "Date"), action_C = structure(c(10707, 
10729, NA, NA, NA), class = "Date")), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

I have yet to be able to achieve this transformation, and would appreciate the help!

Finally, I'd like to transform the above table into a binary format. I've achieved this via the below code segment; however, I'd like to know if I can do this in a simpler way.

df_c <- df_b %>% 
     is.na() %>%  
     magrittr::not() %>%
     data.frame()
df_c <- df_c * 1
alexT
  • 49
  • 5

1 Answers1

0

I assume that by saying "replaced by NULL" you actually mean "replaced by NA".

I also assume that the first structure in your question is df_a.

df_b <- df_a %>% mutate(across(starts_with("action"), 
              ~ if_else(purchase - . > 30, as.Date(NA), .)))
  • mutate(across(cols, func)) applies func to all selected cols.
  • the real trick here is to use if_else and cast NA into Date class. Otherwise, the dates will be converted to numeric vectors.

Result:

# Tibble (class tbl_df) 4 x 5:
 │purchase  │action_A│action_B  │action_C  
1│NA        │NA      │NA        │NA        
2│NA        │NA      │NA        │NA        
3│1999-05-18│NA      │1999-05-02│1999-05-28
4│NA        │NA      │NA        │NA        
5│1999-05-26│NA      │NA        │NA   

One problem which remains as a homework exercise: how do you modify the if_else such that you will keep the action if purchase is NA? (this should be now very simple!) I did not include that on purpose because you omitted it from the question.

January
  • 16,320
  • 6
  • 52
  • 74