I want to calculate the average delay between transactions for different splits. I already have a solution, but I need the delay calculated through a different method.
The dataset looks like this:
customer_id transaction_date type sign period
A 01/01/15 A C 30 days
A 05/01/15 A C 30 days
A 10/01/15 B D 30 days
A 25/01/15 B D 30 days
transaction_data = structure(list(customer_id = c("A", "A", "A", "A"),
transaction_date = c("01/01/15",
"05/01/15", "10/01/15", "25/01/15"), type = c("A", "A", "B",
"B"), sign = c("C", "C", "D", "D"), period = c("30 days", "30 days",
"30 days", "30 days")), .Names = c("customer_id", "transaction_date",
"type", "sign", "period"), row.names = c(NA, -4L), class = "data.frame")
Solution old method
What I used to do was to calculate the delay between subsequent transactions first, like this:
# Delay between subseauent transactions
library(data.table)
setDT(transaction_data)[,delay_in_transactions_days:= c(0, diff.Date(transaction_date)), .(customer_id)]
# Convert seconds to days
transaction_data <- mutate(transaction_data, delay_in_days = delay_in_transactions_days/86400)
# Convert to integer
transaction_data$delay_in_days <- as.integer(transaction_data$delay_in_days)
And then through dcast calculate the mean of each split for each transaction delay:
dcast(setDT(transaction_data), customer_id ~ paste0("avg_delay_",period), value.var = "delay_in_days", mean)
Problem new method
The new method I want to use to calculate the delay is through the following equation:
For each client: (latest transaction - first transaction) / (number of transactions - 1)
The problem of course, is that the delay cannot be calculated per period, as then it would be delay of all transactions. Instead, it needs to be calculated as delay per period of a particular type or sign, or combination of the splits.
Any ideas how I might approach the problem?
Expected output
customer_id av.delay_30days av.delay_30_days_TYPE_A av.delay_30_days_TYPE_B
A 8 4 15