How to distribute a series of volumes across a distribution using R

Question

I am trying to forecast the volume of events (library book returns). I have a data frame of the volumes of returns expected on a particular day (derived from a table) and a density function of previous return behaviour. My plan was to use the convolve function, but I am getting stuck. Any ideas on the best way forward?

Calculating the borrow time length

data$borrow_length <- data$due_date - data$return_date

Producing the PDF

# density() needs a plain numeric vector, not a difftime
borrow_pdf <- density(as.numeric(data$borrow_length))
plot(borrow_pdf)

Producing the volumes

return_volume <- as.data.frame(table(data$due_date))

output <- convolve(borrow_pdf, return_volume$Freq, type = "open")

My hope was to finish with a table of forecasted return dates that takes account of all returns, both early and late.

I might add, if you know of a better way of approaching it, I am open to suggestions. — rasc201, May 17 '19 at 14:16
Welcome to S.O. Please look up guidelines for asking minimum, complete, reproducible questions and edit your question accordingly. — shea, May 17 '19 at 14:20
Thanks @Shea, will do my best going forward. I appreciate my question was vague, but my attempt with the convolve function was falling short. — rasc201, May 21 '19 at 11:13

score 0 · Accepted Answer · answered May 17 '19 at 16:15

Agreed with @shea: a reproducible example would help.

Here is one, from what I understood:

set.seed(1)
N = 200

# Historical data with known return date
old_data = data.frame( due_date = as.Date("2019-04-01") + floor(runif(N, 0, 30)) )
old_data$return_date = old_data$due_date + round(rnorm(N, 0, 5))

# Currently borrowed books
current_data = data.frame( due_date = as.Date("2019-05-10") + floor(runif(N, 0, 30)) )

If I understood correctly, you want an estimate of the distribution of return_date (not yet known) for current_data. Here is a solution with the convolution computed manually: it is not efficient, but it is easy to follow.

# For clarity, I renamed your borrow_length to borrow_delay
# (return_date - due_date, so a positive value means a late return)
old_data$borrow_delay = old_data$return_date - old_data$due_date

# Compute its distribution (no smoothing)
distr_delay = as.data.frame(prop.table(table(delay = old_data$borrow_delay)), responseName="p_delay")
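# delay is a factor here: convert via character to recover the actual values
# (as.integer() on a factor would return the level codes 1, 2, 3, ...)
distr_delay$delay = as.integer(as.character(distr_delay$delay))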

# Counts by due date
tab_volume = as.data.frame(table(due_date = current_data$due_date))
tab_volume$due_date = as.Date(as.character(tab_volume$due_date))

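# Explicit convolution: tab_volume and distr_delay share no column names,
# so merge() returns every (due_date, delay) combination (a cross join)
distr_return = merge(tab_volume, distr_delay)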
distr_return$return_date = with(distr_return, due_date + delay)
distr_return$expected_n_returns = with(distr_return, Freq*p_delay)
distr_return = with(distr_return, tapply(expected_n_returns, return_date, sum))
# Reformat
distr_return = data.frame(
  return_date = as.Date(names(distr_return)),
  expected_n_returns = c(distr_return)
)

# Sanity check: sum of expectations is 200 (the number of books borrowed)
sum(distr_return$expected_n_returns)

with(distr_return, plot(return_date, expected_n_returns))
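
Since the question asked about convolve() specifically, here is a minimal sketch of the same calculation using stats::convolve(). It is self-contained and simulates data the same way as above; the names delay_grid, due_grid, volume, p_delay and forecast are mine, chosen for illustration. Two assumptions to note: both inputs must be dense vectors on a regular daily grid (with zeros for days or delays that never occur), and convolve() reverses its second argument, so the usual convolution is convolve(x, rev(y), type = "open").

```r
set.seed(1)
N <- 200

# Simulated data: historical due/return dates, and due dates of current loans
old_due    <- as.Date("2019-04-01") + floor(runif(N, 0, 30))
old_return <- old_due + round(rnorm(N, 0, 5))
cur_due    <- as.Date("2019-05-10") + floor(runif(N, 0, 30))

# Delay in whole days (positive = returned late)
delay <- as.integer(old_return - old_due)

# Dense probability vector over every delay from min to max (zeros for gaps)
delay_grid <- min(delay):max(delay)
p_delay <- as.numeric(table(factor(delay, levels = delay_grid))) / length(delay)

# Dense volume vector over every due date from first to last (zeros for gaps)
due_grid <- seq(min(cur_due), max(cur_due), by = "day")
volume <- as.numeric(table(factor(as.character(cur_due),
                                  levels = as.character(due_grid))))

# convolve() reverses its second argument, hence rev()
expected <- convolve(volume, rev(p_delay), type = "open")

# Element k of the result is the return date: first due date + smallest delay + (k - 1)
forecast <- data.frame(
  return_date = min(cur_due) + min(delay) + seq_along(expected) - 1,
  expected_n_returns = expected
)

# Sanity check: the expectations should sum to N, up to floating-point error
sum(forecast$expected_n_returns)
```

convolve() works through the FFT, so the result can contain tiny rounding noise (values such as 1e-17 instead of 0); zapsmall() or round() tidies that up if it matters for display.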
