
I'm having a hard time building an efficient procedure that adds and multiplies probability density functions to predict the distribution of time that it will take to complete two process steps.

Let "a" represent the probability distribution function of how long it takes to complete process "A". Zero days = 10%, one day = 40%, two days = 50%. Let "b" represent the probability distribution function of how long it takes to complete process "B". Zero days = 10%, one day = 20%, etc.

Process "B" can't be started until process "A" is complete, so "B" is dependent upon "A".

a <- c(.1, .4, .5)
b <- c(.1, .2, .3, .3, .1)

How can I calculate the probability density function of the time to complete "A" and "B"?

This is what I'd expect as the output for the following example:

totallength <- numeric(length(a) + length(b) - 1) # initialize the output vector
totallength[1] <- a[1]*b[1]
totallength[2] <- a[1]*b[2] + a[2]*b[1]
totallength[3] <- a[1]*b[3] + a[2]*b[2] + a[3]*b[1]
totallength[4] <- a[1]*b[4] + a[2]*b[3] + a[3]*b[2]
totallength[5] <- a[1]*b[5] + a[2]*b[4] + a[3]*b[3]
totallength[6] <- a[2]*b[5] + a[3]*b[4]
totallength[7] <- a[3]*b[5]

print(totallength)
[1] 0.01 0.06 0.16 0.25 0.28 0.19 0.05
sum(totallength)
[1] 1

I have an approach in Visual Basic that uses three for loops (one for each of the steps, and one for the output), but I hope I don't have to loop in R.
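Roughly, the looping approach I'd like to avoid looks like this in R (just a two-loop sketch of the idea, not what I'd actually want to use):

totallength <- numeric(length(a) + length(b) - 1)
for (i in seq_along(a)) {
  for (j in seq_along(b)) {
    # each (i, j) pair contributes probability a[i] * b[j] to a total of (i - 1) + (j - 1) days
    totallength[i + j - 1] <- totallength[i + j - 1] + a[i] * b[j]
  }
}
totallength
# [1] 0.01 0.06 0.16 0.25 0.28 0.19 0.05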

Since this seems to be a pretty standard process flow question, part two of my question is whether any libraries exist to model operations flow so I'm not creating this from scratch.


3 Answers


The efficient way to do this sort of operation is to use a convolution:

convolve(a, rev(b), type="open")
# [1] 0.01 0.06 0.16 0.25 0.28 0.19 0.05

This is efficient both because it's far less typing than computing each value individually and because convolve() is implemented efficiently (using the Fast Fourier Transform, or FFT).

You can confirm that each of these values is correct using the formulas you posted:

(expected <- c(a[1]*b[1],
               a[1]*b[2] + a[2]*b[1],
               a[1]*b[3] + a[2]*b[2] + a[3]*b[1],
               a[1]*b[4] + a[2]*b[3] + a[3]*b[2],
               a[1]*b[5] + a[2]*b[4] + a[3]*b[3],
               a[2]*b[5] + a[3]*b[4],
               a[3]*b[5]))
# [1] 0.01 0.06 0.16 0.25 0.28 0.19 0.05
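Another way to see what the convolution is doing is to form all pairwise products a[i]*b[j] and sum them by total number of days; a small base-R sketch of the same idea:

# all pairwise products: probs[i, j] = a[i] * b[j]
probs <- outer(a, b)
# total days for each pair ("A" takes 0-2 days, "B" takes 0-4 days)
days <- outer(0:(length(a) - 1), 0:(length(b) - 1), "+")
tapply(probs, days, sum)
#    0    1    2    3    4    5    6
# 0.01 0.06 0.16 0.25 0.28 0.19 0.05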
josliber

See the package distr. Choosing the term "multiply" is unfortunate, since the situation described is not one where the contributions to the probabilities are independent events occurring together (where multiplying probabilities would be the natural term to use). It's rather a sort of sequential addition of durations, and that is exactly what the distr package provides as its interpretation of what "+" should mean when used as a symbolic operation on two discrete distributions.

library(distr)
A <- DiscreteDistribution(setNames(0:2, c("Zero", "one", "two")), a)
B <- DiscreteDistribution(setNames(0:4, c("Zero2", "one2", "two2", "three2", "four2")), b)
?'operators-methods'  # operations on two DiscreteDistribution objects are convolutions
plot(A+B)

[plot of the distribution of A + B]

After a bit of nosing around I see that the actual numeric values can be found here:

A.then.B <- A + B
environment(A.then.B@d)$dx
# [1] 0.01 0.06 0.16 0.25 0.28 0.19 0.05

Seems like there should be a method for displaying the probabilities, and since I'm not a regular user of this fascinating package, there may well be one. Do read the vignette and the code demos ... which I have not yet done. Further noodling around convinces me that the right place to look is the companion package distrDoc, whose vignette is 100+ pages long. It shouldn't have required any effort to find, either, since that advice is in the messages printed when the package is loaded ... except, in my defense, there were a couple of pages of messages, so it was more tempting to jump into coding and use the help pages.
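If I recall the accessors correctly (treat this as a sketch rather than gospel), support() and d() give a tidier way to pull out the same numbers:

support(A.then.B)                # the possible total lengths: 0 1 2 3 4 5 6
d(A.then.B)(support(A.then.B))   # their probabilities: 0.01 0.06 0.16 0.25 0.28 0.19 0.05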

IRTFM
  • Excellent, thanks! For the sake of documentation I modified your code slightly: B <- DiscreteDistribution(setNames(0:4, c( "Zero2" ,"one2", "two2", "three2", "four2") ), b ) and environment(A.then.B@d)$dx – Jonathan May 12 '15 at 18:26
  • I will read up on this library and determine if it's applicable to my greater need . . . production line optimization and constraint identification. – Jonathan May 12 '15 at 18:27
  • by the way, you're right regarding my poor choice of wording. "Multiply probabilities" is misleading. I couldn't think of what to call it, which made it hard to search for an answer. – Jonathan May 12 '15 at 18:38
  • I wasn't clear from the documentation if this super-package ensemble of symbolic methods was designed for production work. On the one hand it seemed very suitable for a course in probability, but on the other hand the authors are also building packages with financial applications. – IRTFM May 12 '15 at 18:59

I'm not familiar with a dedicated package that does exactly what your example describes, but let me suggest a more robust solution for this problem. You are looking for a way to estimate the distribution of a process composed of n steps (two in your case), where the steps may not be as easy to compute as in your example. The approach I would use is simulation: draw 10,000 observations from the underlying distributions and then calculate the density of the simulated totals. Using your example, we can do the following:

library(data.table)

x <- runif(10000)  # uniform draws used to sample days for process "A"
y <- runif(10000)  # uniform draws used to sample days for process "B"

z <- as.data.table(cbind(x, y))
# map x to days for "A" using its cumulative probabilities (0.1, 0.5, 1)
z[x >= 0   & x < 0.1,  a_days := 0]
z[x >= 0.1 & x < 0.5,  a_days := 1]
z[x >= 0.5 & x <= 1,   a_days := 2]
# map y to days for "B" using its cumulative probabilities (0.1, 0.3, 0.6, 0.9, 1)
z[y >= 0   & y < 0.1,  b_days := 0]
z[y >= 0.1 & y < 0.3,  b_days := 1]
z[y >= 0.3 & y < 0.6,  b_days := 2]
z[y >= 0.6 & y < 0.9,  b_days := 3]
z[y >= 0.9 & y <= 1,   b_days := 4]
z[, total_days := a_days + b_days]
hist(z[, total_days])

This will give a very good proxy of the density, and the approach would also work if your second process were drawn from, say, an exponential distribution, in which case you'd use the rexp function to generate b_days directly.
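For a small discrete case like this one, the same simulation can be written more compactly with sample(); a sketch along the same lines:

set.seed(1)                            # for reproducibility
n <- 10000
a_days <- sample(0:2, n, replace = TRUE, prob = a)
b_days <- sample(0:4, n, replace = TRUE, prob = b)
prop.table(table(a_days + b_days))     # empirical proxy for the convolved distribution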

Yevgeny Tkach
  • thanks! certainly something to think about since this is closer to the gist of what I'm trying to do. – Jonathan May 12 '15 at 18:35