0

I'm trying to build a model to forecast direct mail marketing campaign responses. In the code below I was able to use responses from a previous campaign to create a smooth curve (i.e. continuous probability). Now, I need to find the total area under this curve for each day so I know what percent complete the campaign is on a given day. Theoretically using the integrate function then finding the difference between areas using the diff function should work. For example, I will be able to find the area under the curve after day 2 and subtract the area under the curve after day 1. Knowing the addional area under the curve for each day will help me tell the % completion with each day. The problem is that I can't figure out a way to integrate this 64 day curve so that the total density sums to 1.

#vector of direct mail marketing responses over 63 days  
responses <- c(24.16093706,
41.59607507,
68.20083052,
85.19109064,
100.0704403,
58.6600221,
86.08475816,
88.97439581,
65.58341418,
49.25588053,
53.63602085,
47.03620672,
29.71552264,
32.85862747,
31.29118096,
23.67961069,
19.81261675,
18.69300933,
17.25738435,
12.01161679,
12.36734071,
14.32360673,
11.02390849,
9.108021409,
9.647965622,
8.815576548,
5.67225654,
5.739220185,
6.233999138,
5.527376627,
5.024065761,
5.565266355,
4.626749364,
3.480761716,
4.621902301,
4.518554271,
4.075985188,
3.204946787,
3.174020873,
2.966915873,
2.129178828,
2.673009031,
2.410429043,
2.331287075,
2.509300578,
2.13820695,
2.53433787,
1.603934405,
1.555813592,
1.834605068,
1.842905685,
1.454045577,
2.08684322,
1.318276487,
0.807666643,
1.333167088,
1.004526525,
1.180110123,
1.078079735,
1.151394678,
1.426747942,
0.699119833,
0.583347236)


set.seed(2) 
## install.packages("MASS") 
library("MASS")

shape_and_scale <- fitdistr(responses,'weibull')

shape_and_scale

#now use the curve() function, dweibull, and the shape and scale parameters to create a smooth curve 
curve results <- curve(dweibull(x,0.70730466,13.79467490),from=0, to=63)

Now I need a way to integrate this curve to find the area under the curve after day 1, day 2, day 3, ...etc. After that I should be able to use diff to find the difference between day 2 and day 1, etc. which I can use to find the % of campaign completion after each day. In my code above I truncated the curve from 0 to 63. Is there a way to use this? For example, if I just do: diff(pweibull(0:63,0.70730466,13.79467490)) I do not leverage the fact that I already truncated the curve from 0 to 63 so the density doesn't add to 1.

For example:

sum(diff(pweibull(0:63,0.70730466,13.79467490))) equals .94, which is the same thing as: integrate(dweibull, 0, 63, shape = 0.70730466,scale = 13.79467490)

...but these don't leverage the fact that in the first block of code I've already truncated the curve to 63 days. I just like to integrate that so that the sum of the area under the curve is 1?

Thanks

Ryan Chase
  • 2,384
  • 4
  • 24
  • 33
  • Is `integrate(dweibull,0,63,shape=0.70730466,scale=13.79467490)` what you are looking for? – nicola Apr 10 '15 at 18:23
  • That seems to be what I was looking for. However, in this case why does the total does not add to 1. Can you explain what I'm missing here? – Ryan Chase Apr 13 '15 at 15:39
  • Because you are integrating between 0 and 63. To have 1 you should integrate between 0 and infinity: `integrate(dweibull,0,Inf,shape=0.70730466,scale=13.79467490)`. – nicola Apr 13 '15 at 15:49
  • I'm confused. Why does dweibull extend beyond 63 days when my data (contained in the variable "responses") only extends to 63 days? How would you recommend that I truncate the curve so that I can get the integral of the 63 days to add to 1? – Ryan Chase Apr 13 '15 at 16:04
  • At this point, you seem to have a statistical question, not a programming question. Ben Bolker's answer correctly tells you how to integrate the Weibull density. Then maybe turn to stats.stackexchange to work our what you should do instead. – Gregor Thomas Apr 13 '15 at 17:20
  • 1
    Also, work on your terminology: truncating is chopping off. The only place where you truncate is in the integral.... you take a [Weibull distribution](http://en.wikipedia.org/wiki/Weibull_distribution), which will *always* have support from from 0 to Inf, so you will have to go all the way from `0` to `Inf` to be exactly 1 under the curve. If you stop at 63 days, you've truncated and cut off a tail. If you want to *scale* this part of the density to be 1, then just divide by 0.63. But you should be aware what you're doing and why you're doing it, which isn't a programming question. – Gregor Thomas Apr 13 '15 at 17:24
  • 1
    @Gregor, I agree *except* you shouldn't divide by 0.63, you should divide by `pweibull(63,...)` – Ben Bolker Apr 13 '15 at 17:29
  • @BenBolker Oops, yes. – Gregor Thomas Apr 13 '15 at 17:36

1 Answers1

3

I think you're looking for

d0 <- diff(pweibull(0:63,0.70730466,13.79467490))

If you just want to normalize this to sum to 1, then divide it either by pweibull(63,...) or by sum(d0) (which are the same).

If you want a final category that includes everything beyond 63, use

d1 <- diff(pweibull(c(0:63,Inf),0.70730466,13.79467490))
sum(d1)   ## 1

The latter is equivalent to c(d0,1-sum(d0)).

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • That seems to be what I was looking for. However, in this case when calculating the sum the total does not add to 1: `sum(diff(pweibull(0:63,0.70730466,13.79467490)))` Can you explain what I'm missing here? – Ryan Chase Apr 13 '15 at 15:24
  • ...nicola answered this question. But I'm still confused- how would you recommend that I truncate the curve so that I can get the integral of the 63 days to add to 1? ...even if I add from=0 and to=63, nothing changes: `sum(diff(pweibull(0:63,0.70730466,13.79467490), from=0, to=63))` – Ryan Chase Apr 13 '15 at 16:33
  • Is there a way to integrate over the curve that I already truncated at 63 days using: `curve results <- curve(dweibull(x,0.70730466,13.79467490),from=0, to=63)`. Basically, I don't want any density to occur outside of this 63 day range. That way I can take the total number of responses I think I'll get from a campaign and find out what % will come each day as a total of the 63 day window. – Ryan Chase Apr 13 '15 at 17:10