R: epidemic diffusion model using two-dimensional non-linear least squares (nls)

Question

I'm trying to implement an epidemic diffusion model in R. The formula for the diffusion is delta_y=(a+b * y) * (N-y). y describes the current users in period t, N describes the number of potential users, delta_y describes the new users in t and a as well as b are parameters to be estimated. Note that y is the cumulative sum of all previous delta_y. For a single observation (with delta_y and y as vectors) the model simply works with:

model1 <- nls(delta_y ~ (a+b * y) * (N-y))

Now the problem is that I have a set of observations of this type and I want to estimate the same parameters a and b for all of them. I was trying to use the same formula from above but now delta_y and y are two-dimensional arrays instead of vectors. I receive the error in "qr.qty(QR, resid) : 'qr' and 'y' must have the same number of rows"

Details about the data: y as well as delta_y are two-dimensional arrays with 16 columns and 20103 rows. The arrays are created as following:

y=matrix(c(data$nearby_1998,data$nearby_1999, data$nearby_2000, ..., data$nearby_2013),nrow=20103)
invCum <- function (data) {result=matrix(nrow=nrow(data), ncol=ncol(data)); result[,1]=data[,1]; for (i in 2:ncol(data)) {result[,i] <- data[,i]-data[,i-1]}; return(result)}
delta_y <- invCum(y)

invCum is a function that returns the new users in t given the cumulated users in t (practically an inverse cumsum function).

str(y) delivers "int [1:20103, 1:16] 0 0 0 0 0 0 0 0 0 0 ...". str(delta_y) also delivers "int [1:20103, 1:16] 0 0 0 0 0 0 0 0 0 0 ...". Note that not all entries are 0, just many of the first entries.

The columns of the data each have 20103 entries. The model above worked for a single row of the data.

The asteristiks were just not displayed somehow, I edited the text so that they're visible. I constructed the array like y =matrix(c(data$1998,data$1999,data$2000,...,data$2013),nrow=2). Each column of the data has 20103 rows. — Lukas Stäcker, Jun 28 '15 at 15:36
Excuse me, I meant y=matrix(c(data$1998,data$1999,data$2000,...,data$2013),nrow=20103) — Lukas Stäcker, Jun 28 '15 at 15:43
Ok well you're right, my original column names are nearby_1998, nearby_1999 etc. I just wanted to give an example of how I created the arrays... — Lukas Stäcker, Jun 28 '15 at 16:06
Okay I edited a description of how I created the arrays. I also don't think our comments lead somewhere useful at the moment since I have perfectly working two-dimensional arrays but the problem is that the nls function doesn't seem to accept those. — Lukas Stäcker, Jun 28 '15 at 16:19
I agree with @BondedDust that you need to improve your question. However, have a look at `library(nlme); ?nlsList`. — Roland, Jun 28 '15 at 16:28
Please excuse me, I'm new to this board. I originally left away some infos not to overcomplicate things but I did my best now to make it a bit clearer. If you need even more information, please ask. — Lukas Stäcker, Jun 28 '15 at 17:00

score 2 · Accepted Answer · answered Jun 28 '15 at 19:40

After searching the Rhelp archives for that error and finding that a similar situation was solved by Duncan Murdoch by converting the matrices to "long"-form using as.vector() and reviewing the material on nls and nlsList in Pinheiro and Bates, I'm posting the results of some experiments that may be congruent with your data situation. If I understand correctly you have 16 different "runs" of observations of delta_y and y and your hope was to model them with the same non-linear model. What's not clear yet is whether you expect them all: (A) to have the same parameters or instead (B) expect coefficients to vary with merely the same form. Let's take the (A) case first, which is what the as.vector()-solution provided by Duncan Murdoch nine years ago would deliver.

newdf <- data.frame( d_y <- as.vector(delta_y), 
                     y = as.vector(y), 
                     grp=rep(letters[1:16], each=20103) )
N=   _____  # you need to add this; not sure if it's a constant or vector
            # if it varies across groups need to use the rep()-strategy to add to newdf
model1 <- nls( d_y ~ (a+b * y) * (N-y)  , data=newdf, start=list(a=0, b=1))

If on the other hand you want separate coefficients:

 library(nlme)
 model1 <- nlsList(delta_y ~ (a+b * y) * (N-y) | grp, data=newdf, start=c(a=0, b=1) )

Here is some testing: First on a single group (the example in ?nls):

DNase1 <- subset(DNase, Run == 1) 
> fm2DNase1 <- nls(density ~ 1/(1 + exp((xmid - log(conc))/scal)),
+                  data = DNase1,
+                  start = list(xmid = 0, scal = 1) )
> summary(fm2DNase1)
==================
Formula: density ~ 1/(1 + exp((xmid - log(conc))/scal))

Parameters:
     Estimate Std. Error t value Pr(>|t|)
xmid -0.02883    0.30785  -0.094    0.927
scal  0.45640    0.27143   1.681    0.115

Residual standard error: 0.3158 on 14 degrees of freedom

Number of iterations to convergence: 14 
Achieved convergence tolerance: 1.631e-06

Now done on all groups of that dataset without regard to group-ID:

> fm2DNase <- nls(density ~ 1/(1 + exp((xmid - log(conc))/scal)),
+                  data = DNase,
+                  start = list(xmid = 0, scal = 1) )
> summary(fm2DNase)
==========
Formula: density ~ 1/(1 + exp((xmid - log(conc))/scal))

Parameters:
     Estimate Std. Error t value Pr(>|t|)    
xmid -0.14816    0.09780  -1.515    0.132    
scal  0.46736    0.08691   5.377 2.41e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3291 on 174 degrees of freedom

Number of iterations to convergence: 13 
Achieved convergence tolerance: 7.341e-06

And finally on each group separately with the equation form remaining the only constant:

> fm2DNase <- nlsList(density ~ 1/(1 + exp((xmid - log(conc))/scal))|Run,
+                  data = DNase,
+                  start = list(xmid = 0, scal = 1) )
> summary(fm2DNase)
Call:
  Model: density ~ 1/(1 + exp((xmid - log(conc))/scal)) | Run 
   Data: DNase 

Coefficients:
   xmid 
      Estimate Std. Error     t value  Pr(>|t|)
10 -0.23467586  0.3527077 -0.66535499 0.4749505
11 -0.18717815  0.3522418 -0.53139112 0.5746396
9  -0.14742434  0.3459987 -0.42608348 0.6521089
1  -0.02882911  0.3403312 -0.08470898 0.9267180
4  -0.01243939  0.3351487 -0.03711604 0.9691708
8  -0.09549007  0.3408348 -0.28016525 0.7741478
5  -0.09216741  0.3367420 -0.27370331 0.7800695
7  -0.25657193  0.3613815 -0.70997535 0.4750054
6  -0.25052019  0.3564816 -0.70275765 0.5051072
2  -0.11218699  0.3245483 -0.34567120 0.7763199
3  -0.23007674  0.3433663 -0.67006203 0.5933597
   scal 
    Estimate Std. Error  t value  Pr(>|t|)
10 0.4904888  0.3148254 1.557971 0.1076081
11 0.4892928  0.3138277 1.559113 0.1139307
9  0.4723505  0.3075025 1.536087 0.1189793
1  0.4564003  0.3000630 1.521015 0.1148339
4  0.4423467  0.2946883 1.501066 0.1338825
8  0.4582587  0.3018498 1.518168 0.1352101
5  0.4473772  0.2980249 1.501140 0.1407799
7  0.5142468  0.3234251 1.590003 0.1224310
6  0.5007426  0.3185856 1.571768 0.1483103
2  0.4161636  0.2878193 1.445920 0.2457047
3  0.4654567  0.3062277 1.519969 0.2355130

Residual standard error: 0.3491304 on 154 degrees of freedom

Many thanks, seems like the as.vector()-solution above perfectly worked! — Lukas Stäcker, Jun 29 '15 at 07:16

R: epidemic diffusion model using two-dimensional non-linear least squares (nls)

1 Answers1