
I'm currently migrating from MATLAB to R, and trying to find out whether what I want to do is possible.

I want to estimate a non-linear model in R where the observations are US states. The wrinkle is that one of the independent variables is a state-level index over counties, calculated using a parameter to be estimated, i.e. the model looks like this:

log(Y_s) = log(phi) + log(f(theta, X_cs)) + u_s

where Y_s is a state-level variable, X_cs is a vector of county-level observations of a variable within state s, and f() returns the scalar value of the index for that state.

So far I've tried using R's nls function while transforming the data as it's passed to the function. Abstracting from the details of the index, a simpler version of the code looks like this:

library(dplyr)

state <- c("AK", "AK", "CA", "CA", "MA", "MA", "NY", "NY")
Y <- c(3, 3, 5, 5, 6, 6, 4, 4)
X <- c(4, 5, 2, 3, 3, 5, 3, 7)
Sample <- data.frame(state, Y, X)

f <- function(data, theta) {
  # Collapse the county-level data to one row per state; note that the
  # index depends on theta, the parameter nls is supposed to estimate
  data %>%
    group_by(state) %>%
    summarise(index = mean(X^theta),
              Y = mean(Y))
}

model <- nls(Y ~ log(phi) + log(index),
             data = f(Sample, theta),
             start = list(phi = exp(3), theta = 1.052))

This fails with a "singular gradient" error. My guess is that it's because theta only appears inside the data argument: by the time nls sees the formula, the index is already a fixed column, so the fit doesn't depend on theta at all and the gradient matrix is singular.

Is there a way to do this using nls? I know I could define the criterion to be minimised manually, i.e. the sum of squared residuals log(Y_s) - log(phi) - log(f(theta, X_cs)), and hand it to a general-purpose minimisation routine to estimate the parameter values. But I want to use the post-estimation features of nls, such as confidence intervals for the parameter estimates. Any help much appreciated.
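For reference, a minimal sketch of that manual route using base R's optim(), minimising the sum of squared residuals above. Working with log(phi) instead of phi is just a convenience to keep phi positive during the search; the names sse and log_phi are illustrative:

sse <- function(par, data) {
  log_phi <- par[["log_phi"]]  # optimise over log(phi) to keep phi > 0
  theta   <- par[["theta"]]
  y_s     <- tapply(data$Y, data$state, mean)        # state-level Y
  index_s <- tapply(data$X^theta, data$state, mean)  # state-level index
  sum((log(y_s) - log_phi - log(index_s))^2)
}
fit <- optim(c(log_phi = 3, theta = 1.052), sse, data = Sample)
exp(fit$par[["log_phi"]])  # phi
fit$par[["theta"]]         # theta

But this is exactly the route I'd like to avoid, since optim gives point estimates only.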

s.willis

1 Answer


Sorry, I refuse to install that ginormous meta package. Thus, I use base R:

state <- c("AK", "AK", "CA", "CA", "MA", "MA", "NY", "NY")
Y <- c(3, 3, 5, 5, 6, 6, 4, 4)
X <- c(4, 5, 2, 3, 3, 5, 3, 7)
Sample <- data.frame(state, Y, X)

f <- function(X, state, theta) {
  # for every observation, the mean of X^theta over its state
  ave(X, state, FUN = function(x) mean(x^theta))
}

# theta now appears inside the formula, so nls can estimate it; the
# weights give each state the same total weight regardless of how many
# county rows it contributes
model <- nls(Y ~ log(phi) + log(f(X, state, theta)),
             data = Sample, weights = 1/ave(X, state, FUN = length),
             start = list(phi = exp(3), theta = 1.052))
summary(model)
#Formula: Y ~ log(phi) + log(f(X, state, theta))
#
#Parameters:
#      Estimate Std. Error t value Pr(>|t|)
#phi   2336.867   4521.510   0.517    0.624
#theta   -2.647      1.632  -1.622    0.156
#
#Residual standard error: 0.7791 on 6 degrees of freedom
#
#Number of iterations to convergence: 11 
#Achieved convergence tolerance: 3.722e-06
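
The usual nls extractors then work on this fit, for example (profiling loads the MASS namespace behind the scenes and can fail on very small samples):

confint(model)  # profile-likelihood confidence intervals for phi and theta
vcov(model)     # asymptotic variance-covariance matrix of the estimates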
Roland
  • Thanks @Roland. I just edited the question to only include the package I use in this stripped-down example, and deleted the cross-posted version of the question. To make sure I understand how your solution works -- in calculating standard errors and things like that, nls thinks that `length(Y)` is the number of observations, rather than `length(unique(Y))`, am I right? – s.willis Mar 28 '17 at 14:37
  • I've included the weights in case you have different numbers of data points per state. I'd try calculating bootstrap standard errors (assuming you have a sufficient amount of data) if standard errors and p-values are important for this problem; a sketch follows below the comments. – Roland Mar 28 '17 at 14:50
  • Ok thanks. Is there a specific reason you would use the bootstrap, other than just the general nonlinearity? I have observations on about 3000 counties grouped in 47 states. – s.willis Mar 29 '17 at 09:20
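
Following up on the bootstrap suggestion: a minimal sketch of a cluster bootstrap for theta, resampling whole states with replacement and refitting the model from the answer. The 200 replicates and the relabelling scheme are arbitrary illustrative choices; on small samples many refits may fail to converge, which the try() guards against.

set.seed(1)
states <- unique(Sample$state)
boot_theta <- replicate(200, {
  draw <- sample(states, replace = TRUE)
  # rebuild the sample, relabelling repeated draws of a state so they
  # count as distinct clusters
  resampled <- do.call(rbind, lapply(seq_along(draw), function(i) {
    d <- Sample[Sample$state == draw[i], ]
    d$state <- paste0(draw[i], "_", i)
    d
  }))
  fit <- try(nls(Y ~ log(phi) + log(f(X, state, theta)),
                 data = resampled,
                 weights = 1/ave(X, state, FUN = length),
                 start = list(phi = exp(3), theta = 1.052)),
             silent = TRUE)
  if (inherits(fit, "try-error")) NA_real_ else coef(fit)[["theta"]]
})
quantile(boot_theta, c(0.025, 0.975), na.rm = TRUE)  # percentile 95% CI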