0

I want to test which one of two non-nested models that I fit using stats4::mle in R provides a better fit using the Vuong and the Clarke test.

Here is a small portion of the data I am fitting, two different models (function "w" differs) and the according mle()'s:

library(stats4)
### Data 
z1 <- c(0.1111111, 0.1037037, 0.1222222, 0.1111111, 0.1074074, 0.1666667, 0.1333333, 0.2000000, 0.1333333, 0.1074074,
        0.1037037, 0.1111111, 0.1333333, 0.2000000, 0.1222222, 0.1111111, 0.1666667, 0.1333333, 0.1111111, 0.1333333,
        0.1111111, 0.1666667, 0.1074074, 0.1333333, 0.1222222, 0.2000000, 0.1037037)

z2 <- c(0.08888889, 0.06666667, 0.07777778, 0.00000000, 0.03333333, 0.09259259, 0.09629630, 0.08888889, 0.06666667,
        0.03333333, 0.06666667, 0.08888889, 0.06666667, 0.08888889, 0.07777778, 0.00000000, 0.09259259, 0.09629630,
        0.00000000, 0.09629630, 0.08888889, 0.09259259, 0.03333333, 0.06666667, 0.07777778, 0.08888889, 0.06666667)

p <-  c(0.5, 0.9, 0.5, 0.9, 0.9, 0.1, 0.1, 0.1, 0.5, 0.9, 0.9, 0.5, 0.5, 0.1, 0.5, 0.9, 0.1, 0.1, 0.9, 0.1, 0.5, 0.1, 0.9, 0.5, 0.5, 0.1, 0.9)

zce <- c(0.11055556, 0.10277778, 0.11000000, 0.10833333, 0.10185185, 0.11666667, 0.13240741, 0.14166667, 0.13166667,
         0.07222222, 0.08796296, 0.09944444, 0.09500000,0.10833333, 0.09444444, 0.05277778, 0.10925926, 0.11759259,
         0.05833333, 0.10277778, 0.09277778, 0.10925926, 0.06111111, 0.08833333, 0.09222222, 0.12500000, 0.09166667)


### Functions:
u <- function(x,n) 
{
  ifelse(n!=1,util <- x^(1-n)/(1-n), util <- log(x))
  return(util)
}
u.inv <- function(x,n)
{
  ifelse(n !=1, inv.util <- ((1-n)*(x))^(1/(1-n)), inv.util <- exp(x))
  return(inv.util)
}

v = function(x,n){return(1/(u(maxz,n)-u(minz,n))*(u(x,n)-u(minz,n)))}
v.inv = function(x,n){return(u.inv(x*(u(maxz,n)-u(minz,n))+u(minz,n),n))}

maxz = 135
minz = 0


### model 1
w <- function(p,a,b){return(exp(-b*(-log(p))^(1-a)))}

### Loglikelihood 1
LL1 <- function(n,a,b,s)
{
  V = (v(z1,n)-v(z2,n))*w(p,a,b) + v(z2,n) 
  res = zce - v.inv(V,n)
  ll = dnorm(res, 0, s,log=T)
  ll.fin1 <<- ll ### record ll per datapoint given optimal parameters
  return(-sum(ll))
}

### mle 1
fit.model1 <- mle(LL1,
           start = list(n = 0.1,a=0.1,b=0.1,s=0.1),
           method = "L-BFGS-B",
           lower = list(n=-Inf,a = -Inf, b = 0.0001, s=0.0001),
           upper = list(n=0.9999,a = 0.9999, b = Inf, s=Inf),
           control = list(maxit = 500, ndeps = rep(0.000001,4)),
           nobs=length(z1))


######################

### model 2
w <- function(p,a,b){return((b*p^a)/(b*p^a+(1-p)^a))}

### Loglikelihood 2
LL2 <- function(n,a,b,s)
{
  V = (v(z1,n)-v(z2,n))*w(p,a,b) + v(z2,n) 
  res = zce - v.inv(V,n)
  ll = dnorm(res, 0, s,log=T)
  ll.fin2 <<- ll ### record ll per datapoint given optimal parameters
  return(-sum(ll))
}

### mle 2
fit.model2 <- mle(LL2,
                  start = list(n = 0.1,a=0.1,b=0.1,s=0.1),
                  method = "L-BFGS-B",
                  lower = list(n=-Inf,a = 0.0001, b = 0.0001, s=0.0001),
                  upper = list(n=0.9999,a = Inf, b = Inf, s=Inf),
                  control = list(maxit = 500, ndeps = rep(0.000001,4)),
                  nobs=length(z1))
shofla
  • 43
  • 6
  • Questions just asking for package recommendations can be considered off-topic. Describe what exactly you need to do and if a pacakge is required, one will be suggested, but don't assume a simple calculation can't be done in R to give you what you need. For this question to stand on it's own you should define the "Vuong/Clarke test" here (a careful description or at least a reference). What exactly do you need to calculate and where are you getting stuck? – MrFlick Mar 02 '20 at 21:55
  • Also, editing in comments to your question asking to justify donwvotes isn't exactly a good practice. Downvotes are anonymous and there's no way to be sure it will get back to the relevant person. That type of discussion might be a better fit for the meta-stackoverflow site: https://meta.stackoverflow.com/ (though i'm not 100% sure about that). A comment is still probably better than post-scripts edited into the question. – MrFlick Mar 02 '20 at 22:02
  • @MrFlick thanks a lot for your help, I totally see what you mean. I hope the edited question is better. – shofla Mar 03 '20 at 06:26

1 Answers1

1

With help from the author of the "games" package, which includes the Vuong and Clarke tests for ultimatum games, I was able to code the tests for my MLE optimisation.

Importantly, one should record the loglikelihoods of each data-point at the optimum parameter set, which can be done by saving "ll" in a global variable. The values this vector take on the final iteration of the optimisation process is exactly the required vector which can easily be checked by summing it and comparing this with logLike(fit).

With the fits and individual densities the Vuong and Clarke test can then be defined.

### Define tests
vuong.test <- function(fit1,fit2,ldens1,ldens2,alternative) 
{
  if(nobs(fit1) != nobs(fit2)){return("Error: number of observations used to fit the models unequal")}
  n.obs <- nobs(fit1)
  n.par1 <- nrow(coef(summary(fit1)))
  n.par2 <- nrow(coef(summary(fit2)))
  
  LR <- sum(ldens1) - sum(ldens2) - (n.par1-n.par2)/2*log(n.obs)
  LDdif <- sqrt(sum((ldens1 - ldens2)^2)/n.obs)
  z = LR/(sqrt(n.obs)*LDdif)
  
  alternative <- substr(alternative,1,1) ## alternative specifies whether you test whether fit1 is lesser, greater or equal to fit2.
  if(alternative == "l"){pval = 1-pnorm(z,mean = 0,sd = 1)}
  if(alternative == "g"){pval = pnorm(z,mean = 0,sd = 1)}
  if(alternative == "t"){pval = 2*pnorm(-abs(z),mean = 0,sd = 1)}
  
  return(pval)
}

clarke.test <- function(fit1,fit2,ldens1,ldens2,alternative)
{
  if(nobs(fit1) != nobs(fit2)){return("Error: number of observations used to fit the models unequal")}
  n.obs <- nobs(fit1)
  n.par1 <- nrow(coef(summary(fit1)))
  n.par2 <- nrow(coef(summary(fit2)))
  
  correction <- (n.par1-n.par2)*(log(n.obs)/(2*n.obs))
  z = sum(ldens1 - ldens2 > correction)
  
  alternative <- substr(alternative,1,1) ## alternative specifies whether you test whether fit1 is lesser, greater or equal to fit2.
  if(alternative == "l"){pval = 1-pbinom(q = z,size=n.obs,prob = 0.5)}
  if(alternative == "g"){pval = pbinom(q = z,size=n.obs,prob = 0.5)}
  if(alternative == "t"){zz <- min(z,n.obs-z); pval = 2*pbinom(q = zz,size=n.obs,prob = 0.5)}
  
  return(pval)
}

where:

  • fit1 = the saved fit of the first model using stats4::mle
  • fit2 = the saved fit of the second model using stats4::mle
  • ldens1 = the vector of loglikelihoods at the optimal parameter set of the first model.
  • ldens2 = the vector of loglikelihoods at the optimal parameter set of the second model.
  • alternative = test with "t"/"l"/"g" whether it can be rejected that model 1 is equidistant/further/closer to the true data-generating process than model 2.

Given a third model:

##### Model 3
w3 <- function(p,a,b){return(p)}
w <- w3

### LL3
LL3 <- function(n,s)
{
  V = (v(z1,n)-v(z2,n))*w(p,a,b) + v(z2,n) 
  res = zce - v.inv(V,n)
  ll = dnorm(res, 0, s,log=T)
  ll.fin3 <<- ll
  -sum(ll)
}

### mle 3
fit.model3 <- mle(LL3,
                  start = list(n = 0.1,s=0.1),
                  method = "L-BFGS-B",
                  lower = list(n=-Inf,s=0.0001),
                  upper = list(n=0.9999, s=Inf),
                  control = list(maxit = 500, ndeps = rep(0.000001,2)),
                  nobs=length(z1))

Would reject equidistance based on Clarke:

> clarke.test(fit1 = fit.model1, ldens1 = ll.fin1,
+             fit2 = fit.model3, ldens2 = ll.fin3,
+             alternative = "t")
[1] 0.01915729

Games package: https://github.com/brentonk/games/blob/master/games/R/nonnest.r

shofla
  • 43
  • 6