0

I'm new to R and would appreciate your assistance. I have an optimization problem with a constraint. Although there are several ways to solve an optimization in R, I could not express my problem correctly with the constraint I need to apply.

Suppose I have the following data in three categories:

A<-c(99.1,  96.5,   94.4,   92.7,   91.5,   91.3,   91.4,   90.1,   87.1,   82.6,   76.4)
B<-c(146.4, 140.2,  133.6,  126.5,  118.7,  109.4,  101.2,  101.8,  103.7,  102.5,  98.3)
C<-c(237.5, 213.9,  191,    168.9,  147.4,  124.9,  108.3,  95.7,   84.4,   73.5,   63)
t<-seq(1:11)
DT<-cbind.data.frame(t,A,B,C)

I would like to fit an exponential function y(t) to data points in each of the categories (minimizing the squared errors), so that y(t)_c > y(t)_b > y(t)_a > 0 for selected t [1;15]

smci
  • 32,567
  • 20
  • 113
  • 146
Walle
  • 11
  • 3

2 Answers2

0

I wouldn't try to incorporate the constraint into regression. Just construct three separate regressions:

fit_loga <- lm(y ~ log(A) + t, data=DT) 
fit_logb <- lm(y ~ log(B) + t, data=DT) 
fit_logc <- lm(y ~ log(C) + t, data=DT)

fit_a <- exp(A)
fit_b <- exp(B)
fit_c <- exp(C)

then verify that they satisfy the constraint everywhere on that range (or at least, at every integer datapoint): (fit_c > fit_b) & (fit_b > fit_a). Only if not, then we worry about it. like throw in some other terms in t into the model, perhaps exp(t), I(t^2), poly(t, <order>)...

EDIT: I was unaware of solnp package.

smci
  • 32,567
  • 20
  • 113
  • 146
  • Thanks for your reply smci! I was thinking to make it in two steps like you suggest, but how would you do that in this case when C clearly intersects B and A? I am afraid that i need to keep the same assumption of curve form (here: exponential) for all the curves. Therefore, parameters of one would affect the parameters of another, which requires optimization to be run on them all at once. I completely agree that you should rather look for parameters of linear curves in log space, than trying to fit exponential curves. – Walle Oct 21 '16 at 10:29
  • Oh crap. Well if they intersect then how do we apply that constraint? Without doing something kludgy like clamping using min/max? – smci Oct 21 '16 at 10:34
  • 1
    I thought that the problem could be solved with an optimization function like {solnp()} that can handle inequality constrains. However, it does not converge or results in infeasible solution when applied directly to minimize squared errors with given constrains – Walle Oct 22 '16 at 22:58
  • @Walle: I don't know `solnp`. Suggest you add your code and the infeasible error to the question details and title. Also, that would be more on-topic over at [CrossValidated.SE](https://stats.stackexchange.com) – smci Oct 23 '16 at 00:30
  • 1
    It turned out to be the formatting issues with `solnp()` as the function expected all the parameters to be estimated, as input in one single vector. Therefore I had to transform the matrix I had to a vector and it worked. Not really clear from the error messages that one is getting while running the function, i.e. `non-numeric argument to binary operator` for infeasible constrains (too tight). Not sure if i should post the code or just close this subject. – Walle Oct 24 '16 at 22:28
  • @Walle: glad you solved it. Please do post your code and add your own answer. Yes, R error messages are often cryptic and poorly-documented. If you can suggest to the `solnp` authors how to improve that error message and/or documentation, by all means please do (both here and in a docbug or on their list), that's the only way things till get improved. – smci Oct 25 '16 at 19:40
0

I figured that the error messages that one gets using solnp mostly refer to inadequate constrains. Also, as it is stated in the documentation, it is required to put all the parameters in one vector. After appropriate adjustments to the code, I was able to implement my constrains of y(t)_c > y(t)_b > y(t)_a > 0 directly, without any need for altering the problem. The most convenient way is to use matrix formation for the solver. Using the data above, I've got the following: Results shown here

library(data.table)
library(Rsolnp)

t<-as.vector(10:20)
DT<-cbind.data.frame(A,B,C)
tlogDT<-transpose(log(DT))

# min[log(y)'- Ax-b]
# Arr = [A1 A2 .. An b1 b2 .. bn]

gofn = function(arrin)
{
  arr = cbind(arrin[1:3],arrin[4:6])
  sum(
    (as.matrix(arr[,1])%*%t + arr[,2] - tlogDT)^2
  )
}

nocross=t #defines the times where the curves are not allowed to intersect
ineqfn2=function(arrin)
{
  #constrains:
  # 0<f_a(t)<f_b(t)<f_c(t), for some t,
  arr = cbind(arrin[1:3],arrin[4:6])
  nextarr=as.matrix(rbind(rep(0,2),arr[1:(length(arr[,1])-1),]))
  ineqmat=as.matrix(arr[,1])%*%nocross+arr[,2]-nextarr[,1]%*%nocross-nextarr[,2]
  as.vector(t(ineqmat))
}

#lines should be aligned with the first startvalue
eqfn = function(arrin)
{  
  arr = cbind(arrin[1:3],arrin[4:6])
  arr[,1]*t[1]+arr[,2]-tlogDT[,1]
}
#start values:
An=c(1,1,1)
bn=tlogDT[,1]
vstart=c(An,bn)

r <- solnp(
  vstart,  gofn,

  eqfun = eqfn, eqB= c(0,0,0),

  ineqfun = ineqfn2,
  ineqLB = rep(0,length(DT[1,])*length(nocross)),
  ineqUB = rep(5000,length(DT[1,])*length(nocross))
)

r$pars[1]
line1 = exp(r$pars[4]+r$pars[1]*t)
line2 = exp(r$pars[5]+r$pars[2]*t)
line3 = exp(r$pars[6]+r$pars[3]*t)

plot(t, DT[,3],log = "y")
points(t, DT[,2],col="green")
points(t, DT[,1],col="blue")

lines(t, line1,lwd=2, col = "blue",  xlab = "Time (s)", ylab = "Counts")
lines(t, line2,lwd=2, col = "green",  xlab = "Time (s)", ylab = "Counts")
lines(t, line3,lwd=2, col = "black",  xlab = "Time (s)", ylab = "Counts")
Walle
  • 11
  • 3