4

I am trying to use the optim() function in R to solve a simple problem, but I am facing some problems on how to implement it:

 e=tot_obs/(sum(Var1)+sum(Var2)+sum(Var3)+sum(Var4))
 output=(Var1+Var2+Var3+Var4)*e

I know the total of the observations and all the variables.

# Fake datasets   
# Considering that this are the observations c(1000,250,78,0,0,90)

#Known data
total_observations=1418
var1=c(1,0.3,0.5,0.01,0.05,0.6)
var2=c(500,40,40,0,0,100)
var3=c(1,0.1,0.2,0,0.1,0)
var4=c(2,0.04,0.003,0.003,0,0.05)

#Function
e=total_observations/(sum(var1)+sum(var2)+sum(var3)+sum(var4))
output=(var1+var2+var3+var4)*e

I can do a simple correlation between observations and output, with good results (~0.90). This one gives me 0.97.

But now I want to test the effect of having different weights assign to each variable.

e=tot_obs/(sum(w1*Var1)+sum(w2*Var2)+sum(w3*Var3)+sum(w4*Var4))
output=(w1*Var1+w2*Var2+w3*Var3+w4*Var4)*e
where w1+w2+w3+w4=1
and cor(observations,output)~1

I was trying to use optim() function, however I am completely lost. If anyone could help me out or point me some good references on how to do this, I would appreciate.

Gago-Silva
  • 1,873
  • 4
  • 22
  • 46
  • If you could give an example of your `Var1` to `Var4` and of `observations` it will be easier to help you, as their type will probably guide the way to proceed. – plannapus Sep 12 '13 at 09:04
  • Wait so `ouput=(Var1+Var2+Var3+Var4) * (observations/(Var1+Var2+Var3+Var4))` meaning `output=observations`? – plannapus Sep 12 '13 at 09:07
  • exactly @plannapus . although the output will never be equal to the observations, since there will always some errors or another variable missing. – Gago-Silva Sep 12 '13 at 09:08
  • So the `Var1` to `Var4` in the computation of `e` and the ones in the computation of `output` are not the same then? – plannapus Sep 12 '13 at 09:12
  • @plannapus I didn't explained correctly my problem, I have added fake data and corrected the description of the problem. Indeed for the observations I only know the total, not each value, and using the variables (for them I know each value), I want to arrive to values that are close the observations. – Gago-Silva Sep 12 '13 at 09:29

1 Answers1

5

You need to use function solnp in package Rsolnpbecause it allows constraints based on an equality.

The idea is to build a function to minimize and your equality function for the constraint.

Fun <- function(param){
    e <- total_observations/(sum(param[1]*var1)+sum(param[2]*var2)+sum(param[3]*var3)+sum(param[4]*var4))
    output <- (param[1]*var1 + param[2]*var2 + param[3]*var3 + param[4]*var4)/e
    -cor(output, observations) #We want to maximize cor and therefore minimize -cor
    }

eqn <- function(param){sum(param)}

With your example data:

observations <- c(1000,250,78,0,0,90)
total_observations=1418
var1=c(1,0.3,0.5,0.01,0.05,0.6)
var2=c(500,40,40,0,0,100)
var3=c(1,0.1,0.2,0,0.1,0)
var4=c(2,0.04,0.003,0.003,0,0.05)

Your optimization:

solnp(c(.1,.2,.3,.4),fun=Fun, eqfun=eqn, eqB=1)

Iter: 1 fn: -0.9793  Pars:  0.1395748 0.0008403 0.3881053 0.4714796
Iter: 2 fn: -0.9793  Pars:  0.1395531 0.0008406 0.3881409 0.4714653
solnp--> Completed in 2 iterations
$pars
[1] 0.1395530843 0.0008406453 0.3881409239 0.4714653466

$convergence
[1] 0

$values
[1] -0.9729894 -0.9793458 -0.9793458

$lagrange
             [,1]
[1,] 2.521018e-06

$hessian
           [,1]        [,2]        [,3]        [,4]
[1,]  0.4843670   5.0498894 -0.08329380  0.39560040
[2,]  5.0498894 699.5317385 -2.38763807 -0.65610831
[3,] -0.0832938  -2.3876381  0.91837245 -0.09486495
[4,]  0.3956004  -0.6561083 -0.09486495  0.43979850

$ineqx0
NULL

$nfuneval
[1] 709

$outer.iter
[1] 2

$elapsed
Time difference of 0.2371149 secs

If you save that into a variable res, what you're looking for is stored in res$pars.

plannapus
  • 18,529
  • 4
  • 72
  • 94
  • You modified your question as I was posting my answer to show that Var1 to Var4 were in fact vectors of numeric. You will have to modify this code accordingly. – plannapus Sep 12 '13 at 09:32
  • Thanks for accepting this answer but I realized this is wrong as the condition is actually `ui %*% param -ci >= 0`. I'll edit as soon as i'll find a solution. – plannapus Sep 12 '13 at 09:51
  • no worries, i am using your answer as a guide on how to implement it. you can edit once you find a solution. – Gago-Silva Sep 12 '13 at 09:53
  • 1
    @A.R. here you go! In fact you needed to use function `solnp` in package `Rsolnp` because this one allows constraint based on an equality while for `constrOptim` it needed to be an inequality. – plannapus Sep 12 '13 at 10:12
  • Great answer! Can you please take a look at this question? https://stackoverflow.com/questions/68280857/r-x-probs-outside-0-1 – stats_noob Jul 07 '21 at 19:33