0

I am applying the CCR Data Envelopment Analysis model to benchmark between stock data. To do that I am running R code from a DEA paper published here. This document uses lpSolve to solve a linear problem. The documentation for lpSolve is here.

This document comes with step by step instructions on how to implement the model below in R.

Background info:

Data Envelopment Analysis (=DEA) is a data oriented approach to evaluating the performance of a set of entities called Decision Making Units which convert multiple data inputs into multiple data outputs.DEA is used for evaluating and comparing performances of decision making units (DMUs) by determining the relative efficiency of a productive unit by considering its closeness to an efficiency frontier.

We assume there are n DMUs to be evaluated. Each DMU consumes varying amounts of different inputs to produce different outputs. Specifically consumes amount of input and produces amount of output . We further assume that , ≥ 0 and that each DMU has at least one positive input and one positive output value. In the CCR model the objective is to determine the weights , using linear programming so as to maximize the composite efficiency of each decision making unit. This is defined by the ratio:

enter image description here

Which is then turned into a linear problem like so:

enter image description here

Application:

I am applying this efficiency model to stock data CAPM risk factors:

It is a 5 inputs - 1 output case

enter image description here

Here the objective is to find the “best” set of weights (1, 1…5) for each asset that is analyzed. The term “best” is used here to mean that the above ratio is maximized relative to all others when these weights are assigned to the inputs and outputs of all DMUs / Stocks (1...10).

Here is the raw sample z score data which has been standardized such that all values lie between 0 and 1.

> dput(testdfst)
structure(list(Name = structure(1:10, .Label = c("Stock1", "Stock2", 
"Stock3", "Stock4", "Stock5", "Stock6", "Stock7", "Stock8", "Stock9", 
"Stock10"), class = "factor"), Date = structure(c(14917, 14917, 
14917, 14917, 14917, 14917, 14917, 14917, 14917, 14917), class = "Date"), 
    `(Intercept)` = c(0.454991569278089, 1, 0, 0.459437188169979, 
    0.520523252955415, 0.827294243132907, 0.642696631099892, 
    0.166219881886161, 0.086341470900152, 0.882092217743293), 
    rmrf = c(0.373075150411683, 0.0349067218712968, 0.895550280607866, 
    1, 0.180151549474574, 0.28669170468735, 0.0939821798173586, 
    0, 0.269645291515763, 0.0900619760898984), smb = c(0.764987877309785, 
    0.509094491489323, 0.933653313048327, 0.355340700554647, 
    0.654000372286503, 1, 0, 0.221454091364611, 0.660571586102851, 
    0.545086931342479), hml = c(0.100608151187926, 0.155064872867367, 
    1, 0.464298576152336, 0.110803875258027, 0.0720803195598597, 
    0, 0.132407005239869, 0.059742053684015, 0.0661623383303703
    ), rmw = c(0.544512524466665, 0.0761995312858816, 1, 0, 0.507699534880555, 
    0.590607506295898, 0.460148690870041, 0.451871218073951, 
    0.801698199214685, 0.429094840372901), cma = c(0.671162426988512, 
    0.658898571758625, 0, 0.695830176886926, 0.567814542084284, 
    0.942862571603074, 1, 0.37571611336359, 0.72565234813082, 
    0.636762557753099), Returns = c(0.601347600017365, 0.806071701848376, 
    0.187500487065719, 0.602971876359073, 0.470386289298666, 
    0.655773224143057, 0.414258177255333, 0, 0.266112191477882, 
    1)), .Names = c("Name", "Date", "(Intercept)", "rmrf", "smb", 
"hml", "rmw", "cma", "Returns"), row.names = c("Stock1.2010-11-04", 
"Stock2.2010-11-04", "Stock3.2010-11-04", "Stock4.2010-11-04", 
"Stock5.2010-11-04", "Stock6.2010-11-04", "Stock7.2010-11-04", 
"Stock8.2010-11-04", "Stock9.2010-11-04", "Stock10.2010-11-04"
), class = "data.frame")

I now run the above data through the R code taken from the paper mentioned at the beginning. Note that intercept variable from the testdfst data frame is excluded from the calculation.

require(lpSolve)

dea_results <- list()

namesDMU <- testdfst[1]
inputs <- testdfst[c(4,5,6,7,8)]
outputs <- testdfst[9]

N <- dim(testdfst)[1] # number of DMU
s <- dim(inputs)[2] # number of inputs
m <- dim(outputs)[2] # number of outputs

f.rhs <- c(rep(0,N),1) # RHS constraints
f.dir <- c(rep("<=",N),"=") # directions of the constraints
aux <- cbind(-1*inputs,outputs) # matrix of constraint coefficients in (6)
for (i in 1:N) {
  f.obj <- c(0*rep(1,s),outputs[i,]) # objective function coefficients
  f.con <- rbind(aux, c(as.matrix(inputs[i,]), rep(0, m))) # add LHS of bTz=1
  results <-lp("max",f.obj,f.con,f.dir,f.rhs,scale=1,compute.sens=TRUE) # solve LPP
  multipliers <- results$solution # input and output weights
  efficiency <- results$objval # efficiency score
  duals <- results$duals # shadow prices
  if (i==1) {
    weights <- multipliers
    effcrs <- efficiency
    lambdas <- duals [seq(1,N)]
  } else {
    weights <- rbind(weights,multipliers)
    effcrs <- rbind(effcrs , efficiency)
    lambdas <- rbind(lambdas,duals[seq(1,N)])
  }
}

report <- as.data.frame(cbind(effcrs,weights))
colnames(report) <- c('efficiency',names(inputs),
                      names(outputs)) # header
rownames(report) <- namesDMU[,1]

Description of the code:

enter image description here

enter image description here

May be relevant to my problem: note the z >= 0 part? Correct me if I'm wrong but lpSolve must take this by default since this restriction is not directly included in the code? If I could correctly identify this part I could tweak it so that it solves my problem.

THE Problem:

This is the end result:

 > report

         efficiency rmrf      smb        hml       rmw       cma   Returns
Stock1   0.5674100    0 0.000000  0.1769187 0.0000000 1.4634319 0.9435640
Stock2   1.0000000    0 0.000000  0.0000000 0.7713486 1.4284803 1.2405844
Stock3   1.0000000    0 1.071061  0.0000000 0.0000000 7.4588210 5.3333195
Stock4   1.0000000    0 1.930968  0.0000000 0.7427269 0.4510419 1.6584521
Stock5   0.5218197    0 0.000000  0.2080023 0.0000000 1.7205486 1.1093429
Stock6   0.5498426    0 0.000000  9.3299443 0.0000000 0.3473408 0.8384645
Stock7   1.0000000    0 3.260381  0.0000000 0.0000000 1.0000000 2.4139536
Stock8   0.0000000    0 0.000000  0.0000000 2.2130199 0.0000000 0.0000000
Stock9   0.2756548    0 0.000000 11.5264372 0.0000000 0.4291132 1.0358592
Stock10  1.0000000    0 0.000000  0.1875005 0.0000000 1.5509620 1.0000000

The RMRF input has been assigned a weight of 0 in all cases. For this efficiency measure to be relevant in the context that I am applying it in, it is necessary that all 5 inputs are taken into consideration into each calculation.

The issue I am trying to solve is restricting the weights further. I want all 5 inputs to be incorporated into the problem so I would like to restrict the "v" weight like so:

0.2 <= V(n) / V(n+1) <=5

Which is the same as

1/5 <= V(n) / V(n+1) <= 5/1 

This way all weights should be restricted to being maximum 5 times larger than any other weight and also a min of 5 time smaller.

I have been checking each variable from the model and also tried a few different data sets. I also swapped the location of rmrf and smb in the data-frame in an attempt to see if it's the code that's at fault or the data.

        efficiency       smb      rmrf       hml      rmw        cma  Returns
Stock1   1.0000000 0.2540156 0.0000000 1.1812905 1.043781 0.03466956 1.000000
Stock2   1.0000000 1.2150960 0.0000000 0.5443910 2.854127 0.00000000 2.422061
Stock3   1.0000000 0.0000000 0.0000000 1.0000000 0.000000 1.48815164 1.104670
Stock4   1.0000000 1.2348830 0.0000000 0.0000000 4.088438 0.89216307 3.674087
Stock5   0.8151261 0.0000000 0.0000000 0.5921236 1.173539 0.61261329 1.231220
Stock6   0.1491688 0.0000000 0.0000000 2.8715984 1.262761 0.00000000 1.148347
Stock7   1.0000000 2.0599619 0.0000000 0.0000000 0.000000 1.03785902 1.520484
Stock8   1.0000000 0.8346388 0.1206426 0.0000000 1.862840 0.00000000 1.535768
Stock9   0.0000000 0.0000000 0.0000000 0.0000000 1.167498 0.00000000 0.000000
Stock10  0.8065724 0.0000000 0.6973118 2.9529776 1.318778 0.00000000 1.357774

After trying out different data-sets it seems like there are exceptions (like the one above) where RMRF is included. I am starting to believe that the answer to my problem could be very simple. The program finds it optimum to simply not include my RMRF input into the calculation. And the solution to this would be the additional restriction on the weights.

Here is an example of a solution with un-standardized inputs (entirely different distribution of numbers in the data)

           efficiency smb      rmrf hml rmw cma  Returns
Stock1           1   0 1.0775390   0   0   0 346.1943
Stock2           0   0 1.3576882   0   0   0   0.0000
Stock3           0   0 0.7443477   0   0   0   0.0000
Stock4           0   0 0.6879011   0   0   0   0.0000
Stock5           0   0 1.2115507   0   0   0   0.0000
Stock6           0   0 1.1305046   0   0   0   0.0000
Stock7           0   0 1.2472639   0   0   0   0.0000
Stock8           0   0 1.4552312   0   0   0   0.0000
Stock9           0   0 1.1937843   0   0   0   0.0000
Stock10          0   0 1.3569824   0   0   0   0.0000
double-beep
  • 5,031
  • 17
  • 33
  • 41
Alex Bădoi
  • 830
  • 2
  • 9
  • 24

1 Answers1

1

Addressing your second question first, by multiplying by V(n+1) the constraint 0.2 <= V(n) / V(n+1) can be transformed to:

0.2 * V(n+1) <= V(n)

Since this is a linear constraint, it can be added directly to the model. Similarly the other side of the inequality can be modeled as the linear constraint

V(n) <= 5 * V(n+1)

Regarding your first question, since you haven't really described the algorithm other than linking to a paper that links to another paper that describes the variables, you haven't given enough context for me at least to be able to answer. If you expected the rmrf coefficients to take positive values, you could add a constraint requiring the sum of the rmrf coefficients exceed some positive value that you select; this will force at least one of the coefficient to be positive.

josliber
  • 43,891
  • 12
  • 98
  • 133
  • Hi josilber, i edited the question and added more information on what the problem is actually trying to solve. I didn't realize i missed the main idea when i first wrote the question. Thanks for bringing it up. As for the extra restriction i would like to add, splitting it in two definitely helps. – Alex Bădoi Sep 26 '15 at 22:41
  • Hi josil, by trial and error i have found a few cases where my RMRF input is included in the calculation. So i am starting to believe the problem may not be the code, but it's simply optimum for the calculation to not include this variable at all. do you think that could be true? i updated the question again to include this info. thanks! – Alex Bădoi Sep 27 '15 at 09:25