
I'm trying to write some code to assess how much an auxiliary data source improves the predictability of a final product. I have the data ready in MATLAB, which is my preferred program for analysis.

I'm trying to fit the following model:

P(t,i) = a(i) + b(i)*Z(t,i) + c(i)*Y(t,i) + d(i)*X(t,i) + e(i)*W(i)

where P, Z, Y, X and W are known and t and i are indices. I want to find the values of a, b, c, d and e that minimise the difference between the observed values of P and the predicted values of P.

t = 1:20, and i runs from 1 to roughly 250000.
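Reading "minimise the difference" as ordinary least squares (an assumption, though it is what mldivide computes), each i is its own regression with T = 20 equations and five unknowns. Written out for one i, with W(i) exactly as in the equation above:

```latex
\underbrace{\begin{bmatrix} P(1,i)\\ \vdots\\ P(T,i) \end{bmatrix}}_{p_i}
=
\underbrace{\begin{bmatrix}
1 & Z(1,i) & Y(1,i) & X(1,i) & W(i)\\
\vdots & \vdots & \vdots & \vdots & \vdots\\
1 & Z(T,i) & Y(T,i) & X(T,i) & W(i)
\end{bmatrix}}_{M_i}
\underbrace{\begin{bmatrix} a(i)\\ b(i)\\ c(i)\\ d(i)\\ e(i) \end{bmatrix}}_{\beta_i}
+ \varepsilon_i,
\qquad
\hat{\beta}_i = \arg\min_{\beta}\, \lVert p_i - M_i \beta \rVert_2^2 .
```

Because every entry of the last column equals W(i), that column is W(i) times the column of ones, so M_i has rank at most 4. The data can then only determine the sum a(i) + e(i)W(i), not a(i) and e(i) separately, and setting e(i) to zero would change nothing. The comparison only becomes informative if W also varies with t, i.e. W(t,i).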

Eventually I will set e(i) to zero to see how much improvement the extra variable gives, and then repeat the test with a random number stream as well.
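A minimal sketch of that comparison, assuming least squares and assuming W is stored as a T-by-N matrix W(t,i) like the other inputs (if W only depends on i, its column is constant within each fit and cannot be separated from the intercept). For each i it fits the model with and without the W column and forms the standard nested-model F statistic. The data are synthetic, and the names rssFull, rssRestr, F and pval are illustrative only.

```matlab
% Sketch: compare the full model with the restricted model e(i) = 0 for
% each i, using residual sums of squares and a nested-model F statistic.
% Assumes all inputs are T-by-N matrices (rows = t, columns = i); the
% synthetic data below is made up purely for illustration.
rng(0);
T = 20; N = 5;
Z = randn(T, N); Y = randn(T, N); X = randn(T, N); W = randn(T, N);
P = 1 + 2*Z - Y + 0.5*X + 0.8*W + 0.1*randn(T, N);

q  = 1;            % number of restrictions (e = 0)
k  = 5;            % parameters in the full model
df = T - k;        % residual degrees of freedom of the full model
[rssFull, rssRestr, F, pval] = deal(zeros(1, N));

for i = 1:N
  Mfull  = [ones(T, 1), Z(:,i), Y(:,i), X(:,i), W(:,i)];
  Mrestr = Mfull(:, 1:4);              % drop the W column, i.e. e(i) = 0
  rFull  = P(:,i) - Mfull  * (Mfull  \ P(:,i));
  rRestr = P(:,i) - Mrestr * (Mrestr \ P(:,i));
  rssFull(i)  = sum(rFull.^2);
  rssRestr(i) = sum(rRestr.^2);
  F(i) = ((rssRestr(i) - rssFull(i)) / q) / (rssFull(i) / df);
  % Upper-tail F(q, df) probability via the regularized incomplete beta
  % function, so no Statistics Toolbox is needed.
  pval(i) = betainc(df / (df + q*F(i)), df/2, q/2);
end
```

For the random-number control, replacing W with randn(T, N) and re-running should give p-values spread roughly uniformly between 0 and 1, since a pure-noise regressor carries no information about P.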

If more detail is needed, I will try to provide it. Many thanks.

I've tried the method suggested below; however, because my Z, Y and X values are matrices, the output matrix sol is three times the width of t plus the one element of e. I've read further and think the method should be either a generalised linear model or a panel regression model, but I'm not sure how to set either one up. I've re-read the MathWorks examples a few times and am still confused.

Ben
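Since panel regression came up: its simplest form, pooled least squares, is a different model from the equation above, because it estimates one set of a to e shared by every i rather than one set per i. If that is what you want, a sketch on synthetic T-by-N data (an assumed layout, with W also varying by t) looks like this:

```matlab
% Sketch of a pooled regression: ONE set of coefficients a..e shared by
% every i (unlike the per-i model in the question). Assumes T-by-N inputs;
% the synthetic data is illustrative only.
rng(1);
T = 20; N = 1000;
Z = randn(T, N); Y = randn(T, N); X = randn(T, N); W = randn(T, N);
P = 1 + 2*Z - Y + 0.5*X + 0.8*W + 0.1*randn(T, N);

% Stack every (t, i) observation into one tall system; (:) flattens
% column-wise, so row r of each column refers to the same (t, i).
M = [ones(T*N, 1), Z(:), Y(:), X(:), W(:)];
beta = M \ P(:);     % 5-by-1 least-squares estimate [a; b; c; d; e]
```

If instead a(i) should vary by i while b to e are shared (a fixed-effects panel model), one standard approach is to subtract each series' mean over t, e.g. Z - mean(Z, 1), from P, Z, Y, X and W before stacking, and drop the column of ones. For N = 250000 the stacked matrix has 5 million rows, about 200 MB in double precision.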

1 Answer


You can calculate your coefficients using mldivide (the backslash operator), which gives the least-squares solution of an overdetermined system. If I understand the question correctly, you want a separate set of coefficients for every i, so you have to iterate over i.

In code, it would look something like this (untested):

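% Assumes P, Z, Y, X and W are all T-by-N matrices (rows = t, columns = i).
% If W only depends on i, its column is constant within each fit and is
% collinear with the intercept, so a(i) and e(i) cannot be separated.
N = size(P, 2);
[a, b, c, d, e] = deal(zeros(1, N));   % preallocate the coefficients
for i = 1:N
  % Design matrix for series i: intercept, Z, Y, X, W (one row per t)
  M = [ones(size(P(:,i))), Z(:,i), Y(:,i), X(:,i), W(:,i)];
  sol = M\P(:,i);   % least-squares solution, 5-by-1

  a(i) = sol(1);
  b(i) = sol(2);
  c(i) = sol(3);
  d(i) = sol(4);
  e(i) = sol(5);
end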

You can find further information in the MATLAB documentation for mldivide.
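Since the loop above is untested, one way to check it is to run the same per-i fit on synthetic data whose coefficients are known and confirm they are recovered. This sketch assumes T-by-N inputs and MATLAB R2016b or later (for implicit expansion when building P); the sizes, noise level and the name est are made up for illustration.

```matlab
% Quick check of the per-i loop on synthetic data with known coefficients
% (the sizes and values here are made up for illustration).
rng(2);
T = 20; N = 100;
Z = randn(T, N); Y = randn(T, N); X = randn(T, N); W = randn(T, N);
trueCoef = randn(5, N);                       % rows: a, b, c, d, e
P = trueCoef(1,:) + trueCoef(2,:).*Z + trueCoef(3,:).*Y ...
  + trueCoef(4,:).*X + trueCoef(5,:).*W + 0.01*randn(T, N);

est = zeros(5, N);                            % column i holds [a; b; c; d; e]
for i = 1:N
  M = [ones(T, 1), Z(:,i), Y(:,i), X(:,i), W(:,i)];
  est(:, i) = M \ P(:, i);
end
maxErr = max(abs(est(:) - trueCoef(:)));      % worst-case recovery error
```

Storing the results in a single 5-by-N matrix rather than five separate vectors keeps sol intact and makes the output exactly the (5, i) shape described in the comment below.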

dasdingonesin
Thanks for the idea. I've tried this, but sol is much larger than needed: this method produces values of a-e for each value of the index t, whereas I only want one solution of a-e across all values of t. This is why I was thinking along the lines of regression. sol is a (3t+1)-by-i matrix, not the 5-by-i matrix I'm after. I hope this makes sense. – Ben Apr 30 '15 at 07:39
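One plausible reading of the (3t+1) width, offered as an assumption rather than a diagnosis, is that the data are stored N-by-T (one row per i, one column per t) and the rows were concatenated side by side, e.g. [Z(i,:), Y(i,:), X(i,:), W(i)], which gives 3*20 + 1 = 61 unknowns. If so, transposing each row into a T-by-1 column yields the intended T-by-5 design matrix and a 5-element solution per i. The sketch below uses synthetic data, and W is assumed to vary with t (a single value per i would be a constant column, collinear with the column of ones).

```matlab
% Hypothetical layout: P, Z, Y, X, W stored N-by-T (row i = one series,
% columns = t). Transposing each row gives a T-by-1 column, so the design
% matrix is T-by-5 and each solution has exactly 5 elements.
rng(3);
N = 50; T = 20;
Z = randn(N, T); Y = randn(N, T); X = randn(N, T); W = randn(N, T);
P = 1 + 2*Z - Y + 0.5*X + 0.8*W + 0.1*randn(N, T);

coef = zeros(N, 5);                  % row i holds [a b c d e]
for i = 1:N
  M = [ones(T, 1), Z(i,:).', Y(i,:).', X(i,:).', W(i,:).'];  % T-by-5
  coef(i, :) = (M \ P(i,:).').';     % 5-by-1 solution stored as a row
end
```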