
I'm not sure if this is the best place to ask this, but you guys have been helpful with plenty of my CS homework in the past so I figure I'll give it a shot.

I'm looking for an algorithm to blindly combine several dependent variables into an index that produces the best linear fit with an external variable. Basically, it would combine the dependent variables using different mathematical operators, include or not include each one, etc. until an index is developed that best correlates with my external variable.

Has anyone seen/heard of something like this before? Even if you could point me in the right direction or to the right place to ask, I would appreciate it. Thanks.

Denys Séguret
BoldlyBold
  • Sounds a little like a [Piecewise Linear Function](http://en.wikipedia.org/wiki/Piecewise_linear_function), or some other form of [curve fitting](http://en.wikipedia.org/wiki/Curve_fitting). – Robert Harvey Jun 29 '12 at 15:16
  • You're honestly better off asking this on http://math.stackexchange.com; this is a very math-heavy question. – Hans Z Jun 29 '12 at 15:30
  • Sure, I'll give it a go. As for some context, I'm developing an index that correlates the concentration of individual chemical compounds with air temperature. I've been working with manual regression analysis, but thought something that could blindly combine variables would be an interesting place to look as well. – BoldlyBold Jun 29 '12 at 15:38

3 Answers


Sounds like you're trying to do multivariate linear regression, also called multiple regression. The simplest (read: least accurate) method is to individually compute the linear regression line of each component variable against the external variable and then take a weighted average of those lines. Beyond that I'm afraid I'll be of little help.
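To make this concrete, here's a minimal sketch of multiple regression itself using numpy's least-squares solver. The toy data and variable names are my own invention, not from the question; the idea is just that you solve for all coefficients jointly rather than averaging separate one-variable fits:

```python
import numpy as np

# Toy data: two component variables and an external variable that
# really is a linear combination of them plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # columns = component variables
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Multiple regression: prepend an intercept column and solve the
# least-squares problem for all coefficients at once.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # roughly [0, 3, -2] for this toy data
```

Fitting jointly accounts for how the variables overlap, which a weighted average of separate one-variable fits cannot do.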

Hans Z

This appears to be linear regression with multiple explanatory variables. Since the implication is that you want a computational approach, you could do something as simple as fitting a linear model using every possible combination of your explanatory variables (whether to include interaction effects is your choice), choosing a goodness-of-fit measure (R^2 being just one example), and using that to rank the models. The quality of a model is also somewhat subjective in many fields: you might reject a model containing 15 variables if it only moderately improves the fit over a far simpler model containing just 3. If you have not read it already, I don't doubt you will find many useful suggestions in the following text:
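The "every possible combination" idea above can be sketched in a few lines with numpy and itertools. The data, the `r_squared` helper, and the column that is pure noise are all illustrative assumptions of mine, not anything from the question:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
# y depends only on columns 0 and 1; column 2 is pure noise.
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=n)

def r_squared(cols):
    """Fit y ~ intercept + chosen columns by least squares; return R^2."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

# Rank every non-empty subset of explanatory variables by fit.
subsets = [c for k in range(1, 4) for c in combinations(range(3), k)]
ranked = sorted(subsets, key=r_squared, reverse=True)
for cols in ranked[:3]:
    print(cols, round(r_squared(cols), 4))
```

Note that R^2 never decreases when you add a variable, which is exactly why the subjective "is the extra variable worth it" judgment (or an adjusted measure like adjusted R^2) matters when ranking models of different sizes.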

Draper, N.R. and Smith, H. (1998). Applied Regression Analysis. Wiley Series in Probability and Statistics.

You might also try searching for the LASSO method of model selection.
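For a sense of what LASSO does, here is a bare-bones coordinate-descent sketch in plain numpy (the function name `lasso_cd`, the penalty value, and the toy data are all my own illustrative choices; in practice you would use a library implementation). The L1 penalty shrinks coefficients and drives irrelevant ones to zero, so it does variable selection and fitting in one step:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate
    descent. Assumes roughly standardised columns; no intercept,
    for brevity."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with variable j's contribution removed.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r
            # Soft-thresholding: small correlations are zeroed out.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.1, size=100)
w = lasso_cd(X, y, lam=5.0)
print(np.round(w, 2))  # coefficients for the noise columns end up at or near zero
```

Sweeping the penalty `lam` from large to small traces out a path of models from very sparse to fully dense, which is how LASSO is typically used for model selection.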

mathematician1975

The thing you're asking for is essentially the entirety of regression analysis.

This is what linear regression does, and it is a good portion of what "machine learning" does (machine learning is largely a name for more complicated regression and classification algorithms). There are hundreds or thousands of different approaches with various tradeoffs, but the basic ones frequently work quite well.

If you want to learn more, the Coursera course on machine learning is a great place to get a deeper understanding of this.

Eric