If you are able to use the pandas library, this problem is very easy to express:
import pandas
import numpy as np
x = pandas.Series(np.arange(1,5))
y = pandas.Series(np.arange(5,9))
exponents = exponents = np.linspace(1, 9, 100)
r2s = {n:pandas.ols(x=x, y=y**n).r2 for n in exponents}
max(r2s.iteritems(), key=lambda x: x[1])
#>>> (1.0, 1.0)
Breaking this down:
- the
pandas.Series
object is an indexed column of data. It's like a numpy
array, but with extra features. In this case, we only care about it because it is something we can pass to pandas.ols
.
pandas.ols
is a basic implementation of least-squares regression. You can do this directly in numpy with numpy.linalg.lstsq
, but it won't directly report the R-squared values for you. To do it with pure numpy, you'll need to get the sum of squared residuals from numpy's lstsq
and then perform the formulaic calculations for R-squared manually. You could write this as a function for yourself (probably a good exercise).
- The stuff inside the
{..}
is a dict
comprehension. It will iterate over the desired exponents, perform the ols
function for each, and report the .r2
attribute (where the R-squared statistic is stored) indexed by whatever exponent number was used to get it.
- The final step is to call
max
on a sequence of the key-value pairs in r2s
, and key
tells max that it is the second element (the R-squared) by which elements are compared.
An example function to do it with just np.linalg.lstsq
is here (good explanation for calculating R2 in numpy):
def r2(x, y):
x_with_intercept = np.vstack([x, np.ones(len(x))]).T
coeffs, resid = np.linalg.lstsq(x_with_intercept, y)[:2]
return 1 - resid / (y.size * y.var())[0]
Then in pure numpy the above approach:
import numpy as np
x = np.arange(1,5)
y = np.arange(5,9)
exponents = np.linspace(1, 9, 100)
r2s = {n:r2(x=x, y=y**n) for n in exponents}
max(r2s.iteritems(), key=lambda x: x[1])
#>>> (1.0, 1.0)
As a final note, there is a fancier way to specify getting the 1-positional item from something. You use the built-in library operator
and the callable itemgetter
:
max(..., key=operator.itemgetter(1))
The expression itemgetter(1)
results in an object that is callable -- when it is called on an argument r
it invokes the __getitem__
protocol to result in r[1]
.