I have a regression problem: estimating the slope a of y = a*x + b. I tried two different methods to estimate a. Method 1 takes the means of the two data clusters as two points and calculates a from the line through them. Method 2 uses the standard regression equation.
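Written out, the two estimators look roughly as follows. This is a sketch read off the code below; the subscripts on a (a_1 for Method 1, a_2 for Method 2) and the ordering of the coefficient vector are labels introduced here for illustration only.

```latex
% Method 1: slope of the line through the two cluster means
\[
a_1 = \frac{\bar{y}_1 - \bar{y}_2}{\bar{x}_1 - \bar{x}_2}
\]

% Method 2: ordinary least squares with design matrix X = [\mathbf{1},\ x]
\[
\hat{\beta} = (X^\top X)^{-1} X^\top y,
\qquad \hat{\beta} = (\hat{b},\ \hat{a})^\top,
\qquad a_2 = \hat{a}
\]
```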
import numpy as np
import statistics
# find the slope a of y = a*x + b
x = "28.693756 28.850006 28.662506 28.693756 28.756256 28.662506 28.787506 \
28.818756 28.818756 28.787506 28.787506 28.787506 28.693756 28.787506 \
28.818756 28.725006 28.725006 28.850006 28.756256 28.725006 28.881256 \
28.818756 28.756256 28.693756 28.756256 28.787506 28.693756 28.662506 \
28.662506 28.787506 28.850006 28.756256 28.725006 28.818756 28.600006 \
28.725006 28.725006 28.850006 28.881256 28.881256 28.818756 28.756256 \
28.756256 28.787506 28.787506 28.787506 28.756256 28.787506 28.725006 \
28.725006 28.725006 28.756256 28.818756 28.756256 28.693756 28.818756 \
28.756256 28.756256 28.693756 28.850006 28.631256 28.693756 28.693756 \
28.850006 28.756256 28.725006 28.693756 28.756256 28.850006 28.787506 \
28.600006 28.631256"
x = [float(t) for t in x.split()]
y = [33.8]*36 + [38.7]*36
print(" ")
print("Method 1 ")
x1, x2 = statistics.mean(x[:36]), statistics.mean(x[36:])
y1, y2 = statistics.mean(y[:36]), statistics.mean(y[36:])
slope = (y1-y2)/(x1-x2)
print(f"a = {slope}")
print(" ")
print('Method 2')
x = np.array(x)
y = np.array(y)
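# Design matrix with an intercept column: X = [1, x]
X = np.c_[np.ones(x.shape), x]
# Normal equations: beta = (X^T X)^{-1} X^T y
XXinv = np.linalg.inv(X.transpose().dot(X)).dot(X.transpose())
_beta = XXinv.dot(y)
iv = np.linalg.inv(X.transpose().dot(X)).tolist()
print(f"a = {_beta[1]}")
# Singular values of X^T X, to see how close it is to singular
xx = X.transpose().dot(X)
svd = np.linalg.svd(xx)[1]
print(f"SVD(XX) = {svd}")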
Results of the code are:
Method 1
a = 1128.9599999997959
Method 2
a = 1.2136744782028899
SVD(XX) = [5.96125150e+04 3.80959618e-04]
From the data plot, the fitted line should be close to vertical, so the Method 1 result makes more sense than the Method 2 result. Even the line with the smallest slope across the data (shown in the figure) has a slope of 17.5. In normal cases Method 2 works well, but here it gives a slope of only 1.21, which doesn't make sense.
The only explanation I can think of is the near-singularity of X^T X shown by its singular values. But why does this happen, and is there a fix?
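One possible way to probe the near-singularity idea is to center x before fitting and compare. The sketch below uses synthetic stand-in data (an assumption, not the data above): x is tightly clustered near 28.75 and y takes the two values 33.8 and 38.7. Centering makes the two columns of the design matrix orthogonal, which lowers the condition number of X^T X without changing the least-squares slope, so if the centered fit gives the same slope, the conditioning alone would not seem to explain the result. All variable names are illustrative.

```python
import numpy as np

# Hypothetical stand-in data (NOT the data from the question):
# x barely varies around 28.75, y sits in two groups.
rng = np.random.default_rng(0)
x = 28.75 + 0.07 * rng.standard_normal(72)
y = np.array([33.8] * 36 + [38.7] * 36)

# Raw design matrix [1, x]: the columns are nearly collinear.
X_raw = np.c_[np.ones_like(x), x]
# Centered design matrix [1, x - mean(x)]: the columns are orthogonal.
X_ctr = np.c_[np.ones_like(x), x - x.mean()]

# Condition numbers of X^T X before and after centering.
cond_raw = np.linalg.cond(X_raw.T @ X_raw)
cond_ctr = np.linalg.cond(X_ctr.T @ X_ctr)

# Least-squares fits; lstsq avoids forming an explicit inverse.
slope_raw = np.linalg.lstsq(X_raw, y, rcond=None)[0][1]
slope_ctr = np.linalg.lstsq(X_ctr, y, rcond=None)[0][1]

# Closed-form least-squares slope: cov(x, y) / var(x).
slope_cov = np.cov(x, y, bias=True)[0, 1] / np.var(x)

print(f"cond(X^T X) raw      = {cond_raw:.3e}")
print(f"cond(X^T X) centered = {cond_ctr:.3e}")
print(f"slope raw = {slope_raw}, centered = {slope_ctr}, cov/var = {slope_cov}")
```

If the three slopes agree while the condition number drops by several orders of magnitude, the small slope reflects what least squares minimizes (vertical residuals in y) rather than a numerical problem.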