Given an array with [x, y] points, e.g.:

import numpy as np

p0 = np.random.rand(21, 2)
p1 = np.array([[0, 0], [0, 1], [0, 2], [0, 3]])

I would like to get a straight line [segment] approximation of these points. A least-squares method is acceptable; other error functions may be fine too. Note that the set of points is in general not a function, i.e. as p1 above shows, multiple y values can be associated with the same x (here x = 0).


There is no slope-intercept solution for a vertical set of points such as p1, so np.polyfit and scipy.stats.linregress are not the solutions I'm looking for. Think geometry, not statistics.

Paul Jurczak
  • What would be a least-squares solution to this? Least squares produces a line, `y=f(x)`. What value would you hope to estimate for x=7? – Tim Roberts Feb 12 '22 at 05:09
  • @TimRoberts For `p1` the solution is `x=0`, i.e. for canonical line equation *ax+by+c=0*, *a=1, b=0, c=0* – Paul Jurczak Feb 12 '22 at 05:36
  • You can use [`scipy.stats.linregress`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html#scipy.stats.linregress) among other similar methods – Cory Kramer Feb 12 '22 at 21:01
  • @CoryKramer No, I can't. Slope/intercept solutions can't handle vertical lines. – Paul Jurczak Feb 12 '22 at 22:21
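For illustration, here is a minimal sketch of a fit that returns the canonical form *ax+by+c=0* mentioned in the comments above. It is not taken from the thread: it assumes the error to minimize is the sum of squared perpendicular distances (often called total least squares or orthogonal regression), and the name `fit_line_tls` is hypothetical. The line passes through the centroid, and its normal is the direction of least spread, taken from the SVD of the centered points.

```python
import numpy as np

def fit_line_tls(points):
    """Fit a*x + b*y + c = 0 by minimizing squared perpendicular distances.

    Returns (a, b, c) with a**2 + b**2 == 1 (up to rounding).
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # Rows of vt are the principal directions of the centered points;
    # the last row (smallest singular value) is the normal of the line.
    _, _, vt = np.linalg.svd(pts - centroid)
    a, b = vt[-1]
    # The sign of a singular vector is arbitrary; pick a >= 0 for repeatability.
    if a < 0 or (a == 0 and b < 0):
        a, b = -a, -b
    # The best-fit line passes through the centroid.
    c = -(a * centroid[0] + b * centroid[1])
    return a, b, c

p1 = np.array([[0, 0], [0, 1], [0, 2], [0, 3]])
a, b, c = fit_line_tls(p1)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")
```

For `p1` this should come out as the vertical line x = 0 (a close to 1, b and c close to 0), with no special case for vertical input.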

1 Answer

Maybe I'm missing something, but since you're already using NumPy, why not just use its polyfit function?

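import numpy as np

def fit_line(points):
    # Degree-1 least-squares fit of y on x: r[0] is the slope, r[1] the intercept.
    points = np.asarray(points)
    r = np.polyfit(points[:, 0], points[:, 1], 1)
    print(f"y = {r[0]:.2f}x + {r[1]:.2f}")

fit_line(np.random.rand(21, 2))
# Vertical points have no slope-intercept form, so this call is commented out.
# fit_line(np.array([[0, 0], [0, 1], [0, 2], [0, 3]]))
fit_line(np.array([[0, 3], [1, 4], [2, 5], [3, 6]]))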

Result:

y = 0.20x + 0.51  (different each run)
y = 1.00x + 3.00

As you note, the second example (the commented-out vertical points) is a degenerate case with no slope-intercept form, so polyfit throws an exception. You could catch the exception to handle that case. Unfortunately, the library also prints a bunch of noise to stdout in the degenerate case, so catching the exception alone is less than ideal. For that reason, I chose not to demonstrate catching the exception and simply commented that case out.
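One way to avoid depending on the exception at all is to test for the degenerate case before calling polyfit. The sketch below is only an illustration of that idea: the name `fit_line_guarded`, the `tol` parameter, and the tagged return value are all hypothetical choices, and it only covers exactly (or almost exactly) vertical input. Points that are merely close to vertical still go through polyfit.

```python
import numpy as np

def fit_line_guarded(points, tol=1e-12):
    """Return ("vertical", x0) or ("slope_intercept", (slope, intercept))."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # If all x values coincide (within tol), there is no slope-intercept form.
    if x.max() - x.min() <= tol:
        return "vertical", float(x.mean())
    slope, intercept = np.polyfit(x, y, 1)
    return "slope_intercept", (float(slope), float(intercept))

print(fit_line_guarded(np.array([[0, 0], [0, 1], [0, 2], [0, 3]])))
print(fit_line_guarded(np.array([[0, 3], [1, 4], [2, 5], [3, 6]])))
```

Because polyfit is never called on the vertical input, nothing is raised or printed by the library for that case.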

CryptoFool
  • As I stated in my question, handling of vertical line is required, so this solution doesn't work for me. – Paul Jurczak Feb 12 '22 at 22:19
  • @Paul - dealing with the vertical line is trivial. If an exception is thrown, then take the `x` value from any one of your points and produce `x=`. The problem with this is that it's still a degenerate case because it's not y as a function of x. If this isn't enough, then I guess I'm wondering what you think the form of a proper answer would look like. There is simply no way to represent both a purely vertical and a purely horizontal line with a single equation....period. – CryptoFool Feb 13 '22 at 01:02
  • Wrong. The canonical line equation, e.g. *ax+by+c=0* can represent ALL straight lines in 2D, including vertical ones. – Paul Jurczak Feb 13 '22 at 05:47
  • Additionally, `np.polyfit` produces a wrong result in cases like this: `[[0, 0], [0, 1], [0.1, 0], [0.1, 1]]` – Paul Jurczak Feb 13 '22 at 05:56
  • Oops..yeah...you're right...I shouldn't have said "equations". What I meant was that you can't define all lines in terms of a function that maps one coordinate to the other...c1=f(c2). If you can do something with 'ax+by+c=0', fantastic. – CryptoFool Feb 13 '22 at 06:52
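To make the four-point example from the comments concrete, the sketch below compares polyfit with a geometric fit on those points. It assumes that "geometric" means minimizing perpendicular distances, which the thread does not state outright; the variable names are illustrative. The principal axis is taken from the eigenvectors of the 2x2 covariance matrix, and projecting the points onto that axis gives the endpoints of a segment.

```python
import numpy as np

pts = np.array([[0, 0], [0, 1], [0.1, 0], [0.1, 1]])

# Ordinary least squares of y on x: only vertical distances are minimized.
slope, intercept = np.polyfit(pts[:, 0], pts[:, 1], 1)

# Principal axis of the point cloud. np.linalg.eigh returns eigenvalues in
# ascending order, so the last eigenvector is the direction of largest spread.
centroid = pts.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov((pts - centroid).T))
direction = eigvecs[:, -1]   # unit vector along the fitted line
normal = eigvecs[:, 0]       # unit normal (a, b) of a*x + b*y + c = 0
c = -normal @ centroid

# Project the points onto the direction to get the two segment endpoints.
t = (pts - centroid) @ direction
segment = centroid + np.outer([t.min(), t.max()], direction)

print(f"polyfit:   y = {slope:.3f}x + {intercept:.3f}")
print(f"geometric: normal = {normal}, c = {c:.3f}")
print(f"segment endpoints:\n{segment}")
```

On these points polyfit yields an essentially horizontal line through y = 0.5, while the principal axis is the vertical line x = 0.05, since the points spread ten times further in y than in x.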