Python Linear Regression Error

Question

I have two arrays with the following values:

>>> x = [24.0, 13.0, 12.0, 22.0, 21.0, 10.0, 9.0, 12.0, 7.0, 14.0, 18.0,
...      1.0, 18.0, 15.0, 13.0, 13.0, 12.0, 19.0, 13.0]

>>> y = [10.0, 9.0, 22.0, 7.0, 4.0, 7.0, 56.0, 5.0, 24.0, 25.0, 11.0, 2.0,
...      9.0, 1.0, 9.0, 12.0, 9.0, 4.0, 2.0]

I used the scipy library to calculate r-squared:

>>> from scipy.interpolate import polyfit
>>> p1 = polyfit(x, y, 1)

When I run the code below:

>>> yfit = p1[0] * x + p1[1]
>>> yfit
array([], dtype=float64)

The yfit array is empty. I don't understand why.

This is the value of ConnectionTimeHoursRounded + p1[1]: [ 44.23076923 33.23076923 32.23076923 42.23076923 41.23076923 30.23076923 29.23076923 32.23076923 27.23076923 34.23076923 38.23076923 21.23076923 38.23076923 35.23076923 33.23076923 33.23076923 32.23076923 39.23076923 33.23076923] — Chiel, Jun 15 '16 at 11:17
When I use the following int arrays it still doesn't work: [24, 13, 12, 22, 21, 10, 9, 12, 7, 14, 18, 1, 18, 15, 13, 13, 12, 19, 13] [10, 9, 22, 7, 4, 7, 56, 5, 24, 25, 11, 2, 9, 1, 9, 12, 9, 4, 2] — Chiel, Jun 15 '16 at 11:18
When I try to calculate r-squared I get this error: http://imgur.com/udtGCpI. Sorry for posting a screen, but stackoverflow doesn't allow me to post this here. — Chiel, Jun 15 '16 at 11:26
Okay thanks for your tips, can you please tell me what's wrong with my example? For everyone: please don't just downvote my answer without telling what's wrong.. — Chiel, Jun 15 '16 at 11:43
Read all the tips. We can't reproduce your problem. Basically we're debugging for you but it becomes a chat and a conversation which isn't appropriate for the comments section. You tell a story with unnecessary details. It takes work to strip a problem down to its essence. Someone has to do the work, and the best person to do that is you. — Peter Wood, Jun 15 '16 at 11:47
Thank you for being specific! I edited my question so that it completely skips the CSV part (which is irrelevant and hard to reproduce). I also renamed the variables to be clearer and less content-specific. People should now be able to reproduce and learn from this problem. — Chiel, Jun 15 '16 at 11:58
Get rid of everything unrelated to `yfit = p1[0] * x+ p1[1]`. The plotting, the later questions. Just make it so we can reproduce that single result `print(yfit)`. — Peter Wood, Jun 15 '16 at 12:04

Peter Wood · Accepted Answer · 2016-06-19T11:42:58.337

The problem is that you are performing scalar addition with an empty list.

You have an empty list because you multiply a Python list, rather than a numpy.array, by a scalar. The scalar is converted to the integer 0, and repeating a list zero times produces a zero-length list.

We'll explore this below, but the fix is to hold your data in numpy arrays instead of lists. Either create the arrays directly or convert the existing lists:

>>> import numpy
>>> x = numpy.array([24.0, 13.0, 12.0, 22.0, 21.0, 10.0, 9.0, 12.0, 7.0, 14.0,
...                  18.0, 1.0, 18.0, 15.0, 13.0, 13.0, 12.0, 19.0, 13.0])
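As a fuller sketch of the fix, the snippet below runs the whole calculation on numpy arrays. It assumes `numpy.polyfit` is an acceptable stand-in for the `polyfit` imported in the question, and the r-squared step is one common definition (1 minus residual sum of squares over total sum of squares), added only because the question mentions r-squared as the goal.

```python
import numpy as np

# Data from the question, held in arrays rather than lists.
x = np.array([24.0, 13.0, 12.0, 22.0, 21.0, 10.0, 9.0, 12.0, 7.0, 14.0,
              18.0, 1.0, 18.0, 15.0, 13.0, 13.0, 12.0, 19.0, 13.0])
y = np.array([10.0, 9.0, 22.0, 7.0, 4.0, 7.0, 56.0, 5.0, 24.0, 25.0,
              11.0, 2.0, 9.0, 1.0, 9.0, 12.0, 9.0, 4.0, 2.0])

# Degree-1 least-squares fit; the result is [slope, intercept].
p1 = np.polyfit(x, y, 1)

# With x as an array, multiplication and addition are elementwise.
yfit = p1[0] * x + p1[1]

# r-squared (assumed definition): 1 - SS_residual / SS_total.
ss_res = np.sum((y - yfit) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(yfit.shape)
print(r_squared)
```

Here yfit has one fitted value per entry of x, rather than being empty.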

An explanation of what was going on follows:

Let's unpack the expression yfit = p1[0] * x + p1[1].

The component parts are:

>>> p1[0]
-0.58791208791208893

p1[0] isn't a plain Python float, however; it's a numpy scalar type:

>>> type(p1[0])
<class 'numpy.float64'>

x is as given above.

>>> p1[1]
20.230769230769241

Similar to p1[0], the type of p1[1] is also numpy.float64:

>>> type(p1[1])
<class 'numpy.float64'>

In the numpy version used here, multiplying a list by this non-integer scalar converts the scalar to an integer, so p1[0], which is -0.58791208791208893, becomes 0:

>>> p1[0] * x
[]

which is the same result as:

>>> 0 * [1, 2, 3]
[]
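For comparison, here is a small sketch of how built-in Python numbers behave when multiplied with a list, independent of numpy. The variable names are illustrative only.

```python
data = [1, 2, 3]

# An integer repeats the list that many times.
twice = 2 * data       # [1, 2, 3, 1, 2, 3]
empty = 0 * data       # []
negative = -1 * data   # [] (negative counts also give an empty list)

# A built-in float is rejected outright rather than converted to an int.
try:
    0.5 * data
    float_error = None
except TypeError as exc:
    float_error = exc

print(twice, empty, negative)
print(float_error)
```

In none of these cases is the list multiplied elementwise, which is why the data needs to be in an array for the fit expression to work.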

Finally, you are adding the empty list to p1[1], which is a numpy.float64.

This doesn't try to append the value to the empty list. It performs scalar addition, i.e. it adds 20.230769230769241 to each entry in the list.

However, since the list is empty, this has no effect other than returning an empty numpy array with the type numpy.float64:

>>> [] + p1[1]
array([], dtype=float64)

An example of a scalar addition having an effect:

>>> [10, 20, 30] + p1[1]
array([ 30.23076923,  40.23076923,  50.23076923])
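The same two additions can also be written with explicit arrays, which avoids relying on numpy converting a list implicitly. This is a minimal sketch; `offset` is an illustrative name holding the intercept value quoted above.

```python
import numpy as np

# The intercept value shown earlier, as a numpy scalar.
offset = np.float64(20.230769230769241)

# Scalar addition broadcasts over every element of the array.
shifted = np.array([10, 20, 30]) + offset

# With no elements there is nothing to add to, so the result stays empty.
still_empty = np.array([]) + offset

print(shifted)
print(still_empty, still_empty.dtype)
```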

Nice answer! But what is the fix? I assume defining `x = np.array([...])` and same with `y` will solve the problem, and be a clean solution? — Martin Hallén, Jun 16 '16 at 09:13
@mart0903 I don't know what the expression `p1[0] * x + p1[1]` is meant to achieve so I don't know how to fix it. I have just explained why it's producing an empty array. — Peter Wood, Jun 16 '16 at 09:17
@mart0903 No, it won't fix it. The problem is multiplying an array by a decimal fraction which gets rounded down to zero. I don't know the meaning of it. — Peter Wood, Jun 16 '16 at 09:18
It seems like it does. Multiplying a decimal (or a numpy float in this case) with a numpy array is perfectly fine. The example works with numpy arrays :) — Martin Hallén, Jun 16 '16 at 09:25
@mart0903 Ah, I see now. Yes. Thanks. I will improve the answer when I have a moment. — Peter Wood, Jun 16 '16 at 09:27
