graphlab linear regression terminated due to numerical overflow error

Question

I am trying to create a linear regression model using graphlab. I have 200 samples and 1 predictor, but I get a "numerical overflow error". Here is the output:

model_all = graphlab.linear_regression.create(data2.tail(200), target='output', features=['input'],validation_set=None,l2_penalty=0.0002,solver = 'auto')
Linear regression:
--------------------------------------------------------
Number of examples          : 200
Number of features          : 1
Number of unpacked features : 1
Number of coefficients    : 2
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+--------------------+---------------+
| Iteration | Passes   | Elapsed Time | Training-max_error | Training-rmse |
+-----------+----------+--------------+--------------------+---------------+
+-----------+----------+--------------+--------------------+---------------+
TERMINATED: Terminated due to numerical overflow error.
This model may not be ideal. To improve it, consider doing one of the following:
(a) Increasing the regularization.
(b) Standardizing the input data.
(c) Removing highly correlated features.
(d) Removing `inf` and `NaN` values in the training data

Hints (b), (c) and (d) do not apply to my case because there is only 1 feature and there are no inf or NaN values. I have tried various l2_penalty values, but none of them helped. If I limit the number of samples to a smaller number such as 180, it works:

model_all = graphlab.linear_regression.create(data2.tail(180), target='output', features=['input'],validation_set=None,l2_penalty=0.0002,solver = 'auto')
model_all.get("coefficients").print_rows(num_rows=100)
Linear regression:
--------------------------------------------------------
Number of examples          : 180
Number of features          : 1
Number of unpacked features : 1
Number of coefficients    : 2
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+--------------------+---------------+
| Iteration | Passes   | Elapsed Time | Training-max_error | Training-rmse |
+-----------+----------+--------------+--------------------+---------------+
| 1         | 2        | 0.000866     | 9.873043           | 4.272624      |
+-----------+----------+--------------+--------------------+---------------+
SUCCESS: Optimal solution found.
+----------------+-------+------------------+-------------------+
|      name      | index |      value       |       stderr      |
+----------------+-------+------------------+-------------------+
|  (intercept)   |  None |   9.3412783539   |   3.80166353756   |
| DOEDDIST.Index |  None | 0.00226165438702 | 0.000975084975224 |
+----------------+-------+------------------+-------------------+
[2 rows x 4 columns]

I don't understand what causes the numerical overflow error. Can someone help to explain?

Thank you.

You can always choose another available solver if solving this task is all you need. To debug this, you should probably show the data, although your observation is indeed strange. - sascha, Oct 11 '17 at 11:51

score 0 · Answer 1 · answered Oct 11 '17 at 12:23

I double-checked my data and there is indeed a NaN entry. My bad. data.dropna(axis='index', how='any', inplace=True) solves it.
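
For anyone hitting the same message, here is a minimal sketch of how the offending rows could be located before dropping them. It assumes the data is held in a pandas DataFrame (the dropna call above uses the pandas signature) with the columns 'input' and 'output' from the question. The synthetic frame and the helper name non_finite_mask are hypothetical and exist only for illustration. It also shows one plausible reason why tail(180) trained while tail(200) did not: the bad row may simply fall outside the shorter slice.

```python
import numpy as np
import pandas as pd


def non_finite_mask(df, cols):
    """Return a boolean array that is True for rows containing NaN or +/-inf."""
    values = df[cols].to_numpy(dtype=float)
    return ~np.isfinite(values).all(axis=1)


# Hypothetical stand-in for the question's data: 200 rows with one NaN.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "input": rng.uniform(3000.0, 5000.0, size=200),
    "output": rng.uniform(10.0, 30.0, size=200),
})
data.loc[5, "input"] = np.nan  # planted near the start of the frame

cols = ["input", "output"]

# Count the bad rows in each slice used in the question.
bad_200 = non_finite_mask(data.tail(200), cols)
bad_180 = non_finite_mask(data.tail(180), cols)
print("non-finite rows in tail(200):", int(bad_200.sum()))
print("non-finite rows in tail(180):", int(bad_180.sum()))

# Show the offending rows, then build a cleaned copy.
print(data.tail(200)[bad_200])
clean = data.replace([np.inf, -np.inf], np.nan).dropna(axis="index", how="any")
print("rows after cleaning:", len(clean))
```

Checking with np.isfinite catches inf as well as NaN, which covers hint (d) in one pass. The cleaned copy is returned rather than modified in place, so the original frame stays available for inspection.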

Pollyanna
