I am trying to create a linear regression model using graphlab. I have 200 samples and 1 predictor, but I get a "numerical overflow error". Here is the output:

model_all = graphlab.linear_regression.create(data2.tail(200), target='output', features=['input'],validation_set=None,l2_penalty=0.0002,solver = 'auto')
Linear regression:
--------------------------------------------------------
Number of examples          : 200
Number of features          : 1
Number of unpacked features : 1
Number of coefficients    : 2
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+--------------------+---------------+
| Iteration | Passes   | Elapsed Time | Training-max_error | Training-rmse |
+-----------+----------+--------------+--------------------+---------------+
+-----------+----------+--------------+--------------------+---------------+
TERMINATED: Terminated due to numerical overflow error.
This model may not be ideal. To improve it, consider doing one of the following:
(a) Increasing the regularization.
(b) Standardizing the input data.
(c) Removing highly correlated features.
(d) Removing `inf` and `NaN` values in the training data

Hints (b), (c) and (d) do not apply to my case, because there is only 1 feature and there are no inf or NaN values. I have tried various l2_penalty values, but none of them helped. If I limit the number of samples to a smaller number, such as 180, it works.

model_all = graphlab.linear_regression.create(data2.tail(180), target='output', features=['input'],validation_set=None,l2_penalty=0.0002,solver = 'auto')
model_all.get("coefficients").print_rows(num_rows=100)
Linear regression:
--------------------------------------------------------
Number of examples          : 180
Number of features          : 1
Number of unpacked features : 1
Number of coefficients    : 2
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+--------------------+---------------+
| Iteration | Passes   | Elapsed Time | Training-max_error | Training-rmse |
+-----------+----------+--------------+--------------------+---------------+
| 1         | 2        | 0.000866     | 9.873043           | 4.272624      |
+-----------+----------+--------------+--------------------+---------------+
SUCCESS: Optimal solution found.
+----------------+-------+------------------+-------------------+
|      name      | index |      value       |       stderr      |
+----------------+-------+------------------+-------------------+
|  (intercept)   |  None |   9.3412783539   |   3.80166353756   |
| DOEDDIST.Index |  None | 0.00226165438702 | 0.000975084975224 |
+----------------+-------+------------------+-------------------+
[2 rows x 4 columns]

I don't understand what causes the numerical overflow error. Can someone help to explain?

Thank you.

Pollyanna
  • You can always choose some other available solver if solving this task is all you need. To debug this, you should probably show the data, although your observation is indeed strange. – sascha Oct 11 '17 at 11:51
  • Thanks for the reply – Pollyanna Oct 11 '17 at 12:22

1 Answer

I double-checked my data and there is indeed a NaN entry. My bad. data.dropna(axis = 'index',how = 'any',inplace=True) solves it.
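
For anyone hitting the same error, here is a minimal sketch of how such an entry could be found before training. It assumes the data is a pandas DataFrame (the dropna call above uses the pandas signature). The data is synthetic, the column names are borrowed from the question, and count_non_finite is a hypothetical helper, not part of pandas or graphlab. It also shows one way tail(180) can work while tail(200) fails: the bad row sits in the part that the shorter tail skips.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for data2 (assumption: the real data is not shown in the question).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "input": np.linspace(3000.0, 4500.0, 200),
    "output": rng.normal(18.0, 4.0, 200),
})
# Plant one NaN in a row that tail(200) includes but tail(180) skips.
data.loc[5, "input"] = np.nan


def count_non_finite(frame):
    """Hypothetical helper: count NaN/inf cells in each numeric column."""
    numeric = frame.select_dtypes(include="number")
    return (~np.isfinite(numeric)).sum()


# Per-column counts for the two slices used in the question.
print(count_non_finite(data.tail(200)))
print(count_non_finite(data.tail(180)))

# Non-mutating variant of the fix: treat inf as missing, then drop incomplete rows.
clean = data.replace([np.inf, -np.inf], np.nan).dropna(axis="index", how="any")
```

Running the check on the exact slice passed to the model, rather than on a visual scan of the table, is what would have caught the entry here.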

Pollyanna