I am creating polynomial regression models of different degrees by passing different powers of the same feature.
For example, for a polynomial model of degree 3 in the feature 'x', I pass x^1, x^2 and x^3 to the regression model as features.
The following function builds an SFrame whose columns are powers of 'x', given the values of 'x' and the highest degree to create.
def polynomial_sframe(feature, degree):
    # assume that degree >= 1
    # initialize the SFrame:
    poly_sframe = graphlab.SFrame()
    # poly_sframe['power_1'] equal to the passed feature
    poly_sframe['power_1'] = feature
    # first check if degree > 1
    if degree > 1:
        # then loop over the remaining degrees:
        # range usually starts at 0 and stops at the endpoint-1.
        for power in range(2, degree+1):
            # give the column a name:
            name = 'power_' + str(power)
            # then assign poly_sframe[name] to the appropriate power of feature
            poly_sframe[name] = feature.apply(lambda x: x**power)
    return poly_sframe
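To illustrate what the function is meant to return, here is a minimal pure-Python sketch that uses a plain dict of lists as a stand-in for the SFrame. The helper name `polynomial_columns` and the sample values are hypothetical and are not part of GraphLab.

```python
def polynomial_columns(values, degree):
    """Return a dict mapping 'power_1' .. 'power_<degree>' to lists of powers.

    Hypothetical stand-in for polynomial_sframe; assumes degree >= 1.
    """
    columns = {}
    for power in range(1, degree + 1):
        # Each column holds every input value raised to the current power.
        columns['power_' + str(power)] = [x ** power for x in values]
    return columns


# Sample values chosen only for illustration.
table = polynomial_columns([1, 2, 3], 3)
print(table['power_1'])  # [1, 2, 3]
print(table['power_2'])  # [1, 4, 9]
print(table['power_3'])  # [1, 8, 27]
```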
Using the SFrame generated by the function above, I can fit polynomial models of different degrees in 'x', as shown in the following code.
poly3_data = polynomial_sframe(sales['sqft_living'], 3)
my_features = poly3_data.column_names() # get the name of the features
poly3_data['price'] = sales['price'] # add price to the data since it's the target
model3 = graphlab.linear_regression.create(poly3_data, target = 'price', features = my_features, validation_set = None)
GraphLab is able to fit a model up to degree 4. Beyond that it fails: the following code, which requests degree 5, raises an overflow error.
poly15_data = polynomial_sframe(sales['sqft_living'], 5)
my_features = poly15_data.column_names() # get the name of the features
poly15_data['price'] = sales['price'] # add price to the data since it's the target
model15 = graphlab.linear_regression.create(poly15_data, target = 'price', features = my_features, validation_set = None)
---------------------------------------------------------------------------
ToolkitError Traceback (most recent call last)
<ipython-input-76-df5cbc0b6314> in <module>()
2 my_features = poly15_data.column_names() # get the name of the features
3 poly15_data['price'] = sales['price'] # add price to the data since it's the target
----> 4 model15 = graphlab.linear_regression.create(poly15_data, target = 'price', features = my_features, validation_set = None)
C:\Users\mk\Anaconda2\envs\dato-env\lib\site-packages\graphlab\toolkits\regression\linear_regression.pyc in create(dataset, target, features, l2_penalty, l1_penalty, solver, feature_rescaling, convergence_threshold, step_size, lbfgs_memory_level, max_iterations, validation_set, verbose)
284 step_size = step_size,
285 lbfgs_memory_level = lbfgs_memory_level,
--> 286 max_iterations = max_iterations)
287
288 return LinearRegression(model.__proxy__)
C:\Users\mk\Anaconda2\envs\dato-env\lib\site-packages\graphlab\toolkits\_supervised_learning.pyc in create(dataset, target, model_name, features, validation_set, verbose, distributed, **kwargs)
451 else:
452 ret = _graphlab.toolkits._main.run("supervised_learning_train",
--> 453 options, verbose)
454 model = SupervisedLearningModel(ret['model'], model_name)
455
C:\Users\mk\Anaconda2\envs\dato-env\lib\site-packages\graphlab\toolkits\_main.pyc in run(toolkit_name, options, verbose, show_progress)
87 _get_metric_tracker().track(metric_name, value=1, properties=track_props, send_sys_info=False)
88
---> 89 raise ToolkitError(str(message))
ToolkitError: Exception in python callback function evaluation:
OverflowError('long too big to convert',):
Traceback (most recent call last):
File "graphlab\cython\cy_pylambda_workers.pyx", line 426, in graphlab.cython.cy_pylambda_workers._eval_lambda
File "graphlab\cython\cy_pylambda_workers.pyx", line 171, in graphlab.cython.cy_pylambda_workers.lambda_evaluator.eval_simple
File "graphlab\cython\cy_flexible_type.pyx", line 1193, in graphlab.cython.cy_flexible_type.process_common_typed_list
File "graphlab\cython\cy_flexible_type.pyx", line 1138, in graphlab.cython.cy_flexible_type._fill_typed_sequence
File "graphlab\cython\cy_flexible_type.pyx", line 1385, in graphlab.cython.cy_flexible_type._ft_translate
OverflowError: long too big to convert
Is this error because my computer lacks memory to compute the regression model? How would this error be fixed?
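For context on the message itself: `long too big to convert` may mean that an integer result no longer fits in a signed 64-bit value, which would be a different problem from running out of memory. The sketch below is a plain-Python check under stated assumptions: the value 10000 is a hypothetical square footage and is not taken from the `sales` data, and it is assumed, not confirmed, that GraphLab stores integer columns as 64-bit values.

```python
INT64_MAX = 2 ** 63 - 1  # largest signed 64-bit integer

x = 10000  # hypothetical integer feature value (assumption, not from the data)

for power in range(1, 6):
    value = x ** power  # Python integers grow without bound
    # Report whether the result would still fit in a signed 64-bit integer.
    print(power, value, value <= INT64_MAX)

# Raising a float instead of an int never produces a big integer.
as_float = float(x) ** 5
print(as_float)
```

With this hypothetical value, the fourth power still fits in 64 bits and the fifth does not, which would match a failure that first appears at degree 5. If that is what happens here, computing the powers on floats rather than integers would be one thing to try.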