SVR's Predict method in scikit-learn predicting only one number for the test set

Question

I am new to machine learning in Python and currently have a lot of questions about both the algorithms themselves and the code.

I have developed a Python script that takes in time series data for a stock (timestamps and the Adj. Close). I pre-processed it by taking the log of the Adj. Close and normalizing it with min-max scaling, then split the timestamps and Adj. Close into train and test sets (a 90/10 split going forward in time). The data covers 891 trading days.
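For reference, the pre-processing described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the `preprocess` helper, the `train_frac` parameter, and the price values are all made up, and fitting the min-max range on the training portion only (after the chronological split) is an assumption about ordering that the description does not confirm.

```python
import math

def preprocess(prices, train_frac=0.9):
    """Log-transform, split chronologically, then min-max scale.

    Assumption: the min and max come from the training part only,
    so no information from the test period leaks into the scaling.
    """
    logs = [math.log(p) for p in prices]
    split = int(len(logs) * train_frac)      # first 90% is train, last 10% is test
    train, test = logs[:split], logs[split:]
    lo, hi = min(train), max(train)

    def scale(v):
        return (v - lo) / (hi - lo)

    return [scale(v) for v in train], [scale(v) for v in test]

# Made-up, steadily rising prices for 891 trading days (illustration only)
prices = [100.0 + 0.5 * i for i in range(891)]
train_y, test_y = preprocess(prices)
print(len(train_y), len(test_y))
```

With train-only scaling, test values can fall outside [0, 1] whenever the test period trades above or below the training range, which is worth keeping in mind when reading the predictions.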

I then apply a grid search over an SVR to fit the training data of timestamps and Adj. Close:

svr = SVR(kernel='rbf', tol=0.001, C=1.0, shrinking=True, cache_size=200, verbose=False, max_iter=-1)

The parameters to search over are gamma and epsilon:

grs = GridSearchCV(svr, param_grid , n_jobs=1, iid=False, cv=StratifiedKFold(train_y, shuffle=False), verbose=0)

fit_results = grs.fit(train_X, train_y)

par_found = fit_results.best_params_

svr = SVR(kernel='rbf', gamma = par_found['gamma'], tol=0.001, C=1.0, epsilon = par_found['epsilon'], shrinking=True, cache_size=200, verbose=False, max_iter=-1)

final_fit = svr.fit(train_X, train_y)

pred = svr.predict(test_X)

But when I print pred, I get an array whose length equals the length of test_X and whose elements are all the same number.
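One possible explanation for this symptom, sketched below, is the behaviour of the RBF kernel on raw timestamps. An RBF SVR predicts f(x) = sum_i a_i * exp(-gamma * (x - x_i)^2) + b. If the timestamps reach the kernel as nanosecond counts (an assumption about how the `datetime64[ns]` values are converted), consecutive trading days are about 8.64e13 apart, so every kernel value for a test point underflows to zero and only the intercept b remains. The `gamma`, `dual_coefs`, and `intercept` values here are hypothetical, chosen only to show the effect.

```python
import math

def rbf(x, z, gamma):
    """RBF kernel for scalar inputs: exp(-gamma * (x - z)^2)."""
    return math.exp(-gamma * (x - z) ** 2)

def svr_predict(x, support_x, dual_coefs, intercept, gamma):
    """Decision function of a kernel SVR: weighted kernel sum plus intercept."""
    return sum(a * rbf(x, s, gamma) for a, s in zip(dual_coefs, support_x)) + intercept

NS_PER_DAY = 86_400 * 10**9              # nanoseconds in one day
gamma = 0.1                              # hypothetical value

# Five "training" days and three later "test" days, as nanosecond counts
support_x = [float(d * NS_PER_DAY) for d in range(5)]
test_x = [float(d * NS_PER_DAY) for d in range(5, 8)]

dual_coefs = [0.3, -0.2, 0.5, -0.4, 0.1]  # hypothetical fitted coefficients
intercept = 0.7                           # hypothetical fitted intercept

kernel_values = [rbf(test_x[0], s, gamma) for s in support_x]
preds = [svr_predict(x, support_x, dual_coefs, intercept, gamma) for x in test_x]
print(kernel_values)
print(preds)
```

Under these assumptions every prediction collapses to the intercept. The same thing can happen with rescaled time values whenever the test points lie far beyond the training range relative to the kernel width, since an RBF model has nothing to extrapolate with outside the data it saw.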

I feel like the regression engine is overfitting the data, but I am not sure. Also, what feature is best suited to use as X_data? I suspect that using timestamps is not the best way of doing things.

Also, is there any good literature on the concepts and math behind support vector regression applied to financial time series?

Is there a better regression technique than SVR for time series?

Thanks. Your help will be greatly appreciated.

Can you include a sample of `X` and `y` in your question? Also, you should get rid of the grid search until you get your basic model working! — maxymoo, Jul 28 '16 at 06:20
@maxymoo So my X is just timestamps and y is Adj. Close. I think X should be some independent variable other than time, but is Adj. Close a suitable candidate for y? The following is a sample of X and y: train_X = array([['2004-01-02T00:00:00.000000000'], ['2004-01-05T00:00:00.000000000'], ['2004-01-06T00:00:00.000000000'], ..., ['2015-04-23T00:00:00.000000000'], ['2015-04-24T00:00:00.000000000']], dtype='datetime64[ns]') and y_train = array([ 0.60200086, 0.62414149, ..., 0.89205663]) — coderLane, Jul 30 '16 at 17:56
OK, so SVR is really not the right kind of model for this kind of data. I have never worked with financial time series, but a quick Google search for "sklearn time series regression" gives results like http://stackoverflow.com/questions/20841167/how-to-predict-time-series-in-scikit-learn and https://www.quantstart.com/articles/Forecasting-Financial-Time-Series-Part-1, which might be a good place for you to start — maxymoo, Jul 31 '16 at 23:05
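A common alternative to using raw timestamps as X is to build features from lagged values of the series itself, so that each target is predicted from the few observations before it. The sketch below is only an illustration of that idea and is not taken from the linked pages; the `make_lagged` helper, the `n_lags` parameter, and the sample values are hypothetical.

```python
def make_lagged(series, n_lags):
    """Turn a 1-D series into (X, y) pairs for supervised regression.

    Each row of X holds the n_lags values preceding the target in y,
    so the model never sees the value it is asked to predict.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])   # the n_lags previous observations
        y.append(series[t])              # the value to predict
    return X, y

# Made-up normalized prices (illustration only)
series = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
X, y = make_lagged(series, n_lags=3)
print(X)
print(y)
```

The resulting X and y can be passed to a regressor's `fit` method in place of the timestamp column. Test rows then stay in the same numeric range as the training rows, which avoids asking the model to extrapolate along a time axis.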


0 Answers