I am new to machine learning in Python and currently have a lot of questions about both the algorithms themselves and the code.
I have developed a Python script that takes in time series data for a stock (timestamps and the Adj. Close). I pre-process it by taking the log of the Adj. Close and normalizing it with min-max scaling, then split the timestamps and Adj. Close into train and test sets (a 90/10 split in time order, so the test set is the most recent 10%). The data covers 891 trading days.
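To make the preprocessing concrete, here is a minimal sketch of the steps described above. It uses synthetic prices and a plain day index in place of my real data and timestamps, so the names `adj_close` and `timestamps` and the random-walk generator are illustrative assumptions only.

```python
import numpy as np

# Synthetic stand-in for the real data (assumption): 891 trading days
# of strictly positive prices generated from a random walk.
rng = np.random.default_rng(0)
n_days = 891
adj_close = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, n_days)))
timestamps = np.arange(n_days, dtype=float)  # day index as a placeholder

# Step 1: log of the Adj. Close.
log_close = np.log(adj_close)

# Step 2: min-max scaling to [0, 1], using the min and max of the
# whole series, which is the order of operations described above.
lo, hi = log_close.min(), log_close.max()
scaled = (log_close - lo) / (hi - lo)

# Step 3: 90/10 split in time order, so the last 10% of days are the test set.
split = int(n_days * 0.9)
train_X = timestamps[:split].reshape(-1, 1)  # scikit-learn expects 2-D X
test_X = timestamps[split:].reshape(-1, 1)
train_y, test_y = scaled[:split], scaled[split:]

print(train_X.shape, test_X.shape)
```

With 891 days this gives 801 training rows and 90 test rows.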
I then run a grid search over an SVR to fit the training timestamps against the Adj. Close:
svr = SVR(kernel='rbf', tol=0.001, C=1.0, shrinking=True, cache_size=200, verbose=False, max_iter=-1)
The parameters to search over are gamma and epsilon:
grs = GridSearchCV(svr, param_grid , n_jobs=1, iid=False, cv=StratifiedKFold(train_y, shuffle=False), verbose=0)
fit_results = grs.fit(train_X, train_y)
par_found = fit_results.best_params_
svr = SVR(kernel='rbf', gamma = par_found['gamma'], tol=0.001, C=1.0, epsilon = par_found['epsilon'], shrinking=True, cache_size=200, verbose=False, max_iter=-1)
final_fit = svr.fit(train_X, train_y)
pred = svr.predict(test_X)
However, when I print pred, I get an array whose elements are all the same number; its length equals the length of test_X.
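For anyone who wants to try this without my data, here is a self-contained sketch of the same pipeline on synthetic prices. It assumes a recent scikit-learn, where `GridSearchCV` lives in `sklearn.model_selection` and the `iid` argument no longer exists. I also swapped `StratifiedKFold` for `TimeSeriesSplit`, because stratification is meant for class labels rather than a continuous target. The `param_grid` values are hypothetical, not the ones from my script. The last lines measure how much the predictions actually vary.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# Synthetic stand-in for the real series (assumption): 891 days.
rng = np.random.default_rng(0)
n_days = 891
log_close = np.log(100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, n_days))))
scaled = (log_close - log_close.min()) / (log_close.max() - log_close.min())

# Day index as the only feature, mirroring the use of timestamps as X.
X = np.arange(n_days, dtype=float).reshape(-1, 1)
split = int(n_days * 0.9)
train_X, test_X = X[:split], X[split:]
train_y = scaled[:split]

# Hypothetical search grid; the real script's grid is not shown here.
param_grid = {"gamma": [1e-4, 1e-2, 1.0], "epsilon": [0.01, 0.1]}

grs = GridSearchCV(
    SVR(kernel="rbf", C=1.0),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),  # folds respect time order
)
grs.fit(train_X, train_y)

# With the default refit=True, predict() uses the best estimator
# refitted on the full training set, so no second SVR is needed.
pred = grs.predict(test_X)

print("best params:", grs.best_params_)
print("distinct predictions:", np.unique(np.round(pred, 6)).size)
print("prediction range:", np.ptp(pred))
```

A prediction range close to zero would correspond to the symptom I am describing, where every element of pred is the same number.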
I suspect the regression is overfitting the data, but I am not sure. Also, which features are best suited for use as the X data? I do not think raw timestamps are the best choice.
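One alternative to timestamps that I am considering is to use the previous few values of the series as features. This is only an assumption on my part; I have not validated it. The helper `make_lagged` below is a hypothetical name of my own, not a library function. It turns a 1-D series into rows of lagged values and the value that follows them.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Return (X, y): each row of X holds the n_lags values preceding y."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags
    # Column i holds the series shifted by i, so row t is
    # [series[t], ..., series[t + n_lags - 1]].
    X = np.column_stack([series[i:i + n_rows] for i in range(n_lags)])
    y = series[n_lags:]  # the value immediately after each window
    return X, y

# Tiny worked example with two lags.
X_demo, y_demo = make_lagged([1, 2, 3, 4, 5], n_lags=2)
print(X_demo)
print(y_demo)
```

Here the rows of `X_demo` are [1, 2], [2, 3], [3, 4] and `y_demo` is [3, 4, 5]; the same function could be applied to the scaled log prices or to their differences.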
Also, is there any good literature on the concepts and math behind support vector regression applied to financial time series?
Is there a better regression technique than SVR for time series?
Thanks. Your help will be greatly appreciated.