TypeError: 'TimeSeriesSplit' object is not iterable

Question

I am carrying out a grid search for an SVR design that uses a time series split. My code is:

from sklearn.svm import SVR
from sklearn.grid_search import GridSearchCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn import svm
from sklearn.preprocessing import MinMaxScaler
from sklearn import preprocessing as pre

X_feature = X_feature.reshape(-1, 1)
y_label = y_label.reshape(-1,1)

param = [{'kernel': ['rbf'], 'gamma': [1e-2, 1e-3, 1e-4, 1e-5],
                       'C': [1, 10, 100, 1000]},
                       {'kernel': ['poly'], 'C': [1, 10, 100, 1000], 'degree': [1, 2, 3, 4]}] 


reg = SVR(C=1)
timeseries_split = TimeSeriesSplit(n_splits=3)
clf = GridSearchCV(reg, param, cv=timeseries_split, scoring='neg_mean_squared_error')


X= pre.MinMaxScaler(feature_range=(0,1)).fit(X_feature)

scaled_X = X.transform(X_feature)


y = pre.MinMaxScaler(feature_range=(0,1)).fit(y_label)

scaled_y = y.transform(y_label)



clf.fit(scaled_X,scaled_y )

My data for scaled y is:

[[0.11321139]
 [0.07218848]
 ...
 [0.64844211]
 [0.4926122 ]
 [0.4030334 ]]

And my data for scaled X is:

[[0.2681013 ]
 [0.03454225]
 [0.02062136]
 ...
 [0.92857565]
 [0.64930691]
 [0.20325924]]

However, I am getting the error message

TypeError: 'TimeSeriesSplit' object is not iterable

The full traceback is:

  ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-4403e696bf0d> in <module>()
     19 
     20 
---> 21 clf.fit(scaled_X,scaled_y )

~/anaconda3_501/lib/python3.6/site-packages/sklearn/grid_search.py in fit(self, X, y)
    836 
    837         """
--> 838         return self._fit(X, y, ParameterGrid(self.param_grid))
    839 
    840 

~/anaconda3_501/lib/python3.6/site-packages/sklearn/grid_search.py in _fit(self, X, y, parameter_iterable)
    572                                     self.fit_params, return_parameters=True,
    573                                     error_score=self.error_score)
--> 574                 for parameters in parameter_iterable
    575                 for train, test in cv)
    576 

~/anaconda3_501/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    777             # was dispatched. In particular this covers the edge
    778             # case of Parallel used with an exhausted iterator.
--> 779             while self.dispatch_one_batch(iterator):
    780                 self._iterating = True
    781             else:

~/anaconda3_501/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    618 
    619         with self._lock:
--> 620             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    621             if len(tasks) == 0:
    622                 # No more tasks available in the iterator: tell caller to stop.

~/anaconda3_501/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __init__(self, iterator_slice)
    125 
    126     def __init__(self, iterator_slice):
--> 127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 

~/anaconda3_501/lib/python3.6/site-packages/sklearn/grid_search.py in <genexpr>(.0)
    573                                     error_score=self.error_score)
    574                 for parameters in parameter_iterable
--> 575                 for train, test in cv)
    576 
    577         # Out is a list of triplet: score, estimator, n_test_samples

TypeError: 'TimeSeriesSplit' object is not iterable

I'm not sure why this happens. I suspect it occurs when I call fit in the last line. Any help would be appreciated.

@desertnaut I have added the changes and "pre" is the pre processing that I am doing in the data. — Asif.Khan, Jul 04 '18 at 11:50
@desertnaut Sorry, they were meant to just be my X and y. I have updated this. – Asif.Khan, Jul 04 '18 at 11:53
Can you make sure that you are using the updated version of sklearn ? — Gambit1614, Jul 04 '18 at 12:30
@MohammedKashif Yes I believe I am. For the gridsearch I am using "from sklearn.model_selection import GridSearchCV" as suggested below. — Asif.Khan, Jul 04 '18 at 12:35
If you used `from sklearn.model_selection import GridSearchCV`, the stack trace is not consistent with that. Either the stack trace you have shown is old, or you are not using `model_selection`. – Vivek Kumar, Jul 04 '18 at 12:37

score 1 · Accepted Answer · answered Jul 04 '18 at 11:57

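First, you are mixing incompatible modules. sklearn.grid_search is the old, now deprecated module, and it does not work with the splitter classes from sklearn.model_selection such as TimeSeriesSplit. As the traceback shows, the old GridSearchCV iterates over cv directly (for train, test in cv), but a TimeSeriesSplit object is not itself iterable; it produces the folds through its split() method.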

In place of:

from sklearn.grid_search import GridSearchCV

Do this:

from sklearn.model_selection import GridSearchCV
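
To illustrate why the old module fails, here is a minimal sketch on a toy array (the 12-sample X is an assumption made purely for illustration). It shows that a TimeSeriesSplit object cannot be iterated directly, while its split() method yields the (train, test) index pairs:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data (assumption): 12 ordered samples with a single feature.
X = np.arange(12).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=3)

# Iterating over the splitter object itself raises the TypeError from the question.
try:
    iter(tscv)
    is_iterable = True
except TypeError:
    is_iterable = False

# The folds come from split(), which yields (train_indices, test_indices) pairs.
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

print(is_iterable)
for train, test in folds:
    print(train, test)
```

The old sklearn.grid_search code expects cv to be directly iterable in this way, which is why it breaks, whereas sklearn.model_selection.GridSearchCV calls split() on the object it receives.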

Second, you only need to pass TimeSeriesSplit(n_splits=3) to the cv parameter, like this:

timeseries_split = TimeSeriesSplit(n_splits=3)
clf = GridSearchCV(reg, param, cv=timeseries_split, scoring='neg_mean_squared_error')

There is no need to call split() yourself; GridSearchCV calls it internally.
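
Putting both changes together, a self-contained sketch could look like the following. The synthetic data and the reduced parameter grid are assumptions chosen so the example runs quickly; they stand in for the asker's real X_feature, y_label and full grid. The target is also flattened with ravel(), since SVR expects a 1-D y.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 60 ordered samples, one feature.
rng = np.random.RandomState(0)
X_feature = np.arange(60, dtype=float).reshape(-1, 1)
y_label = np.sin(X_feature.ravel() / 5.0) + 0.1 * rng.randn(60)

# Scale features and target to [0, 1]; scale y as a column, then flatten it.
scaled_X = MinMaxScaler(feature_range=(0, 1)).fit_transform(X_feature)
scaled_y = MinMaxScaler(feature_range=(0, 1)).fit_transform(
    y_label.reshape(-1, 1)).ravel()

# Reduced grid (assumption) so the sketch finishes in a moment.
param = [
    {'kernel': ['rbf'], 'gamma': [1e-2, 1e-3], 'C': [1, 10]},
    {'kernel': ['poly'], 'C': [1, 10], 'degree': [1, 2]},
]

# Pass the splitter object itself; GridSearchCV calls split() internally.
timeseries_split = TimeSeriesSplit(n_splits=3)
clf = GridSearchCV(SVR(), param, cv=timeseries_split,
                   scoring='neg_mean_squared_error')
clf.fit(scaled_X, scaled_y)

print(clf.best_params_)
print(clf.best_score_)
```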

Vivek Kumar


Thank you for your reply, I have added the changes but I am getting the error "TypeError: 'TimeSeriesSplit' object is not iterable". Any idea on why this could be? – Asif.Khan Jul 04 '18 at 12:06
@MohammedKashif Hi, yes I have updated the traceback error message. – Asif.Khan Jul 04 '18 at 12:15
@Asif.Khan I told you to replace the import which you did not do as seen in the new stack trace. Do that too – Vivek Kumar Jul 04 '18 at 12:34
@VivekKumar yes, I have done this now, and it is compiling. Fingers crossed this works. I will get back to you with this as it is taking forever. – Asif.Khan Jul 04 '18 at 12:38
@Asif.Khan You can add `verbose=5` in `GridSearchCV(...)` to check the progress of process. – Vivek Kumar Jul 04 '18 at 12:42
@VivekKumar I guess I shouldn't add this line whilst the code is running? I don't want to rerun the code, as it has been running for a while and still has not finished – Asif.Khan Jul 04 '18 at 13:06
@Asif.Khan Yes. Do it next time. Don't stop now. But why is the code taking so long to run? How large is your data? – Vivek Kumar Jul 04 '18 at 13:12
@VivekKumar I have 17,000 data points in my data set. Is there any way I can get this to run faster? Thank you, by the way! – Asif.Khan Jul 04 '18 at 14:04
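
As a rough way to gauge how much work the search does, the number of model fits can be counted before starting. This sketch uses the parameter grid from the question and only constructs the search with `verbose=5` as suggested in the comments; it does not fit anything, and it is an illustration rather than advice given in the thread:

```python
from sklearn.model_selection import GridSearchCV, ParameterGrid, TimeSeriesSplit
from sklearn.svm import SVR

# Parameter grid copied from the question.
param = [{'kernel': ['rbf'], 'gamma': [1e-2, 1e-3, 1e-4, 1e-5],
          'C': [1, 10, 100, 1000]},
         {'kernel': ['poly'], 'C': [1, 10, 100, 1000], 'degree': [1, 2, 3, 4]}]

n_splits = 3
n_candidates = len(ParameterGrid(param))  # 16 rbf + 16 poly combinations
n_fits = n_candidates * n_splits          # one fit per candidate per fold

print(n_candidates, n_fits)  # prints: 32 96

# verbose=5 makes GridSearchCV report progress while fit() runs.
clf = GridSearchCV(SVR(), param, cv=TimeSeriesSplit(n_splits=n_splits),
                   scoring='neg_mean_squared_error', verbose=5)
```

With 32 candidates and 3 folds, the search performs 96 fits plus a final refit on the full data, so trimming the grid reduces the run time roughly in proportion.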

score 0 · Answer 2 · answered Jul 04 '18 at 12:00

The length of a generator cannot be determined: a generator does not hold all of its items, only its current state. In your grid_search.py file, line 579 tries to take the length of a generator. You need to convert it to a list to find the length, so you can do:

n_folds = list(n_folds)

before you do:

n_folds = len(cv)

If you want to keep it as a generator, refer to:

How to len(generator())
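
The general point about generators can be sketched as follows. make_folds is a hypothetical helper written only for this illustration; it is not part of scikit-learn and is not the code in grid_search.py:

```python
def make_folds(n_samples, n_splits):
    """Hypothetical generator yielding expanding-window (train, test) index lists."""
    test_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train = list(range(i * test_size))
        test = list(range(i * test_size, (i + 1) * test_size))
        yield train, test

folds = make_folds(12, 3)

# A generator has no length, so len() raises TypeError.
try:
    len(folds)
    has_len = True
except TypeError:
    has_len = False

# Materialising the generator into a list makes the length available.
fold_list = list(folds)
n_folds = len(fold_list)

print(has_len, n_folds)
```

Note that converting to a list consumes the generator, so the list is what must be reused afterwards.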

dilkash

562
3
15

Thank you for your reply, but would it be possible to get some example code showing how to do this? – Asif.Khan Jul 04 '18 at 12:07
