Learning classification in sktime

    from sklearn.model_selection import train_test_split
    X = AUDCHF_h1_model[['Open','High','Low','Close','Volume','VWMA',
                         'Minute','Hour','Day','Week','Month','Year']].values
    y = AUDCHF_h1_model[['is_beg_leg']].values

    X_train,X_test,y_train,y_test = train_test_split(
        X, y, test_size=0.2)

    print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

    (53250, 12) (53250, 1) (13313, 12) (13313, 1)

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline

    from sktime.classification.compose import ColumnEnsembleClassifier
    from sktime.classification.dictionary_based import BOSSEnsemble
    from sktime.classification.interval_based import TimeSeriesForestClassifier
    #from sktime.classification.shapelet_based import MrSEQLClassifier
    from sktime.datasets import load_basic_motions
    from sktime.transformations.panel.compose import ColumnConcatenator

    steps = [
        ("concatenate", ColumnConcatenator()),
        ("classify", TimeSeriesForestClassifier(n_estimators=100)),
    ]
    clf = Pipeline(steps)
    clf.fit(X_train, y_train)
    clf.score(X_test, y_test)

When I fit the pipeline, I receive:

    ValueError: Mismatch in number of cases. Number in X = 639000 nos in y = 53250

but the training arrays have the same number of rows:

    X_train.shape (53250, 12) y_train.shape (53250, 1)

Does anyone know what causes this?

jack

1 Answer

Based on the information you provided I can't say for sure, but I suspect the problem is the ColumnConcatenator in your pipeline. It stacks all 12 columns of X end to end into a single univariate time series with 53250 * 12 = 639000 rows. That concatenated series is what gets passed to the TimeSeriesForestClassifier, so X no longer has the same number of cases as y, which still has 53250 entries.

Depending on your use case, you can either remove the "concatenate" step from the pipeline, or provide target values that match the newly created univariate time series.
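The shape arithmetic behind the error message can be reproduced with NumPy alone. This is only a rough sketch: the zero-filled arrays are placeholders with the same shapes as in the question, and the column-wise reshape is an assumption that mimics the stacking described here, not the actual ColumnConcatenator implementation.

```python
import numpy as np

# Placeholder arrays with the same shapes as X_train and y_train in the question
n_rows, n_cols = 53250, 12
X_train = np.zeros((n_rows, n_cols))
y_train = np.zeros((n_rows, 1))

# Assumption: stacking the 12 columns end to end (column-major order)
# yields one long univariate column
stacked = X_train.reshape(-1, 1, order="F")
print(stacked.shape[0], y_train.shape[0])  # 639000 53250

# y has one label per original row, so it can no longer line up with stacked.
# Separately, scikit-learn style estimators generally expect a 1-D target:
y_flat = y_train.ravel()
print(y_flat.shape)  # (53250,)
```

The two printed counts match the numbers in the ValueError, which supports the idea that the concatenation step, not the train/test split, changes the number of cases.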