How to know from which interval of the input the features used in sktime's TimeSeriesForestClassifier are calculated

Question

I used the sktime library's TimeSeriesForestClassifier class to perform multivariate time series classification.

The code is as follows:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator

X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

steps = [
    ("concatenate", ColumnConcatenator()),
    ("classify", TimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)

I would like to inspect feature_importances_. It is an array whose length matches the number of extracted features, not the length of the input series.

clf.steps[1][1].feature_importances_

I would like to know which part of the input each importance corresponds to. Is there any way to get information about which section of the input the TimeSeriesForestClassifier is calculating features from?

mloning · Accepted Answer · 2021-12-04T01:20:32.037

You can get the intervals (start and end index) for each tree of the ensemble from:

clf.steps[1][1].intervals_
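To turn those intervals into something comparable with the input, one option is to spread each feature's importance over the time points its interval covers. The following is only a minimal sketch with made-up numbers, not sktime code. It assumes that each tree has an interval array of shape (n_intervals, 2) with an exclusive end index, and that the tree's features are ordered interval by interval with three statistics each (mean, standard deviation, slope). The function name `importance_per_time_point` and the example values are hypothetical, so check the layout against your installed version before relying on it.

```python
import numpy as np


def importance_per_time_point(intervals, importances, series_length, n_stats=3):
    """Spread one tree's feature importances over the time points they cover.

    intervals: array of shape (n_intervals, 2) holding [start, end) indices.
    importances: array of length n_intervals * n_stats, assumed to be ordered
        interval by interval (all statistics of interval 0, then interval 1, ...).
    Returns an array of shape (series_length, n_stats).
    """
    intervals = np.asarray(intervals)
    importances = np.asarray(importances, dtype=float)
    importances = importances.reshape(len(intervals), n_stats)
    curve = np.zeros((series_length, n_stats))
    for (start, end), imp in zip(intervals, importances):
        # Every time point inside the interval receives the interval's importance.
        curve[start:end] += imp
    return curve


# Hypothetical values for a single tree with two intervals.
intervals = np.array([[0, 4], [2, 6]])
importances = np.array([0.1, 0.0, 0.2, 0.3, 0.4, 0.0])
curve = importance_per_time_point(intervals, importances, series_length=8)
```

For a fitted forest, the same function could be applied to each tree's intervals and importances and the resulting curves averaged, provided your version exposes the per-tree importances.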

sktime now also has an implementation of the newer Canonical Interval Forest (CIF).

When we first implemented the Time Series Forest algorithm, we ended up with two versions. The one that you're using is the recommended one, but the older version provides its own functionality for the feature importance graph (see below).

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

from sktime.classification.compose import ComposableTimeSeriesForestClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator

X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

steps = [
    ("concatenate", ColumnConcatenator()),
    ("classify", ComposableTimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)

clf.steps[-1][-1].feature_importances_.rename(columns={"_slope": "slope"}).plot(xlabel="time", ylabel="feature importance")
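Because ColumnConcatenator joins the dimensions of each multivariate series end to end, the time axis here runs over the concatenated series rather than the original one. A small helper can map an interval on that axis back to the original dimension and time step. This is a sketch under the assumption that every dimension has the same length (taken to be 100 for this dataset) and that dimensions are concatenated in column order. The name `split_interval` is hypothetical and not part of sktime.

```python
def split_interval(start, end, series_length):
    """Split [start, end) on the concatenated axis into per-dimension pieces.

    Returns a list of (dimension, start_offset, end_offset) tuples, where the
    offsets are positions within the original, unconcatenated dimension.
    """
    pieces = []
    while start < end:
        dimension, offset = divmod(start, series_length)
        # Stop at the interval end or at the end of this dimension, whichever is first.
        stop = min(end, (dimension + 1) * series_length)
        pieces.append((dimension, offset, offset + (stop - start)))
        start = stop
    return pieces


# Hypothetical interval that crosses the boundary between two dimensions.
pieces = split_interval(180, 230, series_length=100)
```

An interval that crosses a boundary mixes values from two different dimensions, which is worth keeping in mind when reading the importances.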

Be aware of some subtle issues in the calculation and interpretation of the feature importances. The relevant issues are here:

Thank you for your answer. I've tried versions 0.8.~0.4, but they all seem to give me errors. What version should I run? " ---> 15 ("classify", ComposableTimeSeriesForestClassifier(n_estimators=100))," "TypeError: Can't instantiate abstract class ComposableTimeSeriesForestClassifier with abstract methods _set_oob_score_and_attributes" — pie, Dec 04 '21 at 14:16
Strange, I ran this on the latest release (v0.8.1). Would appreciate if you'd raise a bug report on GitHub: https://github.com/alan-turing-institute/sktime/issues/new?assignees=&labels=bug&template=bug_report.md&title=%5BBUG%5D — mloning, Dec 04 '21 at 14:36
