I am trying to generate meta-features, so I followed a tutorial and wrote the following:
from sklearn import tree
clf = tree.DecisionTreeClassifier()
clf.fit(X, y)
But it raises a ValueError:
File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 739, in fit
X_idx_sorted=X_idx_sorted)
File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 146, in fit
check_classification_targets(y)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'unknown'
Why is this error raised?
The dataset consists of floats and integers; the class labels are integers. describe() returns this:
x1 x2 x3 x4 x5 x6 x7 x8
count 3500.00 3500.00 3500.00 3500.00 3500.0 3500.00 3500.00 3500.00
unique 501.00 516.00 572.00 650.00 724.0 779.00 828.00 757.00
top 0.12 0.79 0.82 0.83 1.9 1.68 1.67 2.03
freq 23.00 25.00 22.00 18.00 16.0 15.00 13.00 14.00
x9 x10 ... x32 x33 x34 x35 x36
count 3500.00 3500.00 ... 3500.00 3500.00 3500.00 3500.00 3500.00
unique 730.00 676.00 ... 496.00 504.00 503.00 505.00 486.00
top 3.27 3.47 ... 0.01 0.58 -0.27 -0.02 0.26
freq 15.00 16.00 ... 23.00 24.00 26.00 23.00 24.00
x37 x38 x39 x40 class
count 3500.00 3500.00 3500.00 3500.00 3500
unique 488.00 490.00 492.00 506.00 3
top -0.03 -0.07 0.05 -0.19 1
freq 23.00 25.00 22.00 24.00 1185
The dataset looks like this:
x33 x34 x35 x36 x37 x38 x39 x40 class
0 -0.7 0.51 0.34 -0.13 -0.87 0.56 -0.53 0.29 2
1 1.12 0.6 0.28 2.17 0.18 -0.09 -1.33 1 1
2 -0.3 -0.07 -0.99 -0.75 1.11 1.35 -1.63 0.1 0
3 -0.29 -1.62 0.19 -1.04 0.43 -1.82 -1.14 -0.23 1
4 -0.78 -0.12 -0.35 0.44 0.31 -0.45 -0.23 0.27 0
5 0.28 0.61 -0.4 -1.96 1.26 -0.72 2.01 0.95 2
6 0.07 1.91 -0.15 -0.27 1.9 1.14 -0.05 0.04 0
7 1.52 -1.52 -0.16 -0.41 -0.48 -0.37 0.8 1.3 2
8 -0.52 -1.41 -3.49 1.74 -0.37 -0.25 -0.63 0.2 2
9 0.78 0.09 -0.7 1.12 -0.32 -0.43 -0.34 -1.04 2
10 0.25 0.29 -0.73 -0.02 2.14 1.49 0.02 -2.16 2
11 -1.72 -0.09 0.43 -0.33 -1.66 -0.73 1.45 2.11 2
12 -0.01 -2.63 -1.91 0.59 0.8 0.35 1.58 -0.98 2
Its shape is [3500 rows x 41 columns].
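One thing that may be worth checking, judging only from the describe() output above: pandas reports count/unique/top/freq for object-dtype columns, while numeric columns get mean/std/min/max instead. So the columns, including class, may be stored as generic Python objects rather than as numbers. The sketch below is a minimal illustration under that assumption. The small DataFrame and the names df, X and y are hypothetical stand-ins, not the real data.

```python
import pandas as pd

# Hypothetical miniature of the dataset: numbers stored as Python objects,
# which is what an object-dtype column holds.
df = pd.DataFrame(
    {"x40": [0.29, 1, 0.1, -0.23], "class": [2, 1, 0, 1]},
    dtype=object,
)

# Object columns are summarised with count/unique/top/freq,
# the same statistics that appear in the describe() output above.
print(df.dtypes)
print(df.describe())

# Convert the features to floats and the labels to integers.
X = df.drop(columns="class").astype(float)
y = df["class"].astype(int)

print(X.dtypes)
print(y.dtype)
```

If df.dtypes on the real data shows object for the class column, converting it with astype(int) before calling fit would be the first thing to try.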