
I'm trying to generate meta-features, so, following tutorials, I wrote the following:

from sklearn import tree

clf = tree.DecisionTreeClassifier()
clf.fit(X, y)

But it raises a ValueError:

  File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 739, in fit
    X_idx_sorted=X_idx_sorted)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 146, in fit
    check_classification_targets(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'unknown'
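The last frame shows where the error originates: check_classification_targets asks sklearn.utils.multiclass.type_of_target what kind of target y is, and raises when the answer is 'unknown'. Below is a minimal sketch, using made-up label values, of one input that produces 'unknown': integer-valued labels stored in an array with dtype=object.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up labels: the same values stored with two different dtypes.
labels_int = np.array([2, 1, 0, 1, 0])                # integer dtype
labels_obj = np.array([2, 1, 0, 1, 0], dtype=object)  # object dtype

print(type_of_target(labels_int))              # multiclass
print(type_of_target(labels_obj))              # unknown
print(type_of_target(labels_obj.astype(int)))  # multiclass
```

A non-string object array is reported as 'unknown', which check_classification_targets then rejects with the ValueError shown in the traceback. Casting the labels to a numeric dtype makes them 'multiclass' again.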

Why is this error raised?

The dataset consists of floats and integers; the class labels are integers. describe() returns this:

         x1       x2       x3       x4      x5       x6       x7       x8  
count   3500.00  3500.00  3500.00  3500.00  3500.0  3500.00  3500.00  3500.00   
unique   501.00   516.00   572.00   650.00   724.0   779.00   828.00   757.00   
top        0.12     0.79     0.82     0.83     1.9     1.68     1.67     2.03   
freq      23.00    25.00    22.00    18.00    16.0    15.00    13.00    14.00   

         x9      x10  ...        x32      x33      x34      x35      x36  
count   3500.00  3500.00  ...    3500.00  3500.00  3500.00  3500.00  3500.00   
unique   730.00   676.00  ...     496.00   504.00   503.00   505.00   486.00   
top        3.27     3.47  ...       0.01     0.58    -0.27    -0.02     0.26   
freq      15.00    16.00  ...      23.00    24.00    26.00    23.00    24.00   

        x37      x38      x39      x40  class  
count   3500.00  3500.00  3500.00  3500.00   3500  
unique   488.00   490.00   492.00   506.00      3  
top       -0.03    -0.07     0.05    -0.19      1  
freq      23.00    25.00    22.00    24.00   1185  
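One detail in this output is worth noting. For object (non-numeric) columns, pandas' DataFrame.describe() reports count, unique, top and freq. Numeric columns instead get count, mean, std, min, the quartiles and max. The table above therefore suggests that every column, class included, is stored with dtype object rather than as floats or integers, and df.dtypes would confirm this. Here is a small sketch with made-up values:

```python
import pandas as pd

# Made-up frame with numeric dtypes (float64 and int64).
numeric = pd.DataFrame({"x1": [0.12, 0.79, 0.82, -0.27], "class": [2, 1, 0, 1]})
# The same values, but stored as Python objects (dtype=object).
as_obj = numeric.astype(object)

print(list(numeric.describe().index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(list(as_obj.describe().index))
# ['count', 'unique', 'top', 'freq']

# infer_objects() re-infers numeric dtypes for object columns that hold numbers.
restored = as_obj.infer_objects()
print(restored.dtypes.tolist())
```

The right conversion depends on how the frame was built, which the question does not show. infer_objects() is just one option. pd.to_numeric applied per column, or astype() with explicit dtypes, are stricter alternatives.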

The dataset looks like this (only the last columns are shown):

       x33   x34   x35   x36   x37   x38   x39   x40 class  
0     -0.7  0.51  0.34 -0.13 -0.87  0.56 -0.53  0.29     2  
1     1.12   0.6  0.28  2.17  0.18 -0.09 -1.33     1     1  
2     -0.3 -0.07 -0.99 -0.75  1.11  1.35 -1.63   0.1     0  
3    -0.29 -1.62  0.19 -1.04  0.43 -1.82 -1.14 -0.23     1  
4    -0.78 -0.12 -0.35  0.44  0.31 -0.45 -0.23  0.27     0  
5     0.28  0.61  -0.4 -1.96  1.26 -0.72  2.01  0.95     2  
6     0.07  1.91 -0.15 -0.27   1.9  1.14 -0.05  0.04     0  
7     1.52 -1.52 -0.16 -0.41 -0.48 -0.37   0.8   1.3     2  
8    -0.52 -1.41 -3.49  1.74 -0.37 -0.25 -0.63   0.2     2  
9     0.78  0.09  -0.7  1.12 -0.32 -0.43 -0.34 -1.04     2  
10    0.25  0.29 -0.73 -0.02  2.14  1.49  0.02 -2.16     2  
11   -1.72 -0.09  0.43 -0.33 -1.66 -0.73  1.45  2.11     2  
12   -0.01 -2.63 -1.91  0.59   0.8  0.35  1.58 -0.98     2  

Its shape is [3500 rows x 41 columns].

evaleria

1 Answer


There are two probable problems and solutions:
1. Does your data have the appropriate dimensions? Check X.shape (an attribute, not a method) to ensure your data is in the appropriate format; you can also check this question.
2. Try converting your data to float with np.asarray(..., dtype=np.float64); you can also check this question.
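The two suggestions above can be combined into one sketch. It assumes, hypothetically, that the question's data lives in a pandas DataFrame whose columns (including class) all have dtype object. The column values below are made up. int64 is used for the labels; float64, as the answer suggests, would also be accepted here because the labels are whole numbers.

```python
import numpy as np
import pandas as pd
from sklearn import tree

# Hypothetical miniature of the question's frame: feature columns plus a
# "class" column, all assumed to be stored with dtype=object.
df = pd.DataFrame(
    {
        "x1": [0.12, 0.79, -0.27, 2.03, 1.9, -0.19],
        "x2": [1.68, 0.82, 0.26, -0.03, 0.05, 3.27],
        "class": [2, 1, 0, 1, 0, 2],
    }
).astype(object)

# Separate the features from the label so that X does not contain "class".
X_raw = df.drop(columns="class")
y_raw = df["class"]

# Suggestion 1: check dimensions. X should be 2-D, y 1-D, with matching rows.
print(X_raw.shape, y_raw.shape)  # (6, 2) (6,)

# Suggestion 2: convert to numeric dtypes before fitting.
X = np.asarray(X_raw, dtype=np.float64)
y = np.asarray(y_raw, dtype=np.int64)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
print(clf.classes_)  # [0 1 2]
```

In this sketch, fitting on the unconverted labels (np.asarray(y_raw), which still has dtype object) is what reproduces the "Unknown label type" error.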

Masoud