I am trying to use auto-sklearn on some pandas data, and when I run:

model.fit(X_train, y_train)

this error pops up:

ValueError                                Traceback (most recent call last)
<ipython-input-10-ed5cd6b32087> in <module>
      2 #         sklearn.model_selection.train_test_split(X, y, random_state=1)
      3 
----> 4 model.fit(X_train, y_train)
~/notebook/jupyterenv/lib/python3.6/site-packages/autosklearn/estimators.py in fit(self, X, y, X_test, y_test, feat_type, dataset_name)
    660                              "".format(
    661                                     target_type,
--> 662                                     supported_types
    663                                 )
    664                              )
ValueError: Classification with data of type continuous is not supported. Supported types are ['binary', 'multiclass', 'multilabel-indicator']. You can find more information about scikit-learn data types in: https://scikit-learn.org/stable/modules/multiclass.html

My (X, y) data looks something like this (the headers HOMO/LUMO etc. are descriptors):

HOMO (A)  HOMO (AH)  LUMO (A)  LUMO (AH)  charge (AH)  Charge metal (A)  \
0     -7.8453    -9.6920   -4.2406    -6.9161            1            -0.938   
1     -7.7330    -9.6774   -4.0690    -6.9602            1            -0.911   
2     -7.6751    -9.6051   -3.9238    -6.8990            1            -0.950   
3     -8.1345    -9.8027   -6.3221    -7.5155            1            -0.868   
4     -7.9405    -9.4709   -5.7324    -6.9515            1            -0.880   
..        ...        ...       ...        ...          ...               ...   
164   -7.5867    -9.7576   -5.1992    -6.8152            1            -0.312   
165   -8.3700   -10.1670   -6.6819    -7.8044            1            -0.311   
166   -8.3445   -10.0288   -6.6499    -7.5991            1            -0.321   
167   -7.9764   -10.0586   -6.3554    -7.5688            1            -0.277   
168   -7.9317    -9.9008   -6.3104    -7.3790            1            -0.288   

    
[169 rows x 17 columns] [24.4 23.8 24.  14.2 22.5 18.5 19.4 17.4 22.6 16.3 20.3 13.2 16.5 21.2
 24.6 17.3 23.3 22.2 18.  31.1 29.7 30.4 22.  23.2 22.1 27.6 22.9 19.8
 18.3 18.5 44.8 39.4 46.  49.9 35.  22.5 32.  22.8 38.1 23.6 23.3 18.4
 15.6 11.3 13.3 13.9 16.1 20.8 23.  20.4  8.3 11.3 11.4 15.1 15.4 17.1
 18.7 21.1 26.6 23.  20.4 21.6 26.8  9.  11.4 32.7 -1.6 -0.3 -1.3 -0.4
 -3.9  1.   5.6  0.5  0.   4.5  6.8  7.8  4.2  1.1  4.2  5.5  0.8 12.
 17.   5.8 17.  26.1 27.2 31.9  0.5  1.5  8.5  7.1 25.5 40.  -5.7 -6.
 12.5  4.4 -5.  -1.3 -5.  -5.  -5.  -5.  -0.6 -0.6  2.   3.6  3.2  0.1
  2.1  4.5 11.   2.7  3.5 -2.   1.2  9.3  2.6  7.1  6.1  3.2  5.1  7.5
  1.8  4.3  4.4  0.8  9.9  7.6  7.9  8.9 10.  10.9 11.8  9.9 13.4 13.4
  8.8  2.1  6.   7.1 -1.1  0.5  0.3  4.7  6.   6.5  8.  11.6  6.9  8.4
  8.7  7.2  6.3  6.4  7.4 12.1 10.4 11.1 12.2 14.3 16.3  8.1  8.5  8.6
  9. ]
desertnaut
  • @Adept How do I tell if my variables are continuous? Sorry for the newbie question, I am just starting to learn this field. – pzxpzxpzx11 Mar 17 '22 at 11:46
  • I kindly suggest you spend some time with a ML tutorial (there are literally hundreds available online). Classification is defined only for *discrete* variables - for continuous variables we use *regression* models. – desertnaut Mar 17 '22 at 12:55

1 Answer

As the error explains, you're passing a continuous target to a model that only handles binary, multiclass, or multilabel targets. Which model are you using? Check its documentation to see how it works and what it does or doesn't handle.

-- FOLLOWING THE COMMENT

A continuous variable is one that can take an infinite (or very large) number of possible values. Your data is a good example: the values are floats (the descriptors have 4 decimals, and the targets look like 24.4 or -1.6), so they are clearly continuous. By contrast, a binary variable takes only two values, such as '1' or '0', and a categorical variable takes a finite number of categories (like 'January', 'February', ..., 'December', so only 12 possible categories). Many kinds of models handle continuous targets (regression models), while classifiers only accept categorical ones, so if you have no constraint on your choice of model, you can switch to a regressor.
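
To check what kind of target you have, a minimal sketch along these lines may help. It uses scikit-learn's `type_of_target`, which returns the same labels that appear in the error message. The `y_sample` values are copied from the question; `y_binary`, `y_multiclass`, and `X_toy` are made-up illustrations, and `LinearRegression` is only a stand-in for whichever regressor you end up choosing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.utils.multiclass import type_of_target

# A few target values copied from the y array in the question.
y_sample = np.array([24.4, 23.8, 24.0, 14.2, 22.5, -1.6, -0.3])

# Hypothetical class labels, shown only for comparison.
y_binary = np.array([0, 1, 1, 0])
y_multiclass = np.array(["low", "mid", "high", "mid"])

# type_of_target reports how scikit-learn interprets each target array.
target_kinds = {
    "y_sample": type_of_target(y_sample),
    "y_binary": type_of_target(y_binary),
    "y_multiclass": type_of_target(y_multiclass),
}
print(target_kinds)

# A continuous target calls for a regressor rather than a classifier.
# X_toy is a made-up single feature, not the descriptors from the question.
X_toy = np.arange(len(y_sample), dtype=float).reshape(-1, 1)
regressor = LinearRegression().fit(X_toy, y_sample)
predictions = regressor.predict(X_toy)
print(predictions.shape)
```

If you want to stay with auto-sklearn, it also provides a regressor class (`autosklearn.regression.AutoSklearnRegressor`) alongside the classifier; it is not run here, so check its documentation for the exact usage.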

Adept