So, for some reason, it took my laptop 16 minutes to fit the data with DecisionTreeClassifier. It usually takes about 1 second to fit other types of machine learning models. Can anyone help me figure out what is happening here? I am not sure what information I should provide, so feel free to ask!

My guess is that it has to do with the encoder's transform syntax, which I have not been able to fix despite many online searches. A warning says that my approach will lead to poor performance, but the warning comes from the library itself, so I do not know how to change the code inside it.
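(The actual code appears only in the screenshot, so for context, here is a minimal sketch of the kind of encode-then-fit pipeline described above, assuming pandas and scikit-learn; the DataFrame and column names are made up for illustration.)

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in for the real dataset, which only appears in the screenshot.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue"],
    "size":  ["S", "M", "L", "M"],
    "label": [0, 1, 0, 1],
})

# One-hot encode the categorical features: each distinct value becomes its own
# column, which is how a modest feature set can balloon past 100 columns.
X = pd.get_dummies(df[["color", "size"]])
y = df["label"]

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X, y)
```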


1 Answer


Your dataset probably has far more columns after encoding, which leads to poor performance and a long training time. You can check the number of columns in the dataset after encoding to be sure.
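For example, a minimal sketch of such a check, assuming the raw features are in a pandas DataFrame (the frame here is a made-up stand-in):

```python
import pandas as pd

# Hypothetical categorical features; substitute your own DataFrame.
X = pd.DataFrame({"color": ["red", "blue", "green"], "size": ["S", "M", "L"]})

print("columns before encoding:", X.shape[1])          # 2
X_encoded = pd.get_dummies(X)                          # one column per category value
print("columns after encoding:", X_encoded.shape[1])   # 6
```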

  • @Hamid I have 120+ columns after encoding. But other models, which also have 100+ columns, still take only 1-2 seconds. So far this only happens with the decision tree model. – FastBoi May 17 '22 at 06:47
  • Is the other model a decision tree or something else? And how is the performance of the trained model; is the accuracy good enough? It is also possible that your model is overfitting, and I think limiting max_depth could reduce training time and also help the model generalize better. – Hamid Askarov May 17 '22 at 07:12
  • The other models are linear and logistic regression. I thought setting max_depth comes after fitting the model. Is plot_tree what you are talking about? – FastBoi May 17 '22 at 08:14
  • You can set the max_depth of a decision tree when you create the model object; to be more precise, something like `model = DecisionTreeClassifier(random_state=42, max_depth=3)` (a runnable sketch appears after this comment thread). You can find the other hyperparameters you can set [here](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html). Try a limited max_depth and let me know if it helps. – Hamid Askarov May 17 '22 at 08:22
  • I set max_depth=3, and the time went down from 16 minutes to 11 minutes. It is not much, but it is something. – FastBoi May 17 '22 at 09:31
  • What about accuracy? How is it when you set max_depth to 3? Accuracy is more important than training time; you probably train only once, and training generally takes time. You could also try changing the encoding method, but make sure the model's performance does not decrease. – Hamid Askarov May 17 '22 at 10:00
  • Well, initially the training accuracy was about 99% and the validation accuracy about 79%. Now validation has improved to 83%, while training has dropped to around 85%. I think it is better overall because accuracy improved on unseen data. – FastBoi May 17 '22 at 10:46
  • Still, I think the training time matters, because I know someone who got 1-2 seconds using this method while mine takes a very long time. – FastBoi May 17 '22 at 10:52
  • Alright, I found the solution to reduce the compute time. I was using an online notebook called Binder, and apparently it does not have enough computing power (or it was using my local machine as the engine). After switching to Google Colab, the fit takes 3 seconds. Quite a difference :D – FastBoi May 17 '22 at 11:11
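For reference, here is a minimal, self-contained sketch of the max_depth suggestion from the comments, with a simple timing check; the data is synthetic, sized roughly to the 120-column encoded dataset described above:

```python
import time
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded dataset: 10,000 rows and 120
# one-hot-style columns, roughly matching the shape described above.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(10_000, 120))
y = rng.integers(0, 2, size=10_000)

# Cap the tree depth at construction time, as suggested in the comments;
# a shallower tree fits faster and is less prone to overfitting.
clf = DecisionTreeClassifier(random_state=42, max_depth=3)

start = time.perf_counter()
clf.fit(X, y)
print(f"fit took {time.perf_counter() - start:.2f} s")
```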