I'm new to machine learning and I'm trying to predict the topic of an article from a labeled dataset in which each sample contains all the words of one article. There are 11 different topics in total, and each article has exactly one topic. I have built a processing pipeline:
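from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from xgboost import XGBClassifier

# Bag-of-words counts -> TF-IDF weights -> one-vs-rest XGBoost
classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(XGBClassifier(objective="multi:softmax", num_class=11), n_jobs=-1)),
])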
I'm trying to use GridSearchCV to find the best hyperparameters:
from sklearn.model_selection import GridSearchCV

parameters = {'vectorizer__ngram_range': [(1, 1), (1, 2), (2, 2)],
              'tfidf__use_idf': (True, False)}
gs_clf_svm = GridSearchCV(classifier, parameters, n_jobs=-1, cv=10, scoring='f1_micro')
gs_clf_svm = gs_clf_svm.fit(X, Y)
This works fine. However, how do I tune the hyperparameters of XGBClassifier? I have tried the following notation:
parameters = {'clf__learning_rate': [0.1, 0.01, 0.001]}
It doesn't work because GridSearchCV looks for these hyperparameters on OneVsRestClassifier rather than on the XGBClassifier it wraps. How do I actually tune the hyperparameters of XGBClassifier? Also, which hyperparameters would you suggest are worth tuning for my problem?
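For reference, here is a minimal sketch of how the accepted parameter names can be inspected. It relies on scikit-learn's `<step>__<param>` naming and on OneVsRestClassifier exposing its wrapped model as `estimator`. As an assumption made only so the snippet runs without xgboost, LogisticRegression stands in for XGBClassifier, and `demo` and `nested` are illustrative names that are not part of my actual code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Assumption: LogisticRegression is only a stand-in for XGBClassifier here.
demo = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LogisticRegression())),
])

# get_params() lists every name that set_params() and GridSearchCV accept.
# Parameters of the wrapped model appear under the 'clf__estimator__' prefix.
nested = sorted(name for name in demo.get_params()
                if name.startswith('clf__estimator__'))
print(nested)

# A nested name can be set directly, which is what a grid search does internally.
demo.set_params(clf__estimator__C=0.1)
```

If the same convention holds for my pipeline, the key would presumably be `clf__estimator__learning_rate` rather than `clf__learning_rate`, but I have not confirmed that this is the recommended way to do it.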