Questions tagged [grid-search]

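In machine learning, grid search is an exhaustive search over a predefined set of candidate values for a model's hyperparameters: the model is trained and evaluated once per combination to find the best-performing one. Typical examples are mtry for random forests; alpha, beta, and lambda for GLMs; and C, kernel, and gamma for SVMs.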
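As a minimal sketch of the idea, assuming a toy objective in place of a real model: the `score` function below is a hypothetical stand-in for "train the model and return a validation score", and the grid values are illustrative only. Every combination is evaluated and the best one is kept.

```python
from itertools import product


def score(C, gamma):
    # Hypothetical stand-in for fitting a model and returning its validation
    # score. By construction it peaks at C=10, gamma=0.1.
    return -((C - 10) ** 2) - 100 * (gamma - 0.1) ** 2


# Illustrative grid: every combination of these values is tried.
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}


def grid_search(score_fn, grid):
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # product(...) enumerates the full Cartesian grid of candidate values.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score_fn(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


best_params, best_score = grid_search(score, param_grid)
print(best_params, best_score)
```

Libraries such as scikit-learn wrap the same loop with cross-validation and parallelism, but the cost still grows with the product of the grid sizes.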

865 questions
6 votes, 0 answers

Print from within Joblib Parallel function in Jupyter notebook

Is it possible to print things or debug when using Parallel in a Jupyter notebook? Here is my code: import pandas as pd from sklearn.model_selection import ParameterGrid from joblib import Parallel, delayed def my_func(a,b): print("hi") …
blissweb
6 votes, 0 answers

GridSearchCV - How to limit memory usage

I am performing grid search with GridSearchCV (scikit-learn) on Spark and Linux. To do so, I run nohup ./spark_python_shell.sh > output.log & in my bash shell to start the Spark cluster, and I also get my Python script running (see…
Outcast
6 votes, 1 answer

How to specify the positive label when using precision as scoring in GridSearchCV

model = sklearn.model_selection.GridSearchCV( estimator = est, param_grid = param_grid, scoring = 'precision', verbose = 1, n_jobs = 1, iid = True, cv = 3) In…
Hachiko
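One possible way to approach this, sketched under assumptions: the string `'precision'` scores the class labelled 1, so a custom scorer built with `make_scorer` can pass `pos_label` through to `precision_score`. The synthetic dataset, the `LogisticRegression` estimator, and the grid below are placeholders, not the asker's `est` or `param_grid`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import GridSearchCV

# Placeholder binary dataset with labels 0 and 1.
X, y = make_classification(n_samples=200, random_state=0)

# Treat class 0 as the positive label when computing precision.
precision_for_0 = make_scorer(precision_score, pos_label=0)

search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),  # placeholder estimator
    param_grid={"C": [0.1, 1.0, 10.0]},           # placeholder grid
    scoring=precision_for_0,
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The same pattern should work with string labels, for example `pos_label="spam"`, as long as the value appears in `y`.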
6 votes, 2 answers

How does GridSearchCV compute training scores?

I'm having a hard time figuring out the return_train_score parameter in GridSearchCV. From the docs: "return_train_score : boolean, optional. If False, the cv_results_ attribute will not include training scores." My question is: what are the…
Tonechas
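A small sketch may help here, assuming the iris dataset and an `SVC` grid chosen purely for illustration. For each candidate and each split, the estimator is fit on the training folds and then scored on those same training folds; with `return_train_score=True` those values appear in `cv_results_` next to the test scores.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative grid; return_train_score=True adds the *_train_score entries.
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3, return_train_score=True)
search.fit(X, y)

# Per-split and aggregated training scores, one value per candidate.
train_keys = [key for key in search.cv_results_ if "train_score" in key]
print(train_keys)
print(search.cv_results_["mean_train_score"])
```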
6 votes, 2 answers

GridSearchCV - access to predicted values across tests?

Is there a way to get access to the predicted values calculated within a GridSearchCV process? I'd like to be able to plot the predicted y values against their actual values (from the test/validation set). Once the grid search is complete, I can…
tmn103
6 votes, 1 answer

How many combinations will GridSearchCV run for this?

Using sklearn to run a grid search on a random forest classifier. This has been running for longer than I thought, and I am trying to estimate how much time is left for this process. I thought the total number of fits it would do would be 3*3*3*3*5…
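The arithmetic behind such an estimate can be sketched as follows. The grid and fold count here are hypothetical, since the question's actual values are cut off in the excerpt: the number of candidates is the product of the list lengths, and each candidate is fit once per cross-validation fold.

```python
from math import prod

# Hypothetical grid: the question's real grid is not shown in the excerpt.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2", None],
}
cv_folds = 5  # assumed number of cross-validation folds

# One candidate per combination of values, one fit per candidate per fold.
n_candidates = prod(len(values) for values in param_grid.values())
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)
```

With `refit=True`, GridSearchCV performs one additional fit of the best candidate on the full training data, so the total is `n_fits + 1`.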
6 votes, 2 answers

sample_weight parameter shape error in scikit-learn GridSearchCV

Passing the sample_weight parameter to GridSearchCV raises an error due to incorrect shape. My suspicion is that cross-validation is not able to split sample_weights along with the dataset. First part: Using sample_weight as…
6 votes, 0 answers

Nested GridSearchCV

For a given model type, I want to both 1) tune parameters for various model types and 2) find the best tuned model type. I would like to use GridSearchCV for this. I was able to run the following, but I am also concerned that this is not working…
mgoldwasser
6 votes, 2 answers

Model help using Scikit-learn when using GridSearch

As part of the Enron project, I built the attached model. Below is a summary of the steps. The model below gives near-perfect scores: cv = StratifiedShuffleSplit(n_splits = 100, test_size = 0.2, random_state = 42) gcv = GridSearchCV(pipe,…
6 votes, 1 answer

"Parallel" pipeline to get best model using gridsearch

In sklearn, a serial pipeline can be defined to get the best combination of hyperparameters for all consecutive parts of the pipeline. A serial pipeline can be implemented as follows: from sklearn.svm import SVC from sklearn import decomposition,…
Oblomov
6 votes, 1 answer

GridSearchCV does not give the same results as expected when compared to xgboost.cv

When comparing sklearn.GridSearchCV with xgboost.cv, I get different results. Below I explain what I would like to do: 1) import libraries import numpy as np from sklearn import datasets import xgboost as xgb from sklearn.model_selection import…
gabboshow
6 votes, 1 answer

How to properly merge outputs from models in the ensemble?

I am trying to figure out how to properly create regression ensembles. I know there are various options. I use the following approach. First I define models like Linear Regression, GBM, etc. Then I run GridSearchCV for each of these models to know…
Klausos Klausos
5 votes, 1 answer

Can you get all estimators from an sklearn grid search (GridSearchCV)?

I recently tested many hyperparameter combinations using sklearn.model_selection.GridSearchCV. I want to know if there is a way to call all previous estimators that were trained in the process. search = GridSearchCV(estimator=my_estimator,…
Arturo Sbr
5 votes, 1 answer

How to determine best parameters and best score for each scoring metric in GridSearchCV

I am trying to evaluate multiple scoring metrics to determine the best parameters for model performance. That is: to maximize F1, I should use these parameters; to maximize precision, I should use these parameters. I am working off the…
artemis
5 votes, 1 answer

LightGBM error : ValueError: For early stopping, at least one dataset and eval metric is required for evaluation

I am trying to train a LightGBM model with grid search, and I get the error below when I try to train the model: ValueError: For early stopping, at least one dataset and eval metric is required for evaluation. I have provided validation dataset and evaluation…
deep