Questions tagged [scikit-learn]

scikit-learn is a machine-learning library for Python that provides simple and efficient tools for data analysis and data mining. It is accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib. The project is open source and commercially usable (BSD license).

Resources

Related Libraries

sklearn-pandas - bridge library between scikit-learn and pandas
scikit-image - scikit-learn-compatible API for image processing and computer vision for machine learning tasks
sklearn laboratory - scikit-learn wrapper that enables running larger scikit-learn experiments and feature sets
sklearn deap - scikit-learn wrapper that enables hyperparameter tuning using evolutionary algorithms instead of grid search
hyperopt-sklearn - hyperparameter optimization for scikit-learn
scikit-plot - visualization library for quickly generating common plots in machine learning studies
sklearn-porter - library for turning trained scikit-learn models into compiled C, Java, or JavaScript code
sklearn_theano - scikit-learn-compatible objects (estimators, transformers, and datasets) using Theano internally
sparkit-learn - scikit-learn API that uses PySpark's distributed computing model
joblib - scikit-learn parallelization library

28024 questions

votes

5 answers

Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer

I want to get feature names after I fit the pipeline. categorical_features = ['brand', 'category_name', 'sub_category'] categorical_transformer = Pipeline(steps=[ ('imputer', SimpleImputer(strategy='constant', fill_value='missing')), …

python scikit-learn pipeline

asked Feb 12 '19 at 09:27

ResidentSleeper

votes

2 answers

What is the difference between pipeline and make_pipeline in scikit-learn?

I got this from the sklearn webpage: Pipeline: Pipeline of transforms with a final estimator Make_pipeline: Construct a Pipeline from the given estimators. This is a shorthand for the Pipeline constructor. But I still do not understand when I…

python machine-learning scikit-learn pipeline

asked Nov 20 '16 at 18:56

Aizzaac
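
As a rough illustration of the distinction quoted in the question, the sketch below builds the same two-step pipeline both ways. The step names "scale" and "clf" are arbitrary choices for this example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline: every step is named explicitly by the caller.
explicit = Pipeline(steps=[("scale", StandardScaler()), ("clf", LogisticRegression())])

# make_pipeline: step names are generated from the lowercased class names.
auto = make_pipeline(StandardScaler(), LogisticRegression())

explicit_names = [name for name, _ in explicit.steps]
auto_names = [name for name, _ in auto.steps]
print(explicit_names)
print(auto_names)
```

The two objects behave the same once built; the practical difference is whether you control the step names, which matters when addressing parameters as stepname__param in a grid search.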

votes

4 answers

classifiers in scikit-learn that handle nan/null

I was wondering if there are classifiers that handle nan/null values in scikit-learn. I thought random forest regressor handles this but I got an error when I call predict. X_train = np.array([[1, np.nan, 3],[np.nan, 5, 6]]) y_train = np.array([1,…

python pandas machine-learning scikit-learn nan

asked May 19 '15 at 05:02

anthonybell

votes

3 answers

Different result with roc_auc_score() and auc()

I have trouble understanding the difference (if there is one) between roc_auc_score() and auc() in scikit-learn. I'm trying to predict a binary output with imbalanced classes (around 1.5% for Y=1). Classifier model_logit =…

python machine-learning scikit-learn

asked Jul 01 '15 at 10:48

gowithefloww
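
A small sketch of how the two functions relate, using made-up labels and scores rather than the asker's data. It assumes both are fed the same continuous scores; results can diverge if one call receives hard class predictions and the other receives probabilities.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy labels and continuous scores (hypothetical values).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# auc() is a general trapezoidal integrator: it needs the curve points first.
fpr, tpr, _ = roc_curve(y_true, y_score)
auc_from_curve = auc(fpr, tpr)

# roc_auc_score() computes the ROC curve internally and integrates it.
auc_direct = roc_auc_score(y_true, y_score)

print(auc_from_curve, auc_direct)
```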

votes

12 answers

Impute categorical missing values in scikit-learn

I've got pandas data with some columns of text type. There are some NaN values along with these text columns. What I'm trying to do is to impute those NaN's by sklearn.preprocessing.Imputer (replacing NaN by the most frequent value). The problem is…

python pandas scikit-learn imputation

asked Aug 11 '14 at 09:26

night_bat
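
A hedged sketch of most-frequent imputation on a text column. It uses SimpleImputer from sklearn.impute, which replaced sklearn.preprocessing.Imputer in later scikit-learn releases, so this is an assumption about the reader's version rather than the API the question was written against. The data is invented.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical single text column with one missing entry.
X = np.array([["red"], [np.nan], ["red"], ["blue"]], dtype=object)

# strategy="most_frequent" works on string data as well as numeric data.
imputer = SimpleImputer(strategy="most_frequent")
filled = imputer.fit_transform(X)

print(filled[:, 0].tolist())
```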

votes

7 answers

SKLearn warning "valid feature names" in version 1.0

I'm getting the following warning after upgrading to version 1.0 of scikit-learn: UserWarning: X does not have valid feature names, but IsolationForest was fitted with feature names. I cannot find in the docs what a "valid feature name" is. How…

python-3.x pandas scikit-learn

asked Sep 25 '21 at 13:33

Jaume Figueras

votes

9 answers

The easiest way for getting feature names after running SelectKBest in Scikit Learn

I'm trying to conduct a supervised machine-learning experiment using the SelectKBest feature of scikit-learn, but I'm not sure how to create a new dataframe after finding the best features: Let's assume I would like to conduct the experiment…

python pandas scikit-learn feature-extraction feature-selection

asked Oct 03 '16 at 19:35

Aviade

votes

5 answers

Use scikit-learn to classify into multiple categories

I'm trying to use one of scikit-learn's supervised learning methods to classify pieces of text into one or more categories. The predict function of all the algorithms I tried just returns one match. For example I have a piece of text: "Theaters in…

python classification scikit-learn

asked May 10 '12 at 01:59

CodeMonkeyB

votes

6 answers

Scikit Learn SVC decision_function and predict

I'm trying to understand the relationship between decision_function and predict, which are instance methods of SVC (http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html). So far I've gathered that decision function returns pairwise…

python numpy svm scikit-learn

asked Nov 21 '13 at 05:29

Peter Tseng

votes

6 answers

Can sklearn random forest directly handle categorical features?

Say I have a categorical feature, color, which takes the values ['red', 'blue', 'green', 'orange'], and I want to use it to predict something in a random forest. If I one-hot encode it (i.e. I change it to four dummy variables), how do I tell…

python scikit-learn random-forest one-hot-encoding

asked Jul 12 '14 at 16:54

tkunk

votes

11 answers

Principal Component Analysis (PCA) in Python

I have a (26424 x 144) array and I want to perform PCA over it using Python. However, there is no particular place on the web that explains how to achieve this task (there are some sites that just do PCA in their own way - there is no…

python scikit-learn pca

asked Nov 05 '12 at 00:10

khan

votes

3 answers

difference between StratifiedKFold and StratifiedShuffleSplit in sklearn

As the title says, I am wondering what the difference is between StratifiedKFold with the parameter shuffle=True StratifiedKFold(n_splits=10, shuffle=True, random_state=0) and StratifiedShuffleSplit StratifiedShuffleSplit(n_splits=10,…

python machine-learning scikit-learn data-science cross-validation

asked Aug 30 '17 at 20:43

gabboshow
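
A rough sketch of one observable difference between the two splitters, on invented data with fewer splits than in the question. It assumes the point of interest is how the test sets relate to each other: StratifiedKFold partitions the data, while StratifiedShuffleSplit draws each test set independently.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit

# Hypothetical balanced data: 20 samples, two classes.
X = np.zeros((20, 1))
y = np.array([0] * 10 + [1] * 10)

# K-fold: the 5 test folds are disjoint and together cover every sample once.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
kfold_test = np.concatenate([test for _, test in skf.split(X, y)])

# Shuffle split: each of the 5 test sets is a fresh random draw,
# so a sample may appear in several test sets or in none.
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
shuffle_test = np.concatenate([test for _, test in sss.split(X, y)])

kfold_unique = len(np.unique(kfold_test))
shuffle_unique = len(np.unique(shuffle_test))
print(kfold_unique, shuffle_unique)
```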

votes

5 answers

sklearn Logistic Regression "ValueError: Found array with dim 3. Estimator expected <= 2."

I am attempting to solve problem 6 in this notebook. The question is to train a simple model on this data using 50, 100, 1000 and 5000 training samples by using the LogisticRegression model from sklearn.linear_model. lr =…

python scikit-learn logistic-regression

asked Jan 24 '16 at 04:13

edwin

votes

6 answers

Mixing categorical and continuous data in Naive Bayes classifier using scikit-learn

I'm using scikit-learn in Python to develop a classification algorithm to predict the gender of certain customers. Amongst others, I want to use the Naive Bayes classifier but my problem is that I have a mix of categorical data (ex: "Registered…

python machine-learning data-mining classification scikit-learn

asked Jan 10 '13 at 09:08

user1499144

votes

3 answers

Feature/Variable importance after a PCA analysis

I have performed a PCA analysis over my original dataset and from the compressed dataset transformed by the PCA I have also selected the number of PCs I want to keep (they explain almost 94% of the variance). Now I am struggling with the…

python machine-learning scikit-learn pca feature-selection

asked Jun 11 '18 at 10:49

fbm
