Questions tagged [scikit-learn]

Scikit-learn is a machine-learning library for Python that provides simple and efficient tools for data analysis and data mining. It is accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib. The project is open source and commercially usable (BSD license).


Resources

Related Libraries

  • sklearn-pandas - bridge library between scikit-learn and pandas
  • scikit-image - scikit-learn-compatible API for image processing and computer vision for machine learning tasks
  • sklearn laboratory - scikit-learn wrapper that enables running larger scikit-learn experiments and feature sets
  • sklearn deap - scikit-learn wrapper that enables hyperparameter tuning using evolutionary algorithms instead of grid search
  • hyperopt-sklearn - hyperparameter optimization for sklearn
  • scikit-plot - visualization library for quickly generating common plots in machine learning studies
  • sklearn-porter - library for transpiling trained scikit-learn models into source code in other programming languages
  • sklearn_theano - scikit-learn-compatible objects (estimators, transformers, and datasets) that use Theano internally
  • sparkit-learn - scikit-learn API that uses Spark's distributed computing model
  • joblib - parallelization library used by scikit-learn
28024 questions
110 votes, 8 answers

Passing categorical data to Sklearn Decision Tree

There are several posts about how to encode categorical data for Sklearn decision trees, but the Sklearn documentation says this: "Some advantages of decision trees are: (...) Able to handle both numerical and categorical data. Other techniques…
0xhfff
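In practice, scikit-learn's DecisionTreeClassifier expects numeric arrays, so string categories are usually encoded before fitting. A minimal sketch, assuming a small made-up dataset and one-hot encoding through OneHotEncoder in a pipeline (ordinal encoding is another common choice):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two categorical features stored as strings.
X = np.array([["red", "small"],
              ["blue", "large"],
              ["red", "large"],
              ["green", "small"]], dtype=object)
y = np.array([0, 1, 1, 0])

# OneHotEncoder turns each category into its own 0/1 column before the tree sees it;
# handle_unknown="ignore" maps unseen categories to all-zero columns at predict time.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      DecisionTreeClassifier(random_state=0))
model.fit(X, y)

predictions = model.predict(np.array([["blue", "small"]], dtype=object))
print(predictions)
```

Because the encoder lives inside the pipeline, the same category-to-column mapping is reused at prediction time.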
108 votes, 3 answers

A progress bar for scikit-learn?

Is there any way to show a progress bar for the fit method in scikit-learn? Is it possible to include a custom one with something like Pyprind?
user5674731
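As a rule, fit() offers no general progress hook, although many estimators accept a verbose parameter that prints progress messages. For estimators that implement partial_fit, one workaround is to drive training manually and draw the bar yourself. A sketch, assuming SGDClassifier on synthetic data and a hand-rolled text bar in place of Pyprind:

```python
import sys

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = SGDClassifier(random_state=0)

n_epochs = 20
width = 30
classes = np.unique(y)  # partial_fit needs the full label set on the first call

for epoch in range(1, n_epochs + 1):
    clf.partial_fit(X, y, classes=classes)  # one pass over the data per call
    # Simple text progress bar; a library such as tqdm or Pyprind could replace this.
    done = int(width * epoch / n_epochs)
    sys.stdout.write(f"\r[{'#' * done}{'.' * (width - done)}] epoch {epoch}/{n_epochs}")
    sys.stdout.flush()
sys.stdout.write("\n")
```

This only works for estimators with partial_fit; for others, the verbose parameter (where available) is the built-in option.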
105 votes, 13 answers

sklearn.LabelEncoder with never seen before values

If a sklearn.LabelEncoder has been fitted on a training set, it might break if it encounters new values when used on a test set. The only solution I could come up with for this is to map everything new in the test set (i.e. not belonging to any…
cjauvin
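For feature columns, scikit-learn 0.24 and later cover this case with OrdinalEncoder, which can assign a fixed code to categories it never saw during fit. A minimal sketch with made-up city names, using -1 as the assumed sentinel:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

train = np.array([["paris"], ["tokyo"], ["paris"], ["amsterdam"]])
test = np.array([["tokyo"], ["berlin"]])  # "berlin" was never seen during fit

# Requires scikit-learn >= 0.24: unseen categories get the sentinel code -1
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(train)
codes = enc.transform(test)

print(enc.categories_)  # categories are sorted: amsterdam, paris, tokyo
print(codes)
```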
104 votes, 3 answers

RandomForestClassifier vs ExtraTreesClassifier in scikit learn

Can anyone explain the difference between RandomForestClassifier and ExtraTreesClassifier in scikit-learn? I've spent a good bit of time reading the paper: P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning,…
denson
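The two ensembles differ mainly in how trees are grown: RandomForestClassifier trains each tree on a bootstrap sample (bootstrap=True by default) and searches for the best threshold on each candidate feature, while ExtraTreesClassifier uses the whole training set by default (bootstrap=False) and draws thresholds at random, keeping the best of those random splits. A small side-by-side sketch on synthetic data (dataset and settings are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

models = {
    # Bootstrap samples + best threshold among a random subset of features
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    # Whole training set by default + random thresholds per candidate feature
    "extra_trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(name, "bootstrap =", model.bootstrap, "mean CV accuracy =", round(scores[name], 3))
```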
102 votes, 2 answers

Converting list to numpy array

I have managed to load images in a folder using the sklearn command load_sample_images(). I would now like to convert them to a numpy.ndarray with the float32 datatype. I was able to convert it to np.ndarray using np.array(X), however…
Priya Narayanan
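load_sample_images() returns its pictures as a Python list of uint8 arrays. When all images share one shape, they can be stacked and cast in a single step. A sketch that uses made-up arrays in place of the sample images, so it does not depend on the bundled image files:

```python
import numpy as np

# Stand-in for load_sample_images().images: a Python list of equally shaped
# uint8 image arrays (height, width, channels). Shapes here are made up.
images = [np.full((4, 6, 3), 10, dtype=np.uint8),
          np.full((4, 6, 3), 200, dtype=np.uint8)]

# Stack into one (n_images, height, width, channels) array and cast to float32.
X = np.stack(images).astype(np.float32)
print(X.shape, X.dtype)  # (2, 4, 6, 3) float32

# Optional: scale pixel values into [0, 1]; in-place division keeps float32.
X /= 255.0
```

If the images differ in shape, they need to be resized or cropped to a common size before stacking.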
100 votes, 5 answers

Recovering features names of explained_variance_ratio_ in PCA with sklearn

I'm trying to recover which features were selected as relevant by a PCA done with scikit-learn. A classic example with the IRIS dataset:

import pandas as pd
import pylab as pl
from sklearn import datasets
from sklearn.decomposition import PCA
# load…
sereizam
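PCA does not select a subset of the original features: each principal component is a weighted combination of all of them, and explained_variance_ratio_ holds one value per component, not per feature. The weights live in components_, whose columns line up with the input features, so feature names can be attached there. A sketch on the iris data bundled with scikit-learn:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
pca = PCA(n_components=2).fit(iris.data)

# components_ has shape (n_components, n_features): row i holds the weight
# (loading) of every original feature in principal component i.
for i, (component, ratio) in enumerate(zip(pca.components_, pca.explained_variance_ratio_)):
    order = np.argsort(np.abs(component))[::-1]  # features ordered by |loading|
    ranked = [(iris.feature_names[j], round(float(component[j]), 3)) for j in order]
    print(f"PC{i + 1} ({ratio:.1%} of variance):", ranked)
```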
98 votes, 6 answers

How to one-hot-encode from a pandas column containing a list?

I would like to break down a pandas column consisting of a list of elements into as many columns as there are unique elements, i.e. one-hot-encode them (with value 1 representing a given element existing in a row and 0 in the case of absence). For…
Melsauce
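scikit-learn's MultiLabelBinarizer performs this kind of encoding on an iterable of lists. A sketch with made-up values; with pandas, the column itself (for example df["col"], a hypothetical name) can be passed in place of rows:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each row holds a list of labels, like the values of a pandas "list" column.
rows = [["apple", "banana"], ["banana"], ["cherry", "apple"]]

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(rows)  # one 0/1 column per unique element
print(mlb.classes_)  # ['apple' 'banana' 'cherry']
print(encoded)
# [[1 1 0]
#  [0 1 0]
#  [1 0 1]]
```

To get a DataFrame back, the result can be wrapped as pd.DataFrame(encoded, columns=mlb.classes_, index=df.index).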
97 votes, 11 answers

ImportError: No module named model_selection

I am trying to use the train_test_split function and write:

from sklearn.model_selection import train_test_split

and this causes:

ImportError: No module named model_selection

Why? And how can I overcome it?
Dims
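The sklearn.model_selection module first appeared in scikit-learn 0.18, so this error usually points to an older installation, and upgrading (for example pip install -U scikit-learn) is the usual fix. The sketch below prints the installed version and, as a fallback for very old setups, imports from the pre-0.18 sklearn.cross_validation module, which was removed in 0.20:

```python
import sklearn

print("scikit-learn version:", sklearn.__version__)

try:
    # Available from scikit-learn 0.18 onwards
    from sklearn.model_selection import train_test_split
except ImportError:
    # Fallback for pre-0.18 installations only; this module was removed in 0.20
    from sklearn.cross_validation import train_test_split

X_train, X_test = train_test_split(list(range(10)), test_size=0.3, random_state=0)
print(len(X_train), len(X_test))  # 7 3
```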
96 votes, 17 answers

fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer

I am totally new to machine learning and I have been working with unsupervised learning techniques. The image shows my sample data (after all cleaning). I have built these two pipelines to clean the data: num_attribs =…
Viral Parmar
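LabelBinarizer is designed for target labels: its fit_transform takes only y, while a Pipeline calls fit_transform(X, y) on each step, which matches the error in the title. For a categorical feature column, OneHotEncoder accepts (X, y) and can stand in for it. A minimal sketch with a made-up column:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical feature, shaped (n_samples, 1) as a Pipeline step expects
X_cat = np.array([["b"], ["a"], ["b"], ["c"]], dtype=object)

# OneHotEncoder.fit_transform accepts (X, y=None), so it works as a Pipeline step,
# unlike LabelBinarizer.fit_transform(y), which only takes the labels.
cat_pipeline = Pipeline([("one_hot", OneHotEncoder(handle_unknown="ignore"))])
encoded = cat_pipeline.fit_transform(X_cat)  # SciPy sparse matrix by default

print(cat_pipeline.named_steps["one_hot"].categories_)
print(encoded.toarray())
```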
96 votes, 19 answers

How to write a confusion matrix

I wrote code to calculate a confusion matrix in Python:

def conf_mat(prob_arr, input_arr):
    # confusion matrix
    conf_arr = [[0, 0], [0, 0]]

    for i in range(len(prob_arr)):
        if int(input_arr[i]) == 1:
            if float(prob_arr[i])…
Arja Varvio
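scikit-learn ships sklearn.metrics.confusion_matrix, so the counting loop can be replaced once the probabilities are thresholded into class predictions. A sketch with made-up labels and probabilities and an assumed 0.5 threshold:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predicted probabilities for the positive class
y_true = np.array([1, 0, 1, 1, 0])
prob = np.array([0.9, 0.2, 0.4, 0.8, 0.6])

# Turn probabilities into hard predictions with an assumed 0.5 threshold
y_pred = (prob >= 0.5).astype(int)

# Rows are true classes, columns are predicted classes, both in the order of `labels`
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
# [[1 1]
#  [1 2]]
```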
93 votes, 5 answers

LabelEncoder: TypeError: '>' not supported between instances of 'float' and 'str'

I'm facing this error for multiple variables even after treating missing values. For example:

le = preprocessing.LabelEncoder()
categorical = list(df.select_dtypes(include=['object']).columns.values)
for cat in categorical:
    print(cat)
    …
pceccon
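This error typically shows up when an object column mixes strings with float NaN values (missing entries), so sorting the categories ends up comparing floats with strings. One common workaround, sketched here with a made-up column, is to cast the column to str before encoding:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# An object column that mixes strings with a float NaN (a missing value),
# the kind of mix that can raise "'>' not supported between ... 'float' and 'str'".
column = np.array(["b", np.nan, "a", "b"], dtype=object)

# Workaround: cast everything to str so NaN becomes the category "nan".
# (Filling missing values with an explicit placeholder first is another option.)
le = LabelEncoder()
codes = le.fit_transform(column.astype(str))
print(le.classes_)  # ['a' 'b' 'nan']
print(codes)        # [1 2 0 1]
```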
90 votes, 4 answers

ValueError: Unknown label type: 'unknown'

I am trying to run the following code:

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression

# data import and preparation
trainData = pd.read_csv('train.csv')
train = trainData.values
testData =…
Ivan Zhovannik
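One frequent cause is that .values on a DataFrame with mixed column types returns an object-dtype array, and scikit-learn classifies an object array of non-string labels as target type "unknown". A sketch with made-up numbers showing the check (via sklearn.utils.multiclass.type_of_target) and the cast that fixes it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.multiclass import type_of_target

# Hypothetical data: an object-dtype array like the one .values returns for a
# DataFrame with mixed column types; the label column holds ints but stays object.
train = np.array([[0.5, 1.2, 0], [1.5, 0.3, 1], [0.7, 1.1, 0], [1.9, 0.2, 1]], dtype=object)
X = train[:, :2].astype(float)
y_raw = train[:, 2]

print(type_of_target(y_raw))  # prints: unknown (what the classifier rejects)
y = y_raw.astype(int)         # cast labels to a proper numeric dtype
print(type_of_target(y))      # prints: binary

clf = LogisticRegression().fit(X, y)
```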
90 votes, 3 answers

TypeError: cannot perform reduce with flexible type

I have been using the scikit-learn library. I'm trying to use the Gaussian Naive Bayes module, but I'm running into the following error: TypeError: cannot perform reduce with flexible type. Below is the code snippet.…
Arihant Jain
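This error typically means the feature array holds strings (for example numbers read in as text), so NumPy cannot run numeric reductions such as the per-class means GaussianNB computes. A sketch, assuming made-up string data, that converts the features to float before fitting:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical features that were read in as text, giving a string-dtype array.
X_text = np.array([["1.0", "2.1"], ["1.2", "1.9"], ["3.8", "4.0"], ["4.1", "3.7"]])
y = np.array([0, 0, 1, 1])

X = X_text.astype(float)  # convert strings to float64 before fitting
clf = GaussianNB().fit(X, y)
print(clf.predict(np.array([[1.1, 2.0], [4.0, 3.9]])))  # [0 1]
```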
89 votes, 8 answers

RandomForestClassifier.fit(): ValueError: could not convert string to float

Given is a simple CSV file:

A,B,C
Hello,Hi,0
Hola,Bueno,1

Obviously the real dataset is far more complex than this, but this one reproduces the error. I'm attempting to build a random forest classifier for it, like so: cols =…
nilkn
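Random forests in scikit-learn need numeric features, so the string columns have to be encoded first. A sketch using the CSV from the question and OrdinalEncoder (one-hot encoding avoids the arbitrary order that integer codes imply):

```python
import csv
import io

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder

# The small CSV from the question, read with the standard library
raw = "A,B,C\nHello,Hi,0\nHola,Bueno,1\n"
rows = list(csv.DictReader(io.StringIO(raw)))

X_text = np.array([[r["A"], r["B"]] for r in rows], dtype=object)
y = np.array([int(r["C"]) for r in rows])

# Map each string category to an integer code (categories are sorted per column)
encoder = OrdinalEncoder()
X = encoder.fit_transform(X_text)

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(encoder.categories_)
print(clf.predict(encoder.transform(X_text)))
```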
88 votes, 5 answers

Scikit-learn train_test_split with indices

How do I get the original indices of the data when using train_test_split()? What I have is the following:

from sklearn.cross_validation import train_test_split
import numpy as np
data = np.reshape(np.random.randn(20), (10, 2))  # 10 training examples
labels =…
CentAu
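train_test_split splits every array passed to it with the same shuffling, so an array of row positions can be split alongside the data. A sketch with random data, written against the current sklearn.model_selection location of the function:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
data = rng.standard_normal((10, 2))  # 10 training examples, 2 features
labels = rng.integers(0, 2, size=10)
indices = np.arange(len(data))       # original row positions

# Every array is split with the same permutation, so idx_train/idx_test
# record which original rows ended up in each part.
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    data, labels, indices, test_size=0.3, random_state=42)

print(idx_train, idx_test)
```

If the data is a pandas DataFrame, the returned pieces keep their original index labels, which serves the same purpose.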