Questions tagged [scikit-learn]

scikit-learn is an open-source machine-learning library for Python that provides simple and efficient tools for data analysis and data mining. It is built on NumPy, SciPy, and matplotlib, and is designed to be accessible to everybody and reusable in various contexts. The project is commercially usable (BSD license).

Resources

Related Libraries

sklearn-pandas - bridge library between scikit-learn and pandas
scikit-image - scikit-learn-compatible API for image processing and computer vision for machine learning tasks
sklearn laboratory - scikit-learn wrapper that enables running larger scikit-learn experiments and feature sets
sklearn deap - scikit-learn wrapper that enables hyperparameter tuning using evolutionary algorithms instead of grid search in scikit-learn
hyperopt-sklearn - hyperparameter optimization for scikit-learn
scikit-plot - visualization library for quickly generating common plots in machine learning studies
sklearn-porter - library for turning trained scikit-learn models into compiled C, Java, or JavaScript code
sklearn_theano - scikit-learn-compatible objects (estimators, transformers, and datasets) using Theano internally
sparkit-learn - scikit-learn API that uses PySpark's distributed computing model
joblib - parallelization library used by scikit-learn

28024 questions

110

votes

8 answers

Passing categorical data to Sklearn Decision Tree

There are several posts about how to encode categorical data for sklearn decision trees, but the sklearn documentation says: "Some advantages of decision trees are: (...) Able to handle both numerical and categorical data." Other techniques…

python scikit-learn decision-tree

asked Jun 29 '16 at 19:47

0xhfff

1,215
2
9
6

108

votes

3 answers

A progress bar for scikit-learn?

Is there any way to add a progress bar to the fit method in scikit-learn? Is it possible to include a custom one with something like PyPrind?

scikit-learn

asked Dec 13 '15 at 14:07

user5674731

105

votes

13 answers

sklearn.LabelEncoder with never seen before values

If a sklearn.LabelEncoder has been fitted on a training set, it might break if it encounters new values when used on a test set. The only solution I could come up with for this is to map everything new in the test set (i.e. not belonging to any…

python scikit-learn

asked Jan 11 '14 at 01:54

cjauvin

3,433
4
29
38
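
A minimal sketch of one way to handle unseen labels, under stated assumptions: the labels and the `UNKNOWN` sentinel value are hypothetical, and this is not necessarily the approach the asker had in mind. It builds a plain dictionary from the fitted encoder's `classes_` attribute and maps anything not seen during fitting to the sentinel instead of calling `transform`, which would raise an error on new values.

```python
from sklearn.preprocessing import LabelEncoder

# Fit on the training labels (hypothetical example values).
encoder = LabelEncoder().fit(["cat", "dog", "fish"])

# classes_ holds the sorted labels; a label's position is its integer code.
lookup = {label: code for code, label in enumerate(encoder.classes_)}

UNKNOWN = -1  # hypothetical sentinel for labels never seen during fit

# "bird" was not in the training set, so it maps to the sentinel.
test_values = ["dog", "bird", "cat"]
codes = [lookup.get(value, UNKNOWN) for value in test_values]
```

Whether a sentinel code is acceptable depends on the downstream model; some estimators will treat -1 as just another ordinal value.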

104

votes

3 answers

RandomForestClassifier vs ExtraTreesClassifier in scikit learn

Can anyone explain the difference between the RandomForestClassifier and ExtraTreesClassifier in scikit-learn? I've spent a good bit of time reading the paper: P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning,…

scikit-learn random-forest

asked Mar 14 '14 at 15:50

denson

2,366
2
24
25

102

votes

2 answers

Converting list to numpy array

I have managed to load images in a folder using sklearn's load_sample_images(). I would now like to convert them to numpy.ndarray format with float32 datatype. I was able to convert them to np.ndarray using np.array(X), however…

python arrays numpy scikit-learn

asked Nov 10 '14 at 18:22

Priya Narayanan

1,197
2
10
16
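
A hedged sketch of the conversion described in this question. It assumes the loaded images all share the same shape, and it uses small synthetic uint8 arrays as a stand-in for the list returned by the loader, so nothing here depends on the actual sample images.

```python
import numpy as np

# Stand-in for a list of equally shaped uint8 images (assumption:
# every image in the list has the same height, width, and channels).
images = [
    np.zeros((4, 6, 3), dtype=np.uint8),
    np.full((4, 6, 3), 255, dtype=np.uint8),
]

# Stack the list into one array and cast to float32 in a single step.
X = np.asarray(images, dtype=np.float32)
```

If the images differ in shape, they cannot be stacked into one regular array this way and would need resizing or padding first.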

100

votes

5 answers

Recovering features names of explained_variance_ratio_ in PCA with sklearn

I'm trying to recover, from a PCA done with scikit-learn, which features are selected as relevant. A classic example with the Iris dataset: import pandas as pd import pylab as pl from sklearn import datasets from sklearn.decomposition import PCA # load…

python machine-learning scikit-learn pca

asked Apr 10 '14 at 09:43

sereizam

2,048
3
20
29

votes

6 answers

How to one-hot-encode from a pandas column containing a list?

I would like to break down a pandas column consisting of a list of elements into as many columns as there are unique elements i.e. one-hot-encode them (with value 1 representing a given element existing in a row and 0 in the case of absence). For…

python pandas numpy scikit-learn sklearn-pandas

asked Jul 25 '17 at 19:53

Melsauce

2,535
2
19
39
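
One possible approach to the encoding described in this question, sketched with scikit-learn's `MultiLabelBinarizer`. The column name `items` and the sample data are hypothetical, and this is only one of several ways to get the same result.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical frame with one column holding a list per row.
df = pd.DataFrame({"items": [["a", "b"], ["b"], ["a", "c"]]})

# Each unique element becomes a column: 1 if present in the row, else 0.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(df["items"]),
    columns=mlb.classes_,
    index=df.index,
)
```

Keeping `index=df.index` lets the encoded columns be joined back onto the original frame without misaligning rows.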

votes

11 answers

ImportError: No module named model_selection

I am trying to use the train_test_split function and write: from sklearn.model_selection import train_test_split, and this causes ImportError: No module named model_selection. Why, and how can I fix it?

python scikit-learn

asked Nov 20 '16 at 13:21

Dims

47,675
117
331
600
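
A hedged sketch of one common cause of this error: `sklearn.model_selection` first appeared in scikit-learn 0.18, and the older `sklearn.cross_validation` module was removed in 0.20. Assuming the error comes from an older installation, upgrading scikit-learn is the usual fix. The fallback import below is only an illustration for code that has to run on both old and new releases, and the sample data is hypothetical.

```python
# Assumption: the ImportError is caused by a scikit-learn release older than 0.18.
try:
    from sklearn.model_selection import train_test_split  # scikit-learn >= 0.18
except ImportError:
    from sklearn.cross_validation import train_test_split  # scikit-learn < 0.18

# Hypothetical toy data: four samples with two features each.
X = [[0, 1], [2, 3], [4, 5], [6, 7]]
y = [0, 1, 0, 1]

# Hold out 25% of the samples (one of four) for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
```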

votes

17 answers

fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer

I am totally new to machine learning and have been working with an unsupervised learning technique. The image shows my sample data (after all cleaning). Screenshot: Sample Data. I have these two pipelines built to clean the data: num_attribs =…

python scikit-learn data-science

asked Sep 11 '17 at 19:12

Viral Parmar

1,155
2
8
8

votes

19 answers

How to write a confusion matrix

I wrote a confusion matrix calculation code in Python: def conf_mat(prob_arr, input_arr): # confusion matrix conf_arr = [[0, 0], [0, 0]] for i in range(len(prob_arr)): if int(input_arr[i]) == 1: if float(prob_arr[i])…

python scikit-learn confusion-matrix

asked Jan 27 '10 at 16:27

Arja Varvio
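
A minimal pure-Python sketch of a binary confusion matrix in the spirit of this question. The asker's code is truncated above, so the 0.5 threshold, the function name, and the row/column layout (rows are actual classes, columns are predicted classes) are all assumptions rather than a reconstruction of the original.

```python
def confusion_matrix_binary(predicted_probs, true_labels, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1.

    Rows index the actual class, columns the predicted class.
    The threshold is an assumed cut-off for turning a probability
    into a predicted label.
    """
    matrix = [[0, 0], [0, 0]]
    for prob, label in zip(predicted_probs, true_labels):
        actual = int(label)
        predicted = 1 if float(prob) >= threshold else 0
        matrix[actual][predicted] += 1
    return matrix


# Hypothetical scores and ground-truth labels.
probs = [0.9, 0.2, 0.7, 0.4, 0.8]
labels = [1, 0, 0, 1, 1]
result = confusion_matrix_binary(probs, labels)
```

For these inputs the result is `[[1, 1], [1, 2]]`: one true negative, one false positive, one false negative, and two true positives.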

votes

5 answers

LabelEncoder: TypeError: '>' not supported between instances of 'float' and 'str'

I'm facing this error for multiple variables even after treating missing values. For example: le = preprocessing.LabelEncoder() categorical = list(df.select_dtypes(include=['object']).columns.values) for cat in categorical: print(cat) …

python pandas scikit-learn

asked Sep 25 '17 at 13:42

pceccon

9,379
26
82
158

votes

4 answers

ValueError: Unknown label type: 'unknown'

I am trying to run the following code. import pandas as pd import numpy as np from sklearn.linear_model import LogisticRegression # data import and preparation trainData = pd.read_csv('train.csv') train = trainData.values testData =…

python pandas numpy scikit-learn logistic-regression

asked Jul 27 '17 at 09:23

Ivan Zhovannik

1,073
1
8
8

votes

3 answers

TypeError: cannot perform reduce with flexible type

I have been using the scikit-learn library. I'm trying to use the Gaussian Naive Bayes Module under the scikit-learn library but I'm running into the following error. TypeError: cannot perform reduce with flexible type Below is the code snippet.…

python python-2.7 scikit-learn

asked Feb 08 '15 at 10:52

Arihant Jain

votes

8 answers

RandomForestClassfier.fit(): ValueError: could not convert string to float

Given is a simple CSV file: A,B,C Hello,Hi,0 Hola,Bueno,1 Obviously the real dataset is far more complex than this, but this one reproduces the error. I'm attempting to build a random forest classifier for it, like so: cols =…

python scikit-learn random-forest

asked May 21 '15 at 21:51

nilkn

votes

5 answers

Scikit-learn train_test_split with indices

How do I get the original indices of the data when using train_test_split()? What I have is the following from sklearn.cross_validation import train_test_split import numpy as np data = np.reshape(np.randn(20),(10,2)) # 10 training examples labels =…

python scipy scikit-learn classification

asked Jul 20 '15 at 16:03

CentAu

10,660
15
59
85
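
A short sketch of one way to keep track of the original row positions: `train_test_split` accepts any number of equally long arrays and splits them all the same way, so an index array can be passed alongside the data. The example assumes a scikit-learn version that provides `sklearn.model_selection` (the question itself imports from the older `sklearn.cross_validation`), and the random data is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 examples with 2 features and binary labels.
rng = np.random.RandomState(0)
data = rng.randn(10, 2)
labels = rng.randint(2, size=10)

# An array of original row positions, split together with the data.
indices = np.arange(len(data))

(data_train, data_test,
 labels_train, labels_test,
 idx_train, idx_test) = train_test_split(
    data, labels, indices, test_size=0.2, random_state=0
)
```

After the split, `data[idx_test]` reproduces `data_test`, so `idx_train` and `idx_test` can be used to look up the matching rows in any other structure that shares the original ordering.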
