Questions tagged [scikit-learn]

scikit-learn is a machine-learning library for Python that provides simple and efficient tools for data analysis and data mining. It is accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib. The project is open source and commercially usable (BSD license).

Related Libraries

sklearn-pandas - bridge library between scikit-learn and pandas
scikit-image - scikit-learn-compatible API for image processing and computer vision in machine learning tasks
sklearn laboratory - scikit-learn wrapper for running larger scikit-learn experiments and feature sets
sklearn deap - scikit-learn wrapper that tunes hyperparameters using evolutionary algorithms instead of scikit-learn's grid search
hyperopt-sklearn - hyperparameter optimization for sklearn
scikit-plot - visualization library for quickly generating common plots in machine learning studies
sklearn-porter - library for turning trained scikit-learn models into C, Java, or JavaScript code
sklearn_theano - scikit-learn-compatible objects (estimators, transformers, and datasets) using Theano internally
sparkit-learn - scikit-learn API that uses PySpark's distributed computing model
joblib - parallelization library used by scikit-learn

28024 questions

187

votes

9 answers

What is the difference between 'transform' and 'fit_transform' in sklearn?

In sklearn, sklearn.decomposition.RandomizedPCA has two methods, transform and fit_transform. The descriptions of the two methods are as follows. But what is the difference between them?

python scikit-learn

asked May 23 '14 at 20:42

tqjustc

3,624
6
27
42
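
For the transform versus fit_transform question above, a minimal sketch may help. It assumes the current PCA class as a stand-in for the RandomizedPCA class named in the question (which is no longer part of recent scikit-learn releases), and the random data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X_train = rng.normal(size=(50, 4))  # illustrative data, not from the question
X_new = rng.normal(size=(5, 4))

# fit_transform learns the components from X_train and projects X_train in one call.
pca = PCA(n_components=2, svd_solver="full")
Z_train = pca.fit_transform(X_train)

# transform reuses the components learned above; nothing is refitted.
Z_new = pca.transform(X_new)

# fit followed by transform yields the same projection as fit_transform.
Z_again = PCA(n_components=2, svd_solver="full").fit(X_train).transform(X_train)
print(np.allclose(Z_train, Z_again, atol=1e-8))
```

The usual pattern is fit_transform on training data and transform on anything that arrives later, so new data is projected with the parameters learned from the training set.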

176

votes

2 answers

How does the class_weight parameter in scikit-learn work?

I am having a lot of trouble understanding how the class_weight parameter in scikit-learn's Logistic Regression operates. The situation: I want to use logistic regression to do binary classification on a very unbalanced data set. The classes are…

python scikit-learn

asked Jun 22 '15 at 04:11

kilgoretrout

3,547
5
31
46
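
As a rough illustration of the class_weight question above, the sketch below computes the weights that class_weight="balanced" would assign and passes an explicit weight dictionary to LogisticRegression. The 90/10 class split, the synthetic features, and the 1:9 weights are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical unbalanced labels: 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)

# "balanced" uses n_samples / (n_classes * np.bincount(y)).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(weights)  # one weight per class, larger for the rare class

# Synthetic features so the model has something to fit.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2)) + y[:, None]

# An explicit dict makes errors on class 1 count nine times as much in the loss.
clf = LogisticRegression(class_weight={0: 1.0, 1: 9.0}).fit(X, y)
```

The weights scale each sample's contribution to the loss; they do not resample the data.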

170

votes

10 answers

RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility

I get this error when trying to load a saved SVM model. I have tried uninstalling sklearn, NumPy and SciPy, and reinstalling the latest versions all together again (using pip). I am still getting this error. Why? In [1]: import sklearn; print…

python numpy scikit-learn

asked Nov 28 '16 at 13:17

Blue482

2,926
5
29
40

168

votes

28 answers

How to convert a Scikit-learn dataset to a Pandas dataset

How do I convert data from a Scikit-learn Bunch object to a Pandas DataFrame?

    from sklearn.datasets import load_iris
    import pandas as pd
    data = load_iris()
    print(type(data))
    data1 = pd. # Is there a Pandas method to accomplish this?

python pandas scikit-learn dataset

asked Jun 27 '16 at 07:28

SANBI samples

2,058
2
14
20
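
One possible way to do the Bunch-to-DataFrame conversion asked about above is sketched here. It relies on the data, feature_names, and target attributes of the Bunch returned by load_iris; the "target" column name is an arbitrary choice for the example.

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()

# Features become columns named after data.feature_names.
df = pd.DataFrame(data.data, columns=data.feature_names)

# Append the labels as an extra column ("target" is just an illustrative name).
df["target"] = data.target

print(df.head())
```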

161

votes

9 answers

Can anyone explain StandardScaler to me?

I am unable to understand the StandardScaler page in the sklearn documentation. Can anyone explain this to me in simple terms?

python machine-learning scikit-learn scaling standardized

asked Nov 23 '16 at 07:37

nitinvijay23

1,781
3
13
11
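
For the StandardScaler question above, a small sketch on a made-up 3x2 array shows that, with default settings, the scaler subtracts each column's mean and divides by its (population) standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data: two features on very different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The same result computed by hand, column by column.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, manual))
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After scaling, every column has mean 0 and standard deviation 1, which keeps features with large raw values from dominating scale-sensitive models.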

161

votes

6 answers

Parameter "stratify" of function "train_test_split" (scikit-learn)

I am trying to use train_test_split from scikit-learn, but I am having trouble with the stratify parameter. Here is the code: from sklearn import cross_validation, datasets X = iris.data[:,:2] y =…

split scikit-learn training-data test-data

asked Jan 17 '16 at 19:05

Daneel Olivaw

2,077
4
15
23
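
A minimal sketch of the stratify parameter asked about above. It imports train_test_split from sklearn.model_selection, its home in current releases (the question's sklearn.cross_validation module was removed in later versions); the 80/20 label split is invented for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 80 of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# stratify=y keeps the 80/20 class ratio in both the train and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(np.bincount(y_train), np.bincount(y_test))
```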

156

votes

3 answers

How can I plot a confusion matrix?

I am using scikit-learn to classify 22,000 text documents into 100 classes. I use scikit-learn's confusion matrix function to compute the confusion matrix. model1 = LogisticRegression() model1 = model1.fit(matrix, labels) pred =…

python matplotlib matrix scikit-learn text-classification

asked Feb 23 '16 at 08:06

minks

2,859
4
21
29
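
One way to approach the confusion-matrix plot asked about above is to draw the array returned by confusion_matrix with matplotlib's imshow. This is only a sketch: the labels are made up, the three classes stand in for the question's 100, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Illustrative labels, not the question's data.
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

fig, ax = plt.subplots()
im = ax.imshow(cm, cmap="Blues")
fig.colorbar(im, ax=ax)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")

# Write each count into its cell.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center")

# Call fig.savefig("confusion_matrix.png") or plt.show() to view the result.
```

With many classes, the per-cell text becomes unreadable, so the annotation loop is probably best dropped in that case.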

149

votes

4 answers

What exactly is sklearn.pipeline.Pipeline?

I can't figure out how sklearn.pipeline.Pipeline works exactly. There are a few explanations in the docs. For example, what do they mean by: Pipeline of transforms with a final estimator. To make my question clearer, what are steps? How do they…

python machine-learning scikit-learn neuraxle

asked Oct 12 '15 at 22:42

farhawa

10,120
16
49
91
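
To make "pipeline of transforms with a final estimator" from the question above concrete, here is a small sketch. The step names "scale" and "clf" and the choice of StandardScaler plus LogisticRegression are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# steps is a list of (name, estimator) pairs; every step but the last must be a transformer.
pipe = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# fit calls fit_transform on "scale", then fit on "clf" with the scaled data.
pipe.fit(X, y)

# predict calls transform on "scale", then predict on "clf".
preds = pipe.predict(X[:5])
print(preds)
```

Bundling the steps this way means the scaler is always fitted on the same data as the classifier, which matters during cross-validation.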

148

votes

2 answers

Logistic regression Python solvers' definitions

I am using the logistic regression function from sklearn, and was wondering what each of the solvers is actually doing behind the scenes to solve the optimization problem. Can someone briefly describe what "newton-cg", "sag", "lbfgs" and "liblinear"…

python python-3.x scikit-learn logistic-regression

asked Jul 28 '16 at 15:02

Clement

1,630
3
12
10

147

votes

11 answers

How to use sklearn fit_transform with pandas and return dataframe instead of numpy array?

I want to apply scaling (using StandardScaler() from sklearn.preprocessing) to a pandas dataframe. The following code returns a numpy array, so I lose all the column names and indices. This is not what I want. features = df[["col1", "col2", "col3",…

python numpy pandas scikit-learn

asked Mar 01 '16 at 12:51

Louic

2,403
3
19
34
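
For the question above about keeping column names and the index after fit_transform, one common workaround is to wrap the returned array in a new DataFrame. The frame below is a made-up example; only "col1" and "col2" echo the question.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical frame with a non-default index.
df = pd.DataFrame(
    {"col1": [1.0, 2.0, 3.0], "col2": [10.0, 20.0, 40.0]},
    index=["a", "b", "c"],
)

# fit_transform returns a NumPy array, so reattach the labels explicitly.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(df),
    columns=df.columns,
    index=df.index,
)

print(scaled)
```

scikit-learn 1.2 and later also provide set_output(transform="pandas") on transformers, which may remove the need for the manual wrapping.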

143

votes

5 answers

What are the pros and cons between get_dummies (Pandas) and OneHotEncoder (Scikit-learn)?

I'm learning different methods to convert categorical variables to numeric for machine-learning classifiers. I came across the pd.get_dummies method and sklearn.preprocessing.OneHotEncoder() and I wanted to see how they differed in terms of…

python pandas machine-learning scikit-learn dummy-variable

asked Apr 14 '16 at 18:28

O.rka

29,847
68
194
309
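
One practical difference behind the get_dummies versus OneHotEncoder question above can be sketched as follows: the encoder remembers the categories seen during fit, while get_dummies derives its columns from whatever data it is given. The "color" column and its values are invented for the example.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"color": ["red", "green", "blue"]})
new = pd.DataFrame({"color": ["green", "purple"]})  # "purple" was never seen in training

# get_dummies builds columns from the values present, so the layout differs from train.
dummies_new = pd.get_dummies(new["color"])
print(list(dummies_new.columns))

# OneHotEncoder keeps the fitted categories; unknown values become all-zero rows.
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train[["color"]])
encoded_new = enc.transform(new[["color"]]).toarray()
print(enc.categories_[0], encoded_new)
```

A fixed column layout is what lets the encoder sit inside a Pipeline and be applied unchanged to future data.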

143

votes

7 answers

How are feature_importances in RandomForestClassifier determined?

I have a classification task with a time series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result, I would like to find out which attributes/dates contribute to the result…

scikit-learn random-forest feature-selection

asked Apr 04 '13 at 11:53

user2244670

1,431
2
10
3
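
As a sketch for the feature-importance question above: RandomForestClassifier exposes impurity-based importances through its feature_importances_ attribute, normalized to sum to 1. The iris data is used here only as a stand-in for the question's 23-attribute time series.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Pair each feature name with its importance and list the largest first.
ranked = sorted(
    zip(data.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```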

141

votes

4 answers

How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit-learn?

I'm working on a sentiment analysis problem. The data looks like this:

    label  instances
    5      1190
    4      838
    3      239
    1      204
    2      127

So my data is unbalanced, since 1190 instances are labeled with 5. For the classification I'm…

python machine-learning nlp artificial-intelligence scikit-learn

asked Jul 15 '15 at 04:17

new_with_python

1,567
2
11
8
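
For the multiclass metrics question above, a minimal sketch with sklearn.metrics might look like this. The eight labels are made up (reusing the question's 1 to 5 label values), and macro averaging is one choice among several; "micro" and "weighted" behave differently on unbalanced data.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative labels only, not the question's data.
y_true = [5, 5, 5, 4, 4, 3, 1, 2]
y_pred = [5, 5, 4, 4, 4, 3, 1, 5]

accuracy = accuracy_score(y_true, y_pred)

# average="macro" gives every class equal weight regardless of its size.
# zero_division=0 scores a class that is never predicted as 0 instead of warning.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)

print(accuracy, precision, recall, f1)
```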

140

votes

4 answers

sklearn GridSearch: how to print progress during execution?

I am using GridSearch from sklearn to optimize parameters of the classifier. There is a lot of data, so the whole process of optimization takes a while: more than a day. I would like to watch the performance of the already-tried combinations of…

python logging scikit-learn

asked Jun 09 '14 at 13:08

doubts

1,763
2
12
19
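
For the grid-search progress question above, the verbose parameter of GridSearchCV is the usual starting point. The sketch below assumes the current sklearn.model_selection.GridSearchCV class and a tiny illustrative grid; the exact text printed depends on the scikit-learn version.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Higher verbose values print more detail about each fit as the search runs.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3, verbose=2)
search.fit(X, y)

# After the search, the score of every tried combination is in cv_results_.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, round(score, 3))
```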

138

votes

4 answers

What are the different use cases of joblib versus pickle?

Background: I'm just getting started with scikit-learn, and read at the bottom of the page about joblib versus pickle: it may be more interesting to use joblib’s replacement of pickle (joblib.dump & joblib.load), which is more efficient on big…

python pickle scikit-learn

asked Sep 27 '12 at 06:39

msunbot

1,871
4
16
16
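
To illustrate the joblib versus pickle question above, both can round-trip a fitted estimator. This sketch uses a tiny LinearRegression model and a temporary file, both chosen only for the example.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# A small fitted model to persist.
model = LinearRegression().fit(np.arange(10).reshape(-1, 1), np.arange(10))

# pickle: serialize to bytes in memory and load back.
restored_pickle = pickle.loads(pickle.dumps(model))

# joblib: dump to a file path and load back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    joblib.dump(model, path)
    restored_joblib = joblib.load(path)

print(restored_pickle.coef_, restored_joblib.coef_)
```

Both approaches share pickle's caveat: only load files from sources you trust.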
