Questions tagged [scikit-learn]

scikit-learn is a machine-learning library for Python that provides simple and efficient tools for data analysis and data mining. It is accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib. The project is open source and commercially usable (BSD license).

Resources

Related Libraries

sklearn-pandas - bridge library between scikit-learn and pandas
scikit-image - scikit-learn-compatible API for image processing and computer vision in machine learning tasks
sklearn laboratory - scikit-learn wrapper for running larger scikit-learn experiments and feature sets
sklearn deap - scikit-learn wrapper for hyperparameter tuning with evolutionary algorithms instead of grid search
hyperopt-sklearn - hyperparameter optimization for scikit-learn
scikit-plot - visualization library for quickly generating common plots in machine learning studies
sklearn-porter - library for turning trained scikit-learn models into C, Java, or JavaScript code
sklearn_theano - scikit-learn-compatible objects (estimators, transformers, and datasets) using Theano internally
sparkit-learn - scikit-learn API that uses PySpark's distributed computing model
joblib - parallelization library used by scikit-learn

28024 questions

134 votes, 10 answers

how to check which version of nltk, scikit learn installed?

In a shell script, I am checking whether these packages are installed and, if not, installing them. So within the shell script: import nltk echo nltk.__version__ but it stops the shell script at the import line. In the Linux terminal I tried to see in this…

python linux shell scikit-learn nltk

asked Feb 13 '15 at 13:46 by nlper (reputation 2,297; badges 7, 27, 37)
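
The excerpt above runs `import nltk` directly in the shell, but that statement is Python and has to be executed by the Python interpreter. A minimal sketch of one way to look up installed versions follows; it assumes Python 3.8 or later, and `installed_version` is a hypothetical helper name, not something from the question. Note that it takes the distribution name (`scikit-learn`), which differs from the import name (`sklearn`).

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as published on PyPI; scikit-learn is imported as sklearn.
    for name in ("nltk", "scikit-learn"):
        version = installed_version(name)
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Saved to a file, this could be called from a shell script through the interpreter, and the script could then decide whether to install a package based on the printed result.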

133 votes, 6 answers

Run an OLS regression with Pandas Data Frame

I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example: import pandas as pd df = pd.DataFrame({"A": [10,20,30,40,50], "B": [20, 30, 10, 40,…

python pandas scikit-learn regression statsmodels

asked Nov 15 '13 at 00:47 by Michael (reputation 13,244; badges 23, 67, 115)

132 votes, 3 answers

Why does one hot encoding improve machine learning performance?

I have noticed that when One Hot encoding is used on a particular data set (a matrix) and used as training data for learning algorithms, it gives significantly better results with respect to prediction accuracy, compared to using the original matrix…

machine-learning data-mining scikit-learn data-analysis

asked Jul 04 '13 at 12:04 by maheshakya (reputation 2,198; badges 7, 28, 43)

132 votes, 13 answers

ImportError in importing from sklearn: cannot import name check_build

I am getting the following error while trying to import from sklearn: >>> from sklearn import svm Traceback (most recent call last): File "", line 1, in from sklearn import svm File…

python numpy scipy scikit-learn

asked Mar 07 '13 at 15:12 by ayush singhal (reputation 1,879; badges 2, 18, 33)

130 votes, 8 answers

UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples

I'm getting this weird error: classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples. 'precision', 'predicted', average, warn_for) but then it also prints the f-score the…

python scikit-learn

asked Apr 01 '17 at 22:05 by Sticky (reputation 3,671; badges 5, 34, 58)

130 votes, 9 answers

Stratified Train/Test-split in scikit-learn

I need to split my data into a training set (75%) and test set (25%). I currently do that with the code below: X, Xt, userInfo, userInfo_train = sklearn.cross_validation.train_test_split(X, userInfo) However, I'd like to stratify my training…

python scikit-learn

asked Apr 03 '15 at 19:11 by pir (reputation 5,513; badges 12, 63, 101)
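
To illustrate what a stratified 75/25 split means, here is a minimal pure-Python sketch. It is not scikit-learn's implementation; `stratified_split` is a hypothetical helper written for this illustration. In scikit-learn itself, `train_test_split` in `sklearn.model_selection` accepts a `stratify` argument for the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so each class is split in roughly the same proportion."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then cut off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["a"] * 8 + ["b"] * 4
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Because the cut is made per class, the class ratio in the test set tracks the ratio in the full data, which a plain random split does not guarantee for small or imbalanced data.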

122 votes, 3 answers

LogisticRegression: Unknown label type: 'continuous' using sklearn in python

I have the following code to test some of the most popular ML algorithms of the sklearn Python library: import numpy as np from sklearn import metrics, svm from sklearn.linear_model import LinearRegression from…

python numpy scikit-learn

asked Jan 29 '17 at 19:43 by mllamazares (reputation 7,876; badges 17, 61, 89)

122 votes, 6 answers

Understanding min_df and max_df in scikit CountVectorizer

I have five text files that I input to a CountVectorizer. When specifying min_df and max_df to the CountVectorizer instance what does the min/max document frequency exactly mean? Is it the frequency of a word in its particular text file or is it the…

python machine-learning scikit-learn nlp

asked Dec 29 '14 at 23:57 by moeabdol (reputation 4,779; badges 6, 44, 43)
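
As a rough illustration of the concept asked about above: a term's document frequency is the number of documents that contain it at least once, not how often it occurs inside one file. In CountVectorizer, integer values of min_df and max_df are absolute document counts and float values are proportions of documents. The sketch below is pure Python, not the library's code; the function names and the whitespace tokenization are simplifying assumptions.

```python
from collections import Counter


def document_frequencies(documents):
    """Count, for each term, the number of documents that contain it at least once."""
    df = Counter()
    for text in documents:
        # A set ensures repeated terms in one document are counted once.
        df.update(set(text.lower().split()))
    return df


def filter_vocabulary(documents, min_df=1, max_df=1.0):
    """Keep terms whose document frequency lies between min_df and max_df inclusive.

    Integers are treated as absolute document counts, floats as a
    proportion of the number of documents.
    """
    n_docs = len(documents)
    low = min_df if isinstance(min_df, int) else min_df * n_docs
    high = max_df if isinstance(max_df, int) else max_df * n_docs
    df = document_frequencies(documents)
    return sorted(term for term, count in df.items() if low <= count <= high)


docs = ["the cat sat", "the dog sat", "the bird flew the coop", "a cat", "the end"]
print(filter_vocabulary(docs, min_df=2, max_df=0.7))
```

With five documents, min_df=2 drops terms that appear in only one document, and max_df=0.7 drops "the", which appears in four of the five.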

122 votes, 10 answers

sklearn plot confusion matrix with labels

I want to plot a confusion matrix to visualize the classifier's performance, but it shows only the numbers of the labels, not the labels themselves: from sklearn.metrics import confusion_matrix import pylab as pl y_test=['business', 'business',…

python matplotlib scikit-learn

asked Oct 07 '13 at 20:08 by hmghaly (reputation 1,411; badges 3, 29, 47)

119 votes, 20 answers

Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative

My problem: I have a dataset which is a large JSON file. I read it and store it in the trainList variable. Next, I pre-process it - in order to be able to work with it. Once I have done that I start the classification: I use the kfold cross…

python machine-learning scikit-learn classification supervised-learning

asked Jul 09 '15 at 17:19 by Euskalduna (reputation 1,517; badges 2, 13, 12)
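
For reference, the four quantities named in the title above can be counted directly from paired true and predicted labels. The sketch below is plain Python for a binary problem; `confusion_counts` and the choice of 1 as the positive label are assumptions made for this illustration. In scikit-learn, `sklearn.metrics.confusion_matrix` returns the same counts as a 2x2 array for binary labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
```

With k-fold cross-validation, one option is to compute these counts per fold and sum them across folds.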

116 votes, 3 answers

Will scikit-learn utilize GPU?

Reading implementation of scikit-learn in TensorFlow: http://learningtensorflow.com/lesson6/ and scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html I'm struggling to decide which implementation to…

python tensorflow scikit-learn k-means neuraxle

asked Jan 10 '17 at 11:37 by blue-sky (reputation 51,962; badges 152, 427, 752)

115 votes, 4 answers

ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT

I have a dataset consisting of both numeric and categorical data and I want to predict adverse outcomes for patients based on their medical characteristics. I defined a prediction pipeline for my dataset like so: X =…

python machine-learning scikit-learn logistic-regression

asked Jun 30 '20 at 13:08 by sums22 (reputation 1,793; badges 3, 13, 25)

113 votes, 6 answers

scikit-learn .predict() default threshold

I'm working on a classification problem with unbalanced classes (5% 1's). I want to predict the class, not the probability. In a binary classification problem, is scikit's classifier.predict() using 0.5 by default? If it doesn't, what's the default…

python machine-learning scikit-learn classification imbalanced-data

asked Nov 14 '13 at 18:00 by ADJ (reputation 4,892; badges 10, 50, 83)

112 votes, 10 answers

sklearn: Found arrays with inconsistent numbers of samples when calling LinearRegression.fit()

Just trying to do a simple linear regression but I'm baffled by this error for: regr = LinearRegression() regr.fit(df2.iloc[1:1000, 5].values, df2.iloc[1:1000, 2].values) which produces: ValueError: Found arrays with inconsistent numbers of…

scikit-learn

asked Jun 12 '15 at 22:26 by sunny (reputation 3,853; badges 5, 32, 62)

111 votes, 8 answers

Accuracy Score ValueError: Can't Handle mix of binary and continuous target

I'm using linear_model.LinearRegression from scikit-learn as a predictive model. It works and it's perfect. I have a problem evaluating the predicted results using the accuracy_score metric. This is my true data: array([1, 1, 0, 0, 0, 0, 1, 1, 0,…

python machine-learning scikit-learn linear-regression prediction

asked Jun 24 '16 at 13:57 by Arij SEDIRI (reputation 2,088; badges 7, 25, 43)
