Questions tagged [scikit-learn]

scikit-learn is a machine-learning library for Python that provides simple and efficient tools for data analysis and data mining. It is accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib. The project is open source and commercially usable (BSD license).
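
As a rough illustration of the "simple and efficient tools" above, here is a minimal sketch of the usual fit/score workflow. The bundled iris dataset, the StandardScaler + LogisticRegression pipeline, and `max_iter=1000` are illustrative choices, not part of the tag description.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small bundled dataset (features X, class labels y).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every estimator exposes fit(); a pipeline chains transformers and a final estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # mean accuracy on held-out data
print(f"test accuracy: {accuracy:.3f}")
```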

Resources

Related Libraries

sklearn-pandas - bridge library between scikit-learn and pandas
scikit-image - image processing and computer vision library with a scikit-learn-compatible API, useful for machine-learning tasks
sklearn laboratory - scikit-learn wrapper for running larger scikit-learn experiments and feature sets
sklearn deap - scikit-learn wrapper that tunes hyperparameters with evolutionary algorithms instead of scikit-learn's grid search
hyperopt-sklearn - hyperparameter optimization for scikit-learn
scikit-plot - visualization library for quickly generating common plots in machine learning studies
sklearn-porter - library for turning trained scikit-learn models into C, Java, or JavaScript code
sklearn_theano - scikit-learn-compatible objects (estimators, transformers, and datasets) that use Theano internally
sparkit-learn - scikit-learn API built on PySpark's distributed computing model
joblib - lightweight pipelining and parallelization library that scikit-learn uses internally

28024 questions

4 answers

AttributeError: 'SimpleImputer' object has no attribute '_validate_data' in PyCaret

I am using PyCaret and get the following error: AttributeError: 'SimpleImputer' object has no attribute '_validate_data'. I am trying to create a basic instance: # Create a basic PyCaret instance import pycaret from pycaret.regression import * mlb_pycaret =…

python scikit-learn pycaret

asked Nov 25 '20 at 20:58

Anakin Skywalker

2,400 reputation, 5 gold badges, 35 silver badges, 63 bronze badges

1 answer

mask 0 values during normalization

I am normalizing datasets, but the data contains a lot of 0 values because of padding. I can mask them during model training, but apparently these zeros are affected when I apply normalization. from sklearn.preprocessing import…

python machine-learning scikit-learn

asked Oct 28 '20 at 06:25

Leo
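
One possible approach, sketched under the assumption that every 0 is padding (genuine zeros in the data would be masked too): replace the padding with NaN before scaling. StandardScaler disregards NaNs in fit and keeps them in transform, so the statistics come only from real values, and the padding can be restored to 0 afterwards. The array below is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical padded data: trailing zeros are padding, not measurements.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 0.0],
              [0.0, 0.0]])

pad_mask = X == 0                      # True where a value is padding
X_nan = np.where(pad_mask, np.nan, X)  # hide padding from the scaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_nan)        # NaNs ignored in fit, kept in transform
X_scaled = np.where(pad_mask, 0.0, X_scaled)  # put the padding back as 0

print(scaler.mean_)  # statistics computed from real values only
print(X_scaled)
```

The same `pad_mask` trick, with the already fitted `scaler.transform`, would apply to validation or test data.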

1 answer

How to implement inverse transformation in a pipeline of a ColumnTransformer?

I would like to understand how to apply an inverse transformation in a pipeline instead of calling StandardScaler directly. The code that I am using is the following: import pandas as pd import numpy as np from sklearn.cluster import…

python scikit-learn pipeline

asked Oct 26 '20 at 19:09

Jaime Vera
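
A sketch of one way to do this, assuming the clustering step is KMeans (the excerpt is truncated after `from sklearn.cluster import`) and a single scaler covers all columns. As far as I know ColumnTransformer has no inverse_transform of its own, so the idea is to fetch the fitted scaler through `named_transformers_` and invert with it directly. The DataFrame and step names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric data; column and step names are illustrative.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})

pipe = Pipeline([
    ("pre", ColumnTransformer([("scale", StandardScaler(), ["a", "b"])])),
    ("kmeans", KMeans(n_clusters=1, n_init=10, random_state=0)),
])
pipe.fit(df)

# Reach the fitted scaler by name and invert the cluster centers with it.
scaler = pipe.named_steps["pre"].named_transformers_["scale"]
centers_scaled = pipe.named_steps["kmeans"].cluster_centers_
centers_original = scaler.inverse_transform(centers_scaled)  # back to original units
print(centers_original)
```

With several transformers in the ColumnTransformer, the output columns are concatenated in the order the transformers are listed, so each fitted transformer would have to be applied to its own slice of the output.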

2 answers

lightgbm || ValueError: Series.dtypes must be int, float or bool

The DataFrame has its NA values filled, and, as the documentation requires, the dataset schema has no object dtype. df.info() output: Int64Index: 429 entries, 351 to 559 Data columns (total 11 columns): # Column …

python-3.x scikit-learn jupyter-notebook lightgbm

asked Oct 25 '20 at 14:22

Gokul Y

2 answers

How to calculate pairwise Mutual Information for entire pandas dataset?

I have 50 variables in my DataFrame: 46 are dependent variables and 4 are independent variables (precipitation, temperature, dew, snow). I want to calculate the mutual information of my dependent variables against my independent ones. So in the end I…

python pandas dataframe scikit-learn mutual-information

asked Sep 19 '20 at 13:10

Denise
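
A minimal sketch, assuming the dependent variables are continuous (for discrete ones `mutual_info_classif` would be the analogue): call `mutual_info_regression` once per dependent variable with the four independent variables as features, and collect the results in a DataFrame. The data is synthetic; `y1` and `y2` are hypothetical stand-ins for the 46 dependent columns.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 300
# Hypothetical independent variables (names taken from the question).
X = pd.DataFrame(rng.normal(size=(n, 4)),
                 columns=["precipitation", "temperature", "dew", "snow"])
# Hypothetical dependent variables: y1 depends on temperature, y2 is pure noise.
Y = pd.DataFrame({
    "y1": 2.0 * X["temperature"] + 0.1 * rng.normal(size=n),
    "y2": rng.normal(size=n),
})

# One MI estimate per (dependent, independent) pair:
# rows = dependent variables, columns = independent variables.
mi = pd.DataFrame(
    [mutual_info_regression(X, Y[col], random_state=0) for col in Y.columns],
    index=Y.columns,
    columns=X.columns,
)
print(mi.round(3))
```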

1 answer

How to access ColumnTransformer elements in GridSearchCV

I wanted to find out the correct naming convention for referring to an individual preprocessor included in a ColumnTransformer (which is part of a pipeline) in param_grid for grid_search. Environment & sample data: import seaborn as sns from…

python scikit-learn grid-search gridsearchcv

asked Aug 18 '20 at 11:37

Zolzaya Luvsandorj
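
A sketch of the naming convention, using hypothetical data and step names in place of the question's seaborn dataset: each double underscore descends one level, from the Pipeline step (`preprocessor`) to the ColumnTransformer entry (`num`) to the inner Pipeline step (`imputer`) and finally to the parameter (`strategy`). `pipe.get_params().keys()` lists every valid key.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 90
# Hypothetical sample data standing in for the question's seaborn dataset.
X = pd.DataFrame({
    "age": rng.normal(30, 5, n),
    "fare": rng.normal(20, 4, n),
    "sex": rng.choice(["female", "male"], n),
})
y = (X["age"] > 30).astype(int)

numeric = Pipeline([("imputer", SimpleImputer()), ("scaler", StandardScaler())])
preprocessor = ColumnTransformer([
    ("num", numeric, ["age", "fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
])
pipe = Pipeline([("preprocessor", preprocessor), ("model", LogisticRegression())])

# Keys follow <pipeline step>__<ColumnTransformer entry>__<inner step>__<parameter>.
param_grid = {
    "preprocessor__num__imputer__strategy": ["mean", "median"],
    "preprocessor__num__scaler__with_mean": [True, False],
    "model__C": [0.1, 1.0],
}
grid = GridSearchCV(pipe, param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```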

2 answers

Residual plot for residual vs predicted value in Python

I have run a KNN model. Now I want to plot the residuals against the predicted values. Every example I found on different websites first runs a linear regression model, but I couldn't understand how to do this. Can anyone help? Thanks in…

python machine-learning scikit-learn data-science

asked Jul 01 '20 at 16:34

ni7
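
A linear regression is not required: residuals are just actual minus predicted values, so any regressor's predictions work. A minimal sketch with `KNeighborsRegressor` on synthetic data from `make_regression` (both are assumptions standing in for the asker's model and data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_test)
residuals = y_test - y_pred  # actual minus predicted
print(f"mean absolute residual: {np.abs(residuals).mean():.2f}")

fig, ax = plt.subplots()
ax.scatter(y_pred, residuals, alpha=0.6)
ax.axhline(0.0, color="red", linestyle="--")  # reference line: zero error
ax.set_xlabel("Predicted value")
ax.set_ylabel("Residual (actual - predicted)")
# plt.show() in an interactive session
```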

10 answers

ValueError: illegal value in 4-th argument of internal None when running sklearn LinearRegression().fit()

For some reason I cannot get this block of code to run properly anymore: import numpy as np from sklearn.linear_model import LinearRegression # Create linear data with some noise x = np.random.uniform(0, 100, 1000) y = 2. * x + 3. +…

python python-3.x scikit-learn scipy

asked Jun 24 '20 at 18:35

evan.tuck

2 answers

how to convert a dataframe of counts to a probability density function

Suppose that I have the following observations of integers: df = pd.DataFrame({'observed_scores': [100, 100, 90, 85, 100, ...]}) I know that this can be used as an input to make a density plot: df['observed_scores'].plot.density() but suppose that…

python pandas scikit-learn

asked Jun 22 '20 at 15:48

irene

2,085 reputation, 1 gold badge, 22 silver badges, 36 bronze badges
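
If the data is already a table of counts, one option is to weight each distinct score by its count instead of repeating rows. A sketch using `KernelDensity` with `sample_weight`; the count table and the bandwidth of 3.0 are hypothetical, and the bandwidth in particular would need tuning.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KernelDensity

# Hypothetical table of counts: each score and how often it was observed.
counts = pd.DataFrame({"score": [85, 90, 100], "count": [1, 1, 3]})

# Weight each distinct score by its count instead of repeating rows.
kde = KernelDensity(kernel="gaussian", bandwidth=3.0)
kde.fit(counts[["score"]].to_numpy(dtype=float),
        sample_weight=counts["count"].to_numpy(dtype=float))

grid = np.linspace(70, 115, 451)[:, None]
density = np.exp(kde.score_samples(grid))  # score_samples returns the log-density
```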

2 answers

How can I plot validation curves using the results from GridSearchCV?

I am training a model with GridSearchCV in order to find the best parameters. Code: grid_params = { 'n_estimators': [100, 200, 300, 400], 'criterion': ['gini', 'entropy'], 'max_features': ['auto', 'sqrt', 'log2'] } gs = GridSearchCV( …

python scikit-learn grid-search

asked Jun 13 '20 at 17:52

Tlaloc-ES

4,825 reputation, 7 gold badges, 38 silver badges, 84 bronze badges
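
A sketch, assuming the goal is one curve per hyperparameter: build a DataFrame from `gs.cv_results_`, fix the other parameters at their best values, and plot mean train and test scores against the one being varied. The grid is trimmed from the question's, the data is synthetic, and `'max_features': 'auto'` is left out because recent releases no longer accept it for random forests. `sklearn.model_selection.validation_curve` is an alternative when only one parameter varies.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)  # synthetic stand-in data
grid_params = {"n_estimators": [10, 30, 60], "criterion": ["gini", "entropy"]}
gs = GridSearchCV(RandomForestClassifier(random_state=0), grid_params,
                  cv=3, return_train_score=True)
gs.fit(X, y)

results = pd.DataFrame(gs.cv_results_)
# Fix criterion at its best value and vary only n_estimators.
best_criterion = gs.best_params_["criterion"]
curve = results[results["param_criterion"] == best_criterion].copy()
curve["n_estimators"] = curve["param_n_estimators"].astype(int)
curve = curve.sort_values("n_estimators")

fig, ax = plt.subplots()
ax.plot(curve["n_estimators"], curve["mean_train_score"], marker="o", label="train")
ax.plot(curve["n_estimators"], curve["mean_test_score"], marker="o", label="cross-validation")
ax.set_xlabel("n_estimators")
ax.set_ylabel("accuracy")
ax.legend()
```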

4 answers

ModuleNotFoundError: No Module named 'sklearn.utils._testing'

from sklearn.utils._testing import ignore_warnings ModuleNotFoundError: No Module named 'sklearn.utils._testing' How could I solve this problem? My sklearn version is 0.21.3

python scikit-learn

asked Jun 12 '20 at 01:33

Taijin Kim
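
To my knowledge the module was public as `sklearn.utils.testing` up to 0.21 and became the private `sklearn.utils._testing` in 0.22, so code written for newer releases fails on 0.21.3. Upgrading scikit-learn is the simplest fix; if both versions must be supported, a fallback import is one option. Note that `_testing` is private API and may change without warning.

```python
import warnings

try:
    from sklearn.utils._testing import ignore_warnings  # scikit-learn >= 0.22
except ImportError:
    from sklearn.utils.testing import ignore_warnings  # scikit-learn <= 0.21


@ignore_warnings(category=UserWarning)
def noisy():
    # The UserWarning raised here is suppressed by the decorator.
    warnings.warn("suppressed", UserWarning)
    return 42


print(noisy())
```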

1 answer

cohen kappa score in scikit learn

According to the scikit-learn documentation, the Cohen kappa score can be calculated like this: from sklearn.metrics import cohen_kappa_score y_true = [1, 0, 1, 1, 1, 1] y_pred = [1, 0, 1, 1, 1, 1] print(cohen_kappa_score(y_true, y_pred)) 1 where the 0…

python scikit-learn

asked Mar 23 '20 at 19:47

practitioner
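
The kappa formula explains the 1: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the class marginals, so identical lists give p_o = 1 and therefore κ = 1. A sketch that computes it by hand from the confusion matrix and compares against `cohen_kappa_score`; `kappa_by_hand` and the second `y_pred` with one disagreement are hypothetical.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix


def kappa_by_hand(y_true, y_pred):
    """Cohen's kappa from the confusion matrix: (p_o - p_e) / (1 - p_e)."""
    cm = confusion_matrix(y_true, y_pred)
    n = cm.sum()
    p_o = np.trace(cm) / n  # observed agreement
    # Chance agreement: product of true and predicted marginals, summed over classes.
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    return (p_o - p_e) / (1 - p_e)


y_true = [1, 0, 1, 1, 1, 1]
print(kappa_by_hand(y_true, y_true), cohen_kappa_score(y_true, y_true))  # perfect agreement

y_pred = [1, 0, 1, 1, 1, 0]  # hypothetical: one disagreement
print(kappa_by_hand(y_true, y_pred), cohen_kappa_score(y_true, y_pred))
```

For the second pair, p_o = 5/6 and p_e = 22/36, which gives κ = 4/7 ≈ 0.571.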

5 answers

plot_confusion_matrix without estimator

I'm trying to use plot_confusion_matrix. from sklearn.metrics import confusion_matrix y_true = [1, 1, 0, 1] y_pred = [1, 1, 0, 0] confusion_matrix(y_true, y_pred) Output: array([[1, 0], [1, 2]]) Now, while using the following: using…

python scikit-learn confusion-matrix

asked Mar 20 '20 at 15:08

Rakibul Hassan
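
`plot_confusion_matrix` expects a fitted estimator. When only the labels are available, one option is `ConfusionMatrixDisplay` built from a precomputed matrix (available since 0.22; `display_labels` is passed explicitly because early releases required it). In 1.0 and later, `ConfusionMatrixDisplay.from_predictions(y_true, y_pred)` does the same in one call, and `plot_confusion_matrix` itself was later removed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_true = [1, 1, 0, 1]
y_pred = [1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)  # same matrix as in the question

# Plot the precomputed matrix; no estimator is involved.
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
disp.plot(cmap="Blues")
# plt.show() in an interactive session
```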

1 answer

One Class SVM algorithm taking too long

The data below shows part of my dataset, which is used to detect anomalies: describe_file data_numbers index 0 gkivdotqvj 7309.0 0 1 hpwgzodlky 2731.0 1 2 dgaecubawx 0.0 2 3 NaN …

machine-learning scikit-learn svm anomaly-detection

asked Mar 17 '20 at 14:19

E199504

2 answers

At least one label specified must be in y_true, target vector is numerical

I am implementing an SVM project with this data. Here is how I extract the features: import itertools import matplotlib.pyplot as plt import pandas as pd import numpy as np from sklearn.model_selection import train_test_split from sklearn import…

python-3.x machine-learning scikit-learn svm

asked Mar 01 '20 at 16:21

user5871360
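
The excerpt is truncated, so this is an assumption: a common cause of this error is passing `labels=` values that never occur in `y_true`, for example string class names while the target is numeric. A sketch that derives the labels from the data itself and keeps human-readable names separate (`class_names` is hypothetical):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical numeric targets and predictions.
y_test = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# Labels taken from the data, so they share y's numeric dtype.
labels = np.unique(np.concatenate([y_test, y_pred]))
cm = confusion_matrix(y_test, y_pred, labels=labels)

# Display names only; they are not passed as labels= while y is numeric.
class_names = ["negative", "positive"]
for name, row in zip(class_names, cm):
    print(name, row)
```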
