Questions tagged [scikit-learn]

scikit-learn is a machine-learning library for Python that provides simple and efficient tools for data analysis and data mining. It is accessible to everybody, reusable in various contexts, and built on NumPy, SciPy, and matplotlib. The project is open source and commercially usable (BSD license).

Related Libraries

  • sklearn-pandas - bridge library between scikit-learn and pandas
  • scikit-image - scikit-learn-compatible API for image processing and computer vision for machine learning tasks
  • sklearn laboratory - scikit-learn wrapper that enables running larger scikit-learn experiments and feature sets
  • sklearn deap - scikit-learn wrapper that enables hyperparameter tuning using evolutionary algorithms instead of grid search in scikit-learn
  • hyperopt-sklearn - Hyper-parameter optimization for sklearn
  • scikit-plot - visualization library for quickly generating common plots in machine learning studies
  • sklearn-porter - library for transpiling trained scikit-learn models into source code in other languages, such as C, Java, or JavaScript
  • sklearn_theano - scikit-learn-compatible objects (estimators, transformers, and datasets) that use Theano internally
  • sparkit-learn - scikit-learn API that uses Spark's distributed computing model
  • joblib - lightweight pipelining and parallelization library that scikit-learn uses internally
28024 questions
8 votes, 4 answers

AttributeError: 'SimpleImputer' object has no attribute '_validate_data' in PyCaret

I am using PyCaret and get an error: AttributeError: 'SimpleImputer' object has no attribute '_validate_data'. I am trying to create a basic instance. # Create a basic PyCaret instance import pycaret from pycaret.regression import * mlb_pycaret =…
Anakin Skywalker
8 votes, 1 answer

mask 0 values during normalization

I am normalizing datasets, but the data contains a lot of 0s because of padding. I can mask them during model training, but apparently these zeros will be affected when I apply normalization. from sklearn.preprocessing import…
Leo
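A dependency-free sketch of one way to do this: compute the scaling statistics only over the non-padding entries and leave the padding untouched. It assumes 0 is used only for padding (a genuine 0 in the data would be masked too), and masked_standardize is a hypothetical helper, not a scikit-learn function. It uses the population standard deviation, as StandardScaler does.

```python
from statistics import mean, pstdev


def masked_standardize(values, pad_value=0.0):
    """Standardize values to zero mean and unit variance, ignoring padding.

    Assumption: every entry equal to pad_value is padding. Padding is excluded
    from the mean/std and copied through unchanged; the returned mask records
    which positions held real data.
    """
    mask = [v != pad_value for v in values]
    real = [v for v, keep in zip(values, mask) if keep]
    if not real:
        return list(values), mask
    mu = mean(real)
    sigma = pstdev(real) or 1.0  # population std (ddof=0); avoid dividing by 0
    scaled = [(v - mu) / sigma if keep else v for v, keep in zip(values, mask)]
    return scaled, mask


# A padded sequence with two trailing zeros
scaled, mask = masked_standardize([2.0, 4.0, 6.0, 0.0, 0.0])
print(scaled)
print(mask)
```

The real value 4.0 scales to exactly 0.0 in this example, which is why the sketch also returns the mask: after scaling, a zero no longer reliably marks padding.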
8 votes, 1 answer

How to implement inverse transformation in a pipeline of a ColumnTransformer?

I would like to understand how to apply an inverse transformation in a pipeline, rather than calling StandardScaler directly. The code that I am using is the following: import pandas as pd import numpy as np from sklearn.cluster import…
Jaime Vera
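As far as I know, ColumnTransformer does not provide inverse_transform itself, so one common route is to pull the fitted scaler out of the pipeline (for example through pipeline.named_steps[...] and the ColumnTransformer's named_transformers_ attribute) and call that scaler's inverse_transform on the columns it produced. The dependency-free sketch below only illustrates the arithmetic being reversed; MiniStandardScaler and the "num" key are hypothetical.

```python
from statistics import mean, pstdev


class MiniStandardScaler:
    """Dependency-free, single-column stand-in for StandardScaler (ddof=0)."""

    def fit(self, column):
        self.mean_ = mean(column)
        self.scale_ = pstdev(column) or 1.0  # avoid division by zero
        return self

    def transform(self, column):
        # z = (x - mean) / scale
        return [(x - self.mean_) / self.scale_ for x in column]

    def inverse_transform(self, column):
        # Undo the scaling: x = z * scale + mean
        return [z * self.scale_ + self.mean_ for z in column]


# Hypothetical stand-in for a fitted ColumnTransformer's named_transformers_,
# keyed by the transformer name given when the ColumnTransformer was built.
named_transformers = {"num": MiniStandardScaler().fit([10.0, 20.0, 30.0])}

scaled = named_transformers["num"].transform([10.0, 20.0, 30.0])
restored = named_transformers["num"].inverse_transform(scaled)
print(restored)
```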
8 votes, 2 answers

lightgbm || ValueError: Series.dtypes must be int, float or bool

The DataFrame's NA values have been filled, and the dataset schema has no object dtype, as specified in the documentation. df.info() output: Int64Index: 429 entries, 351 to 559 Data columns (total 11 columns): # Column …
Gokul Y
8 votes, 2 answers

How to calculate pairwise Mutual Information for entire pandas dataset?

I have 50 variables in my DataFrame: 46 are dependent variables and 4 are independent variables (precipitation, temperature, dew, snow). I want to calculate the mutual information of my dependent variables against my independent variables. So in the end I…
Denise
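For discrete (or binned) variables, sklearn.metrics.mutual_info_score computes this quantity using the natural logarithm; for continuous columns, sklearn.feature_selection.mutual_info_regression uses a nearest-neighbour estimator instead. The sketch below is a dependency-free version of the discrete case that loops over every (dependent, independent) pair. The column names and values are made up for illustration and assume the data has already been discretized.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information, in nats, between two equally long discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
    return sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


# Hypothetical, already-discretized columns (e.g. binned weather measurements)
data = {
    "precipitation": [0, 0, 1, 1],
    "temperature": [0, 1, 0, 1],
    "target_a": [0, 0, 1, 1],
}
independent = ["precipitation", "temperature"]
dependent = ["target_a"]

# One score per (dependent, independent) pair
scores = {
    (dep, ind): mutual_information(data[dep], data[ind])
    for dep in dependent
    for ind in independent
}
print(scores)
```

With 46 dependent and 4 independent columns, the same comprehension yields 184 scores that can be reshaped into a 46 x 4 table.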
8 votes, 1 answer

How to access ColumnTransformer elements in GridSearchCV

I wanted to find out the correct naming convention for referring to an individual preprocessor included in a ColumnTransformer (which is part of a pipeline) in the param_grid for grid search. Environment & sample data: import seaborn as sns from…
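scikit-learn addresses nested parameters by joining the names of the enclosing steps and transformers with a double underscore, <step>__<transformer>__<parameter>, and calling get_params() on the assembled pipeline lists every valid key. The helper below and the names "preprocessor", "num", "imputer", and "model" are hypothetical; they must match the names used in your own Pipeline and ColumnTransformer.

```python
def nested_param(*path):
    """Join component names and a final parameter name with scikit-learn's '__' separator."""
    if len(path) < 2:
        raise ValueError("need at least one component name and a parameter name")
    return "__".join(path)


# Hypothetical layout:
#   Pipeline([("preprocessor", ColumnTransformer([("num", num_pipe, cols), ...])),
#             ("model", SomeEstimator())])
# where num_pipe is itself a Pipeline containing an "imputer" step.
param_grid = {
    nested_param("preprocessor", "num", "imputer", "strategy"): ["mean", "median"],
    nested_param("model", "C"): [0.1, 1.0, 10.0],
}
print(sorted(param_grid))
```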
8 votes, 2 answers

Residual plot for residual vs predicted value in Python

I have run a KNN model. Now I want to plot residuals versus predicted values. Every example on different websites shows that I first have to run a linear regression model, but I couldn't understand how to do this. Can anyone help? Thanks in…
ni7
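A residual plot needs only a model's predictions, so no linear regression is required: the residual is observed minus predicted, plotted against the predicted value. A dependency-free sketch, assuming y_pred came from the KNN model's predict on held-out data (the numbers below are made up):

```python
def residual_points(y_true, y_pred):
    """Return (predicted, residual) pairs, where residual = observed - predicted."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [(pred, obs - pred) for obs, pred in zip(y_true, y_pred)]


# Hypothetical test targets and predictions, e.g. from a fitted KNN regressor
points = residual_points([3.0, 5.0, 7.5], [2.5, 5.0, 8.0])
print(points)  # x = predicted value, y = residual
```

With matplotlib, the pairs can then be drawn with plt.scatter(predicted, residuals) and a reference line from plt.axhline(0).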
8 votes, 10 answers

ValueError: illegal value in 4-th argument of internal None when running sklearn LinearRegression().fit()

For some reason I cannot get this block of code to run properly anymore: import numpy as np from sklearn.linear_model import LinearRegression # Create linear data with some noise x = np.random.uniform(0, 100, 1000) y = 2. * x + 3. +…
evan.tuck
8 votes, 2 answers

How to convert a DataFrame of counts to a probability density function

Suppose that I have the following observations of integers: df = pd.DataFrame({'observed_scores': [100, 100, 90, 85, 100, ...]}) I know that this can be used as an input to make a density plot: df['observed_scores'].plot.density() but suppose that…
irene
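Because the scores are integers, normalizing the counts gives a probability mass function. The smooth curve drawn by .plot.density() is instead a Gaussian kernel density estimate (pandas delegates to scipy.stats.gaussian_kde), and sklearn.neighbors.KernelDensity is the scikit-learn counterpart. In pandas, df['observed_scores'].value_counts(normalize=True) gives the same relative frequencies; the sketch below shows the arithmetic without dependencies, and pmf_from_counts is a hypothetical helper.

```python
from collections import Counter


def pmf_from_counts(counts):
    """Turn a {value: count} mapping into a {value: probability} mapping that sums to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {value: count / total for value, count in sorted(counts.items())}


# Counts can come from raw observations...
pmf = pmf_from_counts(Counter([100, 100, 90, 85, 100]))
# ...or from an existing (hypothetical) score -> count table
pmf_table = pmf_from_counts({85: 1, 90: 1, 100: 3})
print(pmf)
```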
8 votes, 2 answers

How can I plot validation curves using the results from GridSearchCV?

I am training a model with GridSearchCV in order to find the best parameters. Code: grid_params = { 'n_estimators': [100, 200, 300, 400], 'criterion': ['gini', 'entropy'], 'max_features': ['auto', 'sqrt', 'log2'] } gs = GridSearchCV( …
Tlaloc-ES
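GridSearchCV stores per-candidate results in cv_results_, where params is a list of parameter dicts and mean_test_score holds the matching mean cross-validation scores. One way to get a validation curve for a single hyperparameter is to hold the other parameters fixed and read off those pairs. A dependency-free sketch; the helper name is hypothetical and the scores in the toy dict are made up:

```python
def validation_curve_points(cv_results, param_name, fixed=None):
    """Return sorted (param value, mean test score) pairs from a GridSearchCV-style dict.

    Only candidates whose other parameters equal those in `fixed` are kept,
    so the curve varies a single hyperparameter.
    """
    fixed = fixed or {}
    points = [
        (params[param_name], score)
        for params, score in zip(cv_results["params"], cv_results["mean_test_score"])
        if all(params.get(key) == value for key, value in fixed.items())
    ]
    return sorted(points)


# Toy stand-in for gs.cv_results_; the scores are made up for illustration.
cv_results = {
    "params": [
        {"n_estimators": 100, "criterion": "gini"},
        {"n_estimators": 200, "criterion": "gini"},
        {"n_estimators": 100, "criterion": "entropy"},
        {"n_estimators": 200, "criterion": "entropy"},
    ],
    "mean_test_score": [0.80, 0.82, 0.79, 0.81],
}
curve = validation_curve_points(cv_results, "n_estimators", fixed={"criterion": "gini"})
print(curve)
```

The real cv_results_ also contains std_test_score for error bars, and training scores are recorded only when GridSearchCV is created with return_train_score=True.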
8 votes, 4 answers

ModuleNotFoundError: No module named 'sklearn.utils._testing'

from sklearn.utils._testing import ignore_warnings ModuleNotFoundError: No module named 'sklearn.utils._testing' How could I solve this problem? My sklearn version is 0.21.3.
Taijin Kim
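sklearn.utils._testing is a private module that, as far as I know, only exists from scikit-learn 0.22 onward (older releases such as 0.21.3 shipped sklearn.utils.testing instead), so upgrading scikit-learn is usually the real fix. If code must tolerate both layouts, one option is a fallback import. The sketch below uses importlib; import_first is a hypothetical helper, and the two module paths in the commented usage are assumptions about the versions involved.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that is available."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the modules could be imported: " + "; ".join(errors))


# Assumed locations: sklearn.utils._testing (0.22+) and sklearn.utils.testing (older releases)
# testing = import_first("sklearn.utils._testing", "sklearn.utils.testing")
# ignore_warnings = testing.ignore_warnings
```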
8 votes, 1 answer

Cohen's kappa score in scikit-learn

According to the scikit-learn documentation, Cohen's kappa score can be calculated like this: from sklearn.metrics import cohen_kappa_score y_true = [1, 0, 1, 1, 1, 1] y_pred = [1, 0, 1, 1, 1, 1] print(cohen_kappa_score(y_true, y_pred)) 1 where the 0…
practitioner
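Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). A dependency-free sketch of the unweighted formula (not scikit-learn's implementation) shows why identical label lists score 1:

```python
from collections import Counter


def cohen_kappa(y1, y2):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(y1) != len(y2) or not y1:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y1)
    # Observed agreement: fraction of positions where both raters agree
    p_o = sum(a == b for a, b in zip(y1, y2)) / n
    c1, c2 = Counter(y1), Counter(y2)
    # Chance agreement: both raters independently pick the same label
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1 - p_e)


print(cohen_kappa([1, 0, 1, 1, 1, 1], [1, 0, 1, 1, 1, 1]))
```

Here p_o = 1 and p_e = 26/36, so numerator and denominator are equal and kappa is 1 regardless of how imbalanced the labels are.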
8 votes, 5 answers

plot_confusion_matrix without estimator

I'm trying to use plot_confusion_matrix: from sklearn.metrics import confusion_matrix y_true = [1, 1, 0, 1] y_pred = [1, 1, 0, 0] confusion_matrix(y_true, y_pred) Output: array([[1, 0], [1, 2]]) Now, while using the following, using…
Rakibul Hassan
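plot_confusion_matrix expects a fitted estimator, but ConfusionMatrixDisplay(confusion_matrix=cm).plot() draws an already computed matrix, and later releases add ConfusionMatrixDisplay.from_predictions(y_true, y_pred) (plot_confusion_matrix itself was eventually removed). The matrix depends only on the label pairs, as this dependency-free sketch of the same convention (rows are true labels, columns are predicted labels) shows; confusion_matrix_lists is a hypothetical helper, and it reproduces the array([[1, 0], [1, 2]]) output from the question.

```python
def confusion_matrix_lists(y_true, y_pred, labels=None):
    """Count (true, predicted) pairs; rows are true labels, columns predicted labels."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        if t in index and p in index:  # pairs outside `labels` are ignored
            matrix[index[t]][index[p]] += 1
    return matrix


cm = confusion_matrix_lists([1, 1, 0, 1], [1, 1, 0, 0])
print(cm)
```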
8 votes, 1 answer

One Class SVM algorithm taking too long

The data below shows part of my dataset, which is used to detect anomalies: describe_file data_numbers index 0 gkivdotqvj 7309.0 0 1 hpwgzodlky 2731.0 1 2 dgaecubawx 0.0 2 3 NaN …
E199504
8 votes, 2 answers

At least one label specified must be in y_true, target vector is numerical

I am implementing an SVM project with this data. Here is how I extract the features: import itertools import matplotlib.pyplot as plt import pandas as pd import numpy as np from sklearn.model_selection import train_test_split from sklearn import…
user5871360