Python module providing a bridge between scikit-learn's machine learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
2 votes, 1 answer
Train and test data setup for sklearn
I'm creating a classification model to predict the outcome of a sports event (win/loss) and am running into a data setup conundrum.
Currently, the data is set up as follows:
example_data = [team_a_feat_1, team_a_feat_2...team_b_feat_1, team_b_feat_2...…

Sentient AI Turing
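One common way to handle the "team A columns, then team B columns" layout is to describe each game by feature differences and, for training only, add every game a second time seen from the other side, so the model cannot learn which slot a team happens to occupy. The sketch below assumes games are listed in chronological order; the data, the `diff_` column names and the 75/25 split are all hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: one row per game, listed in chronological order
games = pd.DataFrame({
    "team_a_feat_1": [10, 12, 9, 15, 7, 11, 13, 8],
    "team_a_feat_2": [0.5, 0.7, 0.4, 0.9, 0.3, 0.6, 0.8, 0.2],
    "team_b_feat_1": [11, 8, 9, 14, 10, 9, 12, 12],
    "team_b_feat_2": [0.6, 0.3, 0.5, 0.8, 0.7, 0.4, 0.5, 0.6],
    "team_a_win": [0, 1, 1, 1, 0, 1, 1, 0],
})

# Describe each game as "team A minus team B" so one row compares both sides
base_features = ["feat_1", "feat_2"]
X = pd.DataFrame({
    f"diff_{name}": games[f"team_a_{name}"] - games[f"team_b_{name}"]
    for name in base_features
})
y = games["team_a_win"]

# shuffle=False keeps time order: train on earlier games, test on later ones
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=False)

# Add each training game seen from B's side: negate the differences, flip the label
X_train_aug = pd.concat([X_train, -X_train], ignore_index=True)
y_train_aug = pd.concat([y_train, 1 - y_train], ignore_index=True)

model = LogisticRegression().fit(X_train_aug, y_train_aug)
win_prob = model.predict_proba(X_test)[:, 1]  # P(team A wins) for each test game
print(win_prob)
```

Mirroring is applied after the split so that no test game leaks into training in its flipped form.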
2 votes, 1 answer
Dummy prediction failed with run state StatusType.CRASHED in auto-sklearn
I am trying to train a simple classification model on the iris dataset using auto-sklearn.
When I try to fit my model, I keep getting the following error:
ValueError: (' Dummy prediction failed with run state StatusType.CRASHED and additional…

Minura Punchihewa
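The error refers to the dummy baseline that auto-sklearn fits before its model search. A hedged first check is to fit scikit-learn's own `DummyClassifier` on the same iris data outside auto-sklearn: if this works, the data is probably fine and the crash more likely comes from the auto-sklearn environment or its settings (this is an assumption, not a diagnosis). The split parameters below are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# A trivial baseline: always predicts the most frequent class seen in training
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)
print(f"baseline accuracy: {baseline_acc:.3f}")
```

On iris, with its three equally sized classes, a stratified split gives this baseline an accuracy of about one third.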
2 votes, 0 answers
Why does my Jupyter Notebook work on one computer and fail on another?
I recently installed Jupyter Notebook on my Mac, but a notebook file that opens fine on my school's Windows computer gives me an error.
These are my libraries:
# Basic Libraries
import numpy as np
import pandas as pd
import seaborn as sb
import…

akimo
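When the same notebook behaves differently on two machines, differing library versions are a frequent cause. A small sketch for comparing environments: run it in the notebook on both computers and diff the output. The package list is taken from the imports shown above plus scikit-learn (an assumption, since the import list is cut off); `environment_report` is a hypothetical helper name.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages):
    """Return the Python version plus the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)  # distribution name, e.g. "scikit-learn"
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, ver in environment_report(["numpy", "pandas", "seaborn", "scikit-learn"]).items():
    print(f"{name:>14}: {ver}")
```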
2 votes, 1 answer
How do I shift the contents in the first column into column names in Pandas
I have a DataFrame called df_tags, and I'd like to reshape it so that every value of the Tag column becomes a column header, with the corresponding values as the first row.
I have tried using df_tags.pop suggested here because my…

deLaJU
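Turning the values of one column into headers is a transpose after making that column the index. A minimal sketch, assuming `df_tags` has a `Tag` column plus a single value column (the `Value` name and the sample data are hypothetical):

```python
import pandas as pd

# Hypothetical df_tags: a "Tag" column plus one column of values
df_tags = pd.DataFrame({"Tag": ["color", "size", "shape"], "Value": ["red", "L", "round"]})

# Make Tag the index, then transpose so each tag becomes a column header
wide = df_tags.set_index("Tag").T.reset_index(drop=True)
wide.columns.name = None  # drop the leftover "Tag" label on the column axis
print(wide)
```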
2 votes, 1 answer
Calculating the "distance average" in KNN imputation to replace NaN values in a particular column
I ran into this problem while implementing KNN imputation for missing data from scratch. I created a dummy dataset and found the nearest neighbors for the rows that contain missing values. Here is my dataset:
A B C D …

CS Vyas
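A from-scratch KNN imputation can be checked against scikit-learn's `KNNImputer`. The sketch below interprets "distance average" as the plain mean of the target column over the k nearest donor rows (an assumption; a distance-weighted mean is another option), using scikit-learn's NaN-aware Euclidean distance. The dataset and the helper `knn_impute_column` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.metrics.pairwise import nan_euclidean_distances

# Hypothetical dummy dataset with one missing value in column C
df = pd.DataFrame({
    "A": [1.0, 2.0, 8.0, 1.5, 9.0],
    "B": [2.0, 3.0, 9.0, 2.5, 8.0],
    "C": [np.nan, 5.0, 10.0, 4.0, 12.0],
    "D": [4.0, 5.0, 11.0, 4.5, 10.0],
})


def knn_impute_column(frame, column, k):
    values = frame.to_numpy(dtype=float)
    col = frame.columns.get_loc(column)
    result = frame.copy()
    missing_rows = np.where(np.isnan(values[:, col]))[0]
    donor_rows = np.where(~np.isnan(values[:, col]))[0]  # rows where the value is known
    for row in missing_rows:
        # NaN-aware Euclidean distance, using only coordinates present in both rows
        dist = nan_euclidean_distances(values[[row]], values[donor_rows])[0]
        nearest = donor_rows[np.argsort(dist)[:k]]
        # Unweighted mean of the k nearest donors' values in the target column
        result.iloc[row, col] = values[nearest, col].mean()
    return result


manual = knn_impute_column(df, "C", k=2)
reference = KNNImputer(n_neighbors=2).fit_transform(df)
print(manual.loc[0, "C"], reference[0, 2])
```

Here the two nearest donors of row 0 are rows 3 and 1, so both approaches fill in the mean of 4.0 and 5.0.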
2 votes, 2 answers
Reordering rows in a dataframe to match order of rows in another dataframe
I have 2 dataframes, df1 and df2, which have the same number of rows but in a different order. Both dataframes have an ID column containing a unique identifier for each row, and this is the column for which I…

Zein
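Because the IDs are unique, one way to put `df2` into `df1`'s row order is a left merge on the ID column: pandas documents that a left join preserves the order of the left keys. A minimal sketch with hypothetical data:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": ["c", "a", "b"], "x": [3, 1, 2]})
df2 = pd.DataFrame({"ID": ["a", "b", "c"], "y": [10, 20, 30]})

# Merge df2 onto df1's ID column only; the left join keeps df1's row order
df2_aligned = df1[["ID"]].merge(df2, on="ID", how="left")
print(df2_aligned)
```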
2 votes, 1 answer
Why does PowerTransformer raise FloatingPointError on non-zero data?
from sklearn.preprocessing import PowerTransformer
transformer = PowerTransformer(method='yeo-johnson', standardize=True)
arr = [330117.5,
651193.35,
364335.63,
2136036.01,
1184539.05,
1186871.87,
2310647.36,
860183.78,
237451.79,
…

Tõnis
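Two things are worth trying here. First, scikit-learn transformers expect 2-D input, so a flat list needs reshaping into one column. Second, assuming the error comes from numeric overflow while fitting λ on values in the millions, rescaling the data toward 1 before the transform is a workaround worth testing. The sketch uses only the values visible in the question (the rest of the list is elided there), and the `1e6` scale factor is a hypothetical choice.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# The first values from the question; the rest of the original list is not shown
arr = [330117.5, 651193.35, 364335.63, 2136036.01, 1184539.05,
       1186871.87, 2310647.36, 860183.78, 237451.79]

# scikit-learn expects a 2-D array: one row per sample, a single column here
X = np.asarray(arr).reshape(-1, 1)

# Hypothetical rescaling so the values are close to 1 before fitting lambda
scale = 1e6
transformer = PowerTransformer(method="yeo-johnson", standardize=True)
Z = transformer.fit_transform(X / scale)

# Undo both steps to get back to the original units
X_back = transformer.inverse_transform(Z) * scale
print(transformer.lambdas_, Z.ravel())
```

Since all values are positive, `method="box-cox"` is another option to compare against.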
2 votes, 2 answers
LabelEncoder().fit_transform gives me negative values?
Hi,
I have different city names in the column "City" in my dataset. I would love to encode it using LabelEncoder(). However, I got frustrating results: some of the encoded values are negative.
df['city_enc'] =…

Nguyen Ngoc Lan
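The question's code is cut off, so the cause can't be confirmed from here. What can be shown is that `LabelEncoder` maps the sorted unique labels to 0..n_classes-1 and therefore never returns negatives, while pandas' categorical codes use -1 for missing values, which is one possible source of negative numbers. The sample cities and the `"missing"` placeholder are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical "City" column with one missing entry
df = pd.DataFrame({"City": ["Oslo", "Bergen", "Oslo", "Trondheim", None]})

# LabelEncoder needs every label to be a string here, so fill missing values first
le = LabelEncoder()
df["city_enc"] = le.fit_transform(df["City"].fillna("missing"))

# pandas categorical codes, by contrast, mark missing values with -1
df["city_code"] = df["City"].astype("category").cat.codes
print(df)
```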
2 votes, 2 answers
Flatten all cells from float64 arrays to int in a Pandas dataframe
I have a Pandas DataFrame with 6 rows and 11 columns in which each cell contains a float64 array holding a single value. The cells in the dataframe look like this:
And this is what I get after transforming the dataframe to a dictionary:
{'AO': {"W":…

Beg
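Each cell can be unwrapped with `.item()`, which also raises an error if a cell unexpectedly holds more than one value. A minimal sketch, assuming rounding to the nearest integer is wanted; the labels follow the dictionary shown above (column `AO`, row `W`), while `BE`, `L` and the numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose cells are one-element float64 arrays
df = pd.DataFrame(
    {"AO": [np.array([3.0]), np.array([1.0])],
     "BE": [np.array([2.0]), np.array([2.0])]},
    index=["W", "L"],
)


def to_int(cell):
    """Unwrap a one-element array and round it to a Python int."""
    return int(round(np.asarray(cell).item()))


# Apply the conversion column by column; each column ends up with an integer dtype
flat = df.apply(lambda col: col.map(to_int))
print(flat.to_dict())
```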
2 votes, 0 answers
Python: Create 5-6 groups from a dataset so that the groups are balanced across 3 different variables (decile, population size & region)
I was given datasets that had already been filtered. The request is to create 5-6 equally sized groups that are balanced/stratified across 3 different variables. I have two datasets to do this for, one with about 540 rows and the other with about 880 rows. The…

GrizzledLotus
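One simple approach is to shuffle, sort by all stratifying variables so each combination forms a contiguous block, then deal rows out round-robin like cards. This keeps group sizes within one of each other and spreads every stratum evenly across groups. It is a sketch, not a guarantee of balance on each variable taken alone. The column names, the synthetic data and binning population into three bands with `pd.qcut` are all assumptions.

```python
import numpy as np
import pandas as pd


def balanced_groups(df, strata_cols, n_groups, seed=0):
    """Assign each row a group id 0..n_groups-1, balanced within every stratum."""
    shuffled = df.sample(frac=1, random_state=seed)  # random order within strata
    ordered = shuffled.sort_values(strata_cols)      # strata become contiguous blocks
    # Deal rows out in turn: 0, 1, ..., n_groups-1, 0, 1, ...
    groups = pd.Series(np.arange(len(ordered)) % n_groups, index=ordered.index)
    return groups.reindex(df.index)


# Hypothetical dataset of 540 rows
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "decile": rng.integers(1, 11, size=540),
    "population": rng.integers(1_000, 100_000, size=540),
    "region": rng.choice(["North", "South", "East", "West"], size=540),
})
# Bin population size so it can be used as a stratum
df["pop_band"] = pd.qcut(df["population"], q=3, labels=["small", "medium", "large"])

df["group"] = balanced_groups(df, ["decile", "pop_band", "region"], n_groups=6)
print(df["group"].value_counts().sort_index())
```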
2 votes, 1 answer
Differences between OneHotEncoding (sklearn) and get_dummies (pandas)
I am wondering what the difference is between pandas' get_dummies() encoding of categorical features and sklearn's OneHotEncoder().
I've seen answers that mention that get_dummies() cannot produce encoding for categories not seen in…

Chris X
2 votes, 2 answers
How to solve sklearn error: "Found input variables with inconsistent numbers of samples"?
I have a problem with a 70/30 train/test split in sklearn. I receive an error on this line:
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
The error is:
Found input variables with inconsistent numbers of…

Paip
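In the line above, `stratify=y` has the original length while `X_smote` and `y_smote` have the oversampled length, which matches the error message. The minimal fix is `stratify=y_smote`. A commonly recommended alternative is to split the original data first and oversample only the training part, so no synthetic rows end up in the test set. The sketch uses scikit-learn's `resample` for simple random oversampling as a stand-in for SMOTE (it is not SMOTE); the data is synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Hypothetical imbalanced data: 90 samples of class 0, 10 of class 1
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# Split the ORIGINAL data; stratify uses the same y that is being split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Oversample the minority class in the training set only
X_min, y_min = X_train[y_train == 1], y_train[y_train == 1]
n_extra = int((y_train == 0).sum() - (y_train == 1).sum())
X_extra, y_extra = resample(X_min, y_min, replace=True, n_samples=n_extra, random_state=0)

X_train_bal = np.vstack([X_train, X_extra])
y_train_bal = np.concatenate([y_train, y_extra])
print(np.bincount(y_train_bal), np.bincount(y_test))
```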
2 votes, 1 answer
How can I convert the StandardScaler() transformation back to dataframe?
I'm working with a model, and after splitting into train and test, I want to apply StandardScaler(). However, this transformation converts my data into a NumPy array, and I want to keep the DataFrame format I had before. How can I do this?
Basically, I have:
from…

dmmmmd
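`StandardScaler.transform` returns a NumPy array, so one version-independent option is to wrap the result back into a DataFrame using the original column names and index. The feature table below is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature table
X = pd.DataFrame({"age": [25, 32, 47, 51, 62, 23, 44, 36],
                  "income": [40, 52, 88, 95, 70, 31, 66, 58]})
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler().fit(X_train)  # learn mean/std on the training rows only

# Wrap the NumPy output back into DataFrames with the original labels
X_train_scaled = pd.DataFrame(scaler.transform(X_train),
                              columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(scaler.transform(X_test),
                             columns=X_test.columns, index=X_test.index)
print(X_test_scaled)
```

In scikit-learn 1.2 and later, `scaler.set_output(transform="pandas")` makes `transform` return a DataFrame directly.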
2 votes, 1 answer
Fill Pandas Column NaNs with numpy array values
Sorry if this question seems too basic, but I've been looking for an answer and couldn't find one.
So, I have a dataset with lots of NaN values and I've been working on some regressions to predict those nulls, and since the prediction is given as a…

fega_zero
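If the model predicts on exactly the rows where the column is NaN, in their existing order, the predictions can be written back with a boolean mask in `.loc`. A minimal sketch, assuming a single-feature linear regression; the columns `x` and `target` and the data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: "target" has gaps we want to predict from "x"
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "target": [10.0, np.nan, 30.0, np.nan, 50.0]})

missing = df["target"].isna()
known = df[~missing]

# Fit on complete rows, predict for the missing ones (same row order as the mask)
model = LinearRegression().fit(known[["x"]], known["target"])
predictions = model.predict(df.loc[missing, ["x"]])

# The mask selects exactly the NaN cells; the array fills them in order
df.loc[missing, "target"] = predictions
print(df)
```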
2 votes, 1 answer
Stratified sampling into 3 sets with class imbalance
I have looked into "Stratified sample in pandas", "stratified sampling on ranges", and others, but they don't address my issue specifically, as I'm looking to split the data randomly into 3 sets.
I have an unbalanced dataframe of 10k rows, 10% is…

Chris
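A common way to get three stratified sets is to call `train_test_split` twice, stratifying each call on the labels it actually receives. A sketch assuming a 70/15/15 train/validation/test split (the proportions and data are hypothetical), with 10,000 rows and a 10% minority class as in the question:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 10,000 rows, 10% positive class
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 4))
y = np.array([1] * 1_000 + [0] * 9_000)

# First split off 70% for training, stratified on y
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)
# Then split the remaining 30% in half, stratified on its own labels
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, stratify=y_temp, random_state=0
)

for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, len(labels), labels.mean())
```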