Highest Voted 'sklearn-pandas' Questions

2

votes

1 answer

Train and test data setup for sklearn

I'm creating a classification model to predict the outcome of a sports event (win/loss) and am running into a data setup conundrum. Currently the data is set up as follows: example_data = [team_a_feat_1, team_a_feat_2...team_b_feat_1, team_b_feat_2...…

asked Mar 20 '23 at 23:36

Sentient AI Turing

49
4

2

votes

1 answer

Dummy prediction failed with run state StatusType.CRASHED in auto-sklearn

I am trying to train a simple classification model on the iris dataset using auto-sklearn. When I try to fit my model, I keep getting the following error, ValueError: (' Dummy prediction failed with run state StatusType.CRASHED and additional…

python scikit-learn sklearn-pandas auto-sklearn

asked Oct 23 '22 at 09:02

Minura Punchihewa

1,498
1
12
35

2

votes

0 answers

Why does my Jupyter Notebook work on one computer and fail on another?

I recently installed Jupyter Notebook on my Mac, but the file that I was able to open on my school Windows computer gives me an error. These are my libraries: # Basic Libraries import numpy as np import pandas as pd import seaborn as sb import…

python pandas matplotlib jupyter-notebook sklearn-pandas

asked Feb 21 '22 at 13:04

akimo

21
2

2

votes

1 answer

How do I shift the contents in the first column into column names in Pandas

I have a DataFrame called df_tags, and I'd like to reshape the whole DataFrame so that all values of the Tag column become the column header names, with their corresponding values as the first row of values. I have tried using df_tags.pop suggested here because my…

python pandas dataframe sklearn-pandas

asked Oct 14 '21 at 12:34

deLaJU

45
1
8
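A minimal sketch of one way to do this kind of reshape, assuming the frame has exactly one value column next to `Tag`. The column name `Value` and the sample rows are hypothetical, since the excerpt above does not show the real data.

```python
import pandas as pd

# Hypothetical stand-in for df_tags: one label column and one value column.
df_tags = pd.DataFrame(
    {"Tag": ["temp", "pressure", "flow"], "Value": [21.5, 1.2, 7.0]}
)

# Use Tag as the index, then transpose so each tag becomes a column header
# and the values form a single row.
wide = df_tags.set_index("Tag").T.reset_index(drop=True)
wide.columns.name = None  # drop the leftover "Tag" label on the column axis
```

If the real frame has several value columns, the transpose would yield one row per value column instead of a single row.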

2

votes

1 answer

During calculation of "distance average" in knn imputation method for replacing NaN value in particular column

I encountered this problem while implementing the KNN imputation method for handling missing data from scratch. I created a dummy dataset and found the nearest neighbors for rows that contain missing values. Here is my dataset: A B C D …

dataframe machine-learning sklearn-pandas feature-engineering data-preprocessing

asked Aug 24 '21 at 04:20

CS Vyas

49
6
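For checking a from-scratch result, it may help to compare against scikit-learn's `KNNImputer`, which averages the neighbors' values under a NaN-aware Euclidean distance. This is the library routine swapped in as a reference, not the asker's own implementation, and the small array below is made up for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up data with two missing cells.
X = np.array(
    [
        [1.0, 2.0, np.nan],
        [3.0, 4.0, 3.0],
        [np.nan, 6.0, 5.0],
        [8.0, 8.0, 7.0],
    ]
)

# Each missing cell is replaced by the unweighted mean of that column
# over the 2 nearest rows that have a value there.
imputer = KNNImputer(n_neighbors=2, weights="uniform")
X_filled = imputer.fit_transform(X)
```

Observed cells are left unchanged; only the NaN positions are filled.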

2

votes

2 answers

Reordering rows in a dataframe to match order of rows in another dataframe

I have 2 dataframes df1 and df2 which have the same number of rows but in a different ordering. The dataframes have an ID column which contains a unique identifier for each row, both dataframes have this ID column and this is the column for which I…

python pandas scikit-learn sklearn-pandas

asked Aug 06 '21 at 17:08

Zein

33
7
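A small sketch of one common approach, assuming the `ID` values are unique and present in both frames as the question states. The sample frames and the `x`/`y` columns are invented for illustration.

```python
import pandas as pd

# Invented frames: same IDs, different row order.
df1 = pd.DataFrame({"ID": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"ID": ["c", "a", "b"], "y": [30, 10, 20]})

# Index df2 by ID, pull the rows out in df1's ID order, then restore ID as a column.
df2_aligned = df2.set_index("ID").reindex(df1["ID"]).reset_index()
```

Any ID that is missing from `df2` would show up as a row of NaN values after `reindex`, which makes mismatches easy to spot.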

2

votes

1 answer

Why does PowerTransformer raise FloatingPointError when given non-zero data?

from sklearn.preprocessing import PowerTransformer transformer = PowerTransformer(method='yeo-johnson', standardize=True) arr = [330117.5, 651193.35, 364335.63, 2136036.01, 1184539.05, 1186871.87, 2310647.36, 860183.78, 237451.79, …

python scikit-learn sklearn-pandas

asked Jul 08 '21 at 18:03

Tõnis

21
2

2

votes

2 answers

LabelEncoder().fit_transform gives me negative values?

Hi, I have different city names in the column "City" in my dataset. I would love to encode it using LabelEncoder(). However, I got quite frustrating results with negative values df['city_enc'] =…

python scikit-learn sklearn-pandas one-hot-encoding label-encoding

asked Jul 01 '21 at 11:47

Nguyen Ngoc Lan

23
2

2

votes

2 answers

Flatten all cells from float64 arrays to int in a Pandas dataframe

I have a Pandas DataFrame with 6 rows and 11 columns which contains a float64 array with a single value in each cell. The cells in the dataframe look like this: And this is what I get after transforming the dataframe to a dictionary: {'AO': {"W":…

pandas dataframe numpy scikit-learn sklearn-pandas

asked May 17 '21 at 21:23

Beg

405
1
5
18
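A hedged sketch of one way to unwrap such cells, assuming every cell holds a one-element float64 array. The labels `AO` and `W` come from the excerpt; `BX`, `X`, and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose cells are one-element float64 arrays.
df = pd.DataFrame(
    {
        "AO": [np.array([1.0]), np.array([0.0])],
        "BX": [np.array([3.0]), np.array([2.0])],
    },
    index=["W", "X"],
)

# Unwrap each cell with .item() and cast the scalar to int, column by column.
flat = df.apply(lambda col: col.map(lambda cell: int(np.asarray(cell).item())))
```

`.item()` raises if a cell holds more than one element, which acts as a guard against silently discarding data.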

2

votes

0 answers

Python: Create 5-6 groups from a dataset so that the groups are balanced across 3 different variables (decile, population size & region)

I was given already filtered datasets. The request is to create 5-6 equally sized groups that are balanced/stratified across 3 different variables. I have two datasets to do this for, one with about 540 rows and the other with about 880 rows. The…

python pandas scikit-learn subset sklearn-pandas

asked May 17 '21 at 19:54

GrizzledLotus

21
1

2

votes

1 answer

Differences between OneHotEncoding (sklearn) and get_dummies (pandas)

I am wondering what is the difference between pandas' get_dummies() encoding of categorical features as compared to the sklearn's OneHotEncoder(). I've seen answers that mention that get_dummies() cannot produce encoding for categories not seen in…

python training-data sklearn-pandas one-hot-encoding

asked Jan 06 '21 at 16:11

Chris X

21
1
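One practical difference can be sketched as follows: `OneHotEncoder` remembers the categories it was fitted on, while `get_dummies` derives columns from whatever data it is given. The city names are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"city": ["Oslo", "Paris", "Oslo"]})
test = pd.DataFrame({"city": ["Paris", "Rome"]})  # "Rome" never appears in train

# The encoder learns its columns from train; unseen values become all-zero rows.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train[["city"]])
test_encoded = encoder.transform(test[["city"]]).toarray()

# get_dummies builds columns independently for each frame, so they can differ.
train_dummies = pd.get_dummies(train["city"])
test_dummies = pd.get_dummies(test["city"])
```

Here `test_encoded` keeps the two training columns (Oslo, Paris), whereas `test_dummies` has Paris and Rome columns that no longer line up with `train_dummies`.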

2

votes

2 answers

How to solve sklearn error: "Found input variables with inconsistent numbers of samples"?

I have a challenge using the sklearn 70-30 division. I receive an error on line: X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y) The error is: Found input variables with inconsistent numbers of…

python data-analysis sklearn-pandas train-test-split

asked Oct 19 '20 at 22:34

Paip

21
1
3
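One plausible cause, judging only from the call shown in the excerpt, is that `stratify=y` refers to the original labels while `X_smote` and `y_smote` have a different length after resampling. The sketch below simulates resampled arrays with NumPy (no SMOTE library is used, and the sizes are made up) and stratifies on the resampled labels instead.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

y = np.array([0] * 80 + [1] * 20)        # original labels, 100 samples (made up)
X_smote = rng.normal(size=(160, 3))      # stand-in for oversampled features
y_smote = np.array([0] * 80 + [1] * 80)  # stand-in for oversampled labels

# Every array passed to train_test_split, including stratify, needs the same length.
X_train, X_test, y_train, y_test = train_test_split(
    X_smote, y_smote, test_size=0.3, stratify=y_smote, random_state=0
)
```

Checking `len(X_smote)`, `len(y_smote)`, and `len(y)` before the split is a quick way to confirm whether this is the mismatch.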

2

votes

1 answer

How can I convert the StandardScaler() transformation back to dataframe?

I'm working with a model, and after splitting into train and test, I want to apply StandardScaler(). However, this transformation converts my data into an array and I want to keep the format I had before. How can I do this? Basically, I have: from…

python pandas dataframe scikit-learn sklearn-pandas

asked Oct 01 '20 at 18:40

dmmmmd

79
2
6
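A minimal sketch of one way to keep the DataFrame format: wrap the scaled array in a new DataFrame that reuses the original column names and index. The frames `X_train` and `X_test` below are invented stand-ins for the asker's data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Invented stand-ins for the train and test splits.
X_train = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]}, index=[5, 6, 7]
)
X_test = pd.DataFrame({"a": [2.0], "b": [40.0]}, index=[8])

scaler = StandardScaler()

# Fit on train only, then rebuild DataFrames around the returned arrays.
X_train_scaled = pd.DataFrame(
    scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index
)
X_test_scaled = pd.DataFrame(
    scaler.transform(X_test), columns=X_test.columns, index=X_test.index
)
```

Passing `index=` matters when the split shuffled the rows, because it keeps the scaled rows aligned with the matching target values.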

2

votes

1 answer

Fill Pandas Column NaNs with numpy array values

Sorry if this question seems too basic, but I've been looking for an answer and haven't found one. So, I have a dataset with lots of NaN values and I've been working on some regressions to predict those nulls, and since the prediction is given as a…

pandas dataframe regression sklearn-pandas fillna

asked Oct 01 '20 at 09:06

fega_zero

125
9
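A short sketch of one way to write predictions back into the missing cells, assuming the prediction array holds exactly one value per NaN row, in row order. The column names and numbers are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "target": [10.0, np.nan, 30.0, np.nan]}
)

# Stand-in for model output: one predicted value per missing row.
predictions = np.array([20.0, 40.0])

# A boolean mask selects the NaN rows; the array is assigned positionally.
missing = df["target"].isna()
df.loc[missing, "target"] = predictions
```

The assignment fails if the array length differs from the number of missing rows, so comparing `missing.sum()` with `len(predictions)` first is a cheap sanity check.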

2

votes

1 answer

Stratified sampling into 3 sets considering unbalance

I have looked into Stratified sample in pandas, stratified sampling on ranges, among others, and they don't address my issue specifically, as I'm looking to split the data into 3 sets randomly. I have an unbalanced dataframe of 10k rows, 10% is…

pandas numpy scikit-learn sklearn-pandas

asked Sep 30 '20 at 16:45

Chris

2,019
5
22
67
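One common way to get three stratified sets is to call `train_test_split` twice. The sketch below assumes a 60/20/20 split and a binary `label` column with 10% positives; the split ratios, column names, and the smaller row count are assumptions, since the excerpt is truncated.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced frame: 10% positives.
df = pd.DataFrame({"feature": np.arange(1000), "label": [1] * 100 + [0] * 900})

# First split off 40% of the rows, preserving the class ratio.
train, rest = train_test_split(
    df, test_size=0.4, stratify=df["label"], random_state=0
)

# Then halve the remainder into validation and test sets, stratifying again.
valid, test = train_test_split(
    rest, test_size=0.5, stratify=rest["label"], random_state=0
)
```

Each of the three sets keeps roughly the original 10% positive rate because both splits stratify on the label.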
