Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
2
votes
1 answer

Train and test data setup for sklearn

I'm creating a classification model to predict the outcome of a sports event (win/loss) and am running into a data setup conundrum. Currently the data is set up as follows: example_data = [team_a_feat_1, team_a_feat_2...team_b_feat_1, team_b_feat_2...…
2
votes
1 answer

Dummy prediction failed with run state StatusType.CRASHED in auto-sklearn

I am trying to train a simple classification model on the iris dataset using auto-sklearn. When I try to fit my model, I keep getting the following error: ValueError: (' Dummy prediction failed with run state StatusType.CRASHED and additional…
Minura Punchihewa
  • 1,498
  • 1
  • 12
  • 35
2
votes
0 answers

Why does my Jupyter Notebook work on one computer and fail on another?

I recently installed Jupyter Notebook on my Mac, but a file that I was able to open on my school's Windows computer gives me an error. These are my libraries: # Basic Libraries import numpy as np import pandas as pd import seaborn as sb import…
2
votes
1 answer

How do I shift the contents in the first column into column names in Pandas

I have a DataFrame called df_tags, and I'd like to reshape it so that all values of the Tag column become the column headers, with their corresponding values as the first row. I have tried using df_tags.pop as suggested here because my…
deLaJU
  • 45
  • 1
  • 8
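
One way to approach this kind of reshape is to make the Tag column the index and transpose. The sketch below is only an illustration: df_tags is rebuilt with made-up data, and the "Value" column name is an assumption (only "Tag" appears in the question).

```python
import pandas as pd

# Hypothetical stand-in for df_tags; only the "Tag" column name comes from the question.
df_tags = pd.DataFrame({"Tag": ["alpha", "beta", "gamma"], "Value": [10, 20, 30]})

# Use "Tag" as the index, then transpose so each tag becomes a column header
# and the remaining column becomes a single row of values.
wide = df_tags.set_index("Tag").T.reset_index(drop=True)
wide.columns.name = None  # drop the leftover "Tag" label on the column axis

print(wide)
```

If the real frame has more than one value column, the transpose yields one row per value column rather than a single row.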
2
votes
1 answer

Calculating the "distance average" in the KNN imputation method to replace NaN values in a particular column

I encountered this problem while implementing the KNN imputation method for handling missing data from scratch. I created a dummy dataset and found the nearest neighbors for the rows that contain missing values. Here is my dataset: A B C D …
2
votes
2 answers

Reordering rows in a dataframe to match order of rows in another dataframe

I have 2 dataframes, df1 and df2, which have the same number of rows but in a different order. Both dataframes have an ID column which contains a unique identifier for each row, and this is the column for which I…
Zein
  • 33
  • 7
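
A minimal sketch of one common approach, assuming the ID values are unique and present in both frames as the question describes: index df1 by ID and reindex it in the order of df2's IDs. The column names "x" and "y" and the data are invented for illustration.

```python
import pandas as pd

# Hypothetical frames sharing a unique "ID" column, in different row orders.
df1 = pd.DataFrame({"ID": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"ID": ["c", "a", "b"], "y": [30, 10, 20]})

# Index df1 by ID, pull the rows in df2's ID order, then restore ID as a column.
df1_aligned = df1.set_index("ID").reindex(df2["ID"]).reset_index()

print(df1_aligned)
```

If an ID in df2 were missing from df1, reindex would insert a row of NaN values for it instead of raising an error, so checking for missing values afterwards can be worthwhile.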
2
votes
1 answer

Why does PowerTransformer raise FloatingPointError when given nonzero data?

from sklearn.preprocessing import PowerTransformer transformer = PowerTransformer(method='yeo-johnson', standardize=True) arr = [330117.5, 651193.35, 364335.63, 2136036.01, 1184539.05, 1186871.87, 2310647.36, 860183.78, 237451.79, …
Tõnis
  • 21
  • 2
2
votes
2 answers

LabelEncoder().fit_transform gives me negative values?

Hi, I have different city names in the column "City" in my dataset. I would love to encode it using LabelEncoder(). However, I got quite frustrating results with negative values: df['city_enc'] =…
2
votes
2 answers

Flatten all cells from float64 arrays to int in a Pandas dataframe

I have a Pandas DataFrame with 6 rows and 11 columns which contains a float64 array with a single value in each cell. The cells in the dataframe look like this: And this is what I get after transforming the dataframe to a dictionary: {'AO': {"W":…
Beg
  • 405
  • 1
  • 5
  • 18
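
As a rough sketch of the idea, each cell can be unwrapped by taking the first element of its array and casting it to int. The frame below is a small invented stand-in (the "AO" column and "W" row label echo the excerpt; everything else is hypothetical), and it assumes every cell really holds exactly one value.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose cells are single-element float64 arrays.
df = pd.DataFrame(
    {
        "AO": [np.array([1.0]), np.array([2.0])],
        "BO": [np.array([3.0]), np.array([4.0])],
    },
    index=["W", "X"],
)

# For every column, replace each cell with the int value of its only element.
flat = df.apply(lambda col: col.map(lambda cell: int(np.ravel(cell)[0])))

print(flat)
```

Note that int() truncates toward zero, so values with a fractional part would lose it.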
2
votes
0 answers

Python: Create 5-6 groups from a dataset so that the groups are balanced across 3 different variables (decile, population size & region)

I was given already filtered datasets. The request is to create 5-6 equally sized groups that are balanced/stratified across 3 different variables. I have two datasets to do this for, one with about 540 rows and the other with about 880 rows. The…
2
votes
1 answer

Differences between OneHotEncoding (sklearn) and get_dummies (pandas)

I am wondering what the difference is between pandas' get_dummies() encoding of categorical features and sklearn's OneHotEncoder(). I've seen answers that mention that get_dummies() cannot produce encoding for categories not seen in…
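
The point about unseen categories can be illustrated with a small sketch. The "color" column and its values are invented; the comparison assumes the encoder is fitted on training data and then applied to new data containing a category it has not seen.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: "blue" appears only in the test frame.
train = pd.DataFrame({"color": ["red", "green", "red"]})
test = pd.DataFrame({"color": ["green", "blue"]})

# get_dummies encodes each frame independently, so the columns can differ.
train_dummies = pd.get_dummies(train["color"])
test_dummies = pd.get_dummies(test["color"])
print(list(train_dummies.columns), list(test_dummies.columns))

# OneHotEncoder remembers the categories seen in fit(); with
# handle_unknown="ignore", an unseen category becomes an all-zero row.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train[["color"]])
test_encoded = encoder.transform(test[["color"]]).toarray()
print(test_encoded)
```

Because the fitted encoder always emits the same columns, its output stays aligned with what a downstream model was trained on.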
2
votes
2 answers

How to solve sklearn error: "Found input variables with inconsistent numbers of samples"?

I have a problem with the sklearn 70-30 split. I receive an error on the line: X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y) The error is: Found input variables with inconsistent numbers of…
Paip
  • 21
  • 1
  • 3
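
One plausible cause, judging only from the call shown in the excerpt, is that stratify=y refers to the labels from before resampling, whose length no longer matches X_smote and y_smote. The sketch below reproduces that situation with invented arrays (no actual SMOTE step is run) and shows the call succeeding once stratify uses the resampled labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical original labels (12 samples) and resampled data (16 samples).
y = np.array([0] * 8 + [1] * 4)
X_smote = np.arange(32).reshape(16, 2)
y_smote = np.array([0] * 8 + [1] * 8)

# Stratifying on the shorter, pre-resampling labels raises a ValueError.
message = None
try:
    train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
except ValueError as err:
    message = str(err)
print(message)

# Stratifying on the labels that match X_smote works.
X_train, X_test, y_train, y_test = train_test_split(
    X_smote, y_smote, test_size=0.3, stratify=y_smote, random_state=0
)
print(len(X_train), len(X_test))
```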
2
votes
1 answer

How can I convert the StandardScaler() transformation back to dataframe?

I'm working with a model, and after splitting into train and test, I want to apply StandardScaler(). However, this transformation converts my data into an array and I want to keep the format I had before. How can I do this? Basically, I have: from…
dmmmmd
  • 79
  • 2
  • 6
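
A minimal sketch of one way to keep the DataFrame format: wrap the scaled array in a new DataFrame that reuses the original columns and index. The frames and column names below are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test frames with a non-default index.
X_train = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]}, index=[5, 7, 9])
X_test = pd.DataFrame({"a": [2.0], "b": [40.0]}, index=[11])

scaler = StandardScaler()

# fit_transform returns a NumPy array; rebuild the frame with the old labels.
X_train_scaled = pd.DataFrame(
    scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index
)
# Reuse the statistics learned on the training data for the test data.
X_test_scaled = pd.DataFrame(
    scaler.transform(X_test), columns=X_test.columns, index=X_test.index
)

print(X_train_scaled)
```

In scikit-learn 1.2 and later, calling scaler.set_output(transform="pandas") should give the same result without the manual wrapping.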
2
votes
1 answer

Fill Pandas Column NaNs with numpy array values

Sorry if this question seems too basic, but I've been looking for an answer and haven't found one. I have a dataset with lots of NaN values and I've been working on some regressions to predict those nulls, and since the prediction is given as a…
fega_zero
  • 125
  • 9
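
A small sketch of one way to do this, assuming the prediction array holds exactly one value per missing row, in row order. The column names and numbers are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing values in "target".
df = pd.DataFrame(
    {"feature": [1.0, 2.0, 3.0, 4.0], "target": [10.0, np.nan, 30.0, np.nan]}
)

# Hypothetical model output: one prediction per missing row, in row order.
predictions = np.array([20.0, 40.0])

# Boolean mask of the rows to fill; assignment through .loc is positional
# within the masked rows, so the array length must equal missing.sum().
missing = df["target"].isna()
df.loc[missing, "target"] = predictions

print(df)
```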
2
votes
1 answer

Stratified sampling into 3 sets considering unbalance

I have looked into Stratified sample in pandas, stratified sampling on ranges, among others, and they don't address my issue specifically, as I'm looking to split the data into 3 sets randomly. I have an unbalanced dataframe of 10k rows, 10% is…
Chris
  • 2,019
  • 5
  • 22
  • 67
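
One common pattern for a three-way stratified split is to call train_test_split twice, stratifying on the label each time. The sketch below uses a smaller invented frame (1,000 rows, 10% positive) and an assumed 60/20/20 split; neither the "label" column name nor the ratios come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical unbalanced frame: 10% of rows belong to the positive class.
df = pd.DataFrame({"feature": np.arange(1000), "label": [1] * 100 + [0] * 900})

# First split: 60% train, 40% held out, preserving the class ratio.
train, rest = train_test_split(
    df, test_size=0.4, stratify=df["label"], random_state=0
)
# Second split: divide the held-out part evenly into validation and test.
val, test = train_test_split(
    rest, test_size=0.5, stratify=rest["label"], random_state=0
)

for name, part in [("train", train), ("val", val), ("test", test)]:
    print(name, len(part), part["label"].mean())
```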