Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
4
votes
2 answers

Shuffling multiple columns in a DataFrame

I have a DataFrame like this: 'a' 'b' 'c' 'd' 'e' 'f' 'hello.text' 1 2 'hello2.text' 2 10 'hello3.text' 5 8 'hello4.text' 8 15 Now I need to shuffle or…
Mahdi Asiyabi
  • 79
  • 1
  • 1
  • 8
4
votes
1 answer

Pyspark Pandas_UDF erroring with Invalid argument, not a string or column

I created a Pandas UDF that takes a DataFrame as input, runs a prediction, and outputs a DataFrame with Primary_Key and Predictions. schema = StructType([StructField('primary_id', IntegerType()), StructField('prediction',…
4
votes
2 answers

How to encode a pandas.DataFrame column containing lists using Sklearn.preprocessing

I have a pandas df and some of the columns are lists with data in them and I would like to encode the labels within the lists. I get this error: ValueError: Expected 2D array, got 1D array instead: from sklearn.preprocessing import…
raceee
  • 477
  • 5
  • 14
4
votes
3 answers

How does iloc[:,1:] work? Can anyone explain the [:,1:] params?

What do the lines below mean? I am especially confused about how iloc[:,1:] works, and also data[:,:1]. data = np.asarray(train_df_mv_norm.iloc[:,1:]) X, Y = data[:,1:],data[:,:1] Here train_df_mv_norm is a DataFrame --
Abhishek
  • 1,543
  • 3
  • 13
  • 29
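The slicing in the question above can be tried on a small frame. This is only a sketch: the column names and values below are hypothetical stand-ins for train_df_mv_norm, which the excerpt does not show.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df_mv_norm (real columns are not shown in the question).
train_df_mv_norm = pd.DataFrame({
    "id": [10, 11, 12],
    "target": [0.1, 0.2, 0.3],
    "f1": [1.0, 2.0, 3.0],
    "f2": [4.0, 5.0, 6.0],
})

# iloc[:, 1:] -> every row (":"), columns from position 1 onward, so "id" is dropped.
data = np.asarray(train_df_mv_norm.iloc[:, 1:])

# data[:, 1:] -> all rows, columns 1 and later (here: f1, f2).
# data[:, :1] -> all rows, column 0 only; using a slice keeps the result 2-D.
X, Y = data[:, 1:], data[:, :1]

print(data.shape, X.shape, Y.shape)
```

In both iloc and NumPy indexing, the part before the comma selects rows and the part after it selects columns, so [:, 1:] reads as "all rows, every column except the first".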
4
votes
2 answers

Pandas - Counting rows in a df to discover the survival rate each day

Hello, guys! I have a dfA (Table A) containing the number of days that some products have been available (days_survived). I need to count the number of products that were available each day in total (Table B). I mean, I need to count rows in dfA…
Thaise
  • 1,043
  • 3
  • 16
  • 28
4
votes
2 answers

Too many coef_ values for LogisticRegression in Pipeline

I'm making use of the sklearn-pandas DataFrameMapper in a sklearn Pipeline. In order to evaluate feature contribution in a feature union pipeline, I would like to measure the coefficients of the estimator (Logistic Regression). For the following code…
4
votes
1 answer

Text classification for logistic regression with pipelines

I am trying to use LogisticRegression for text classification. I am using FeatureUnion for the features of the DataFrame and then cross_val_score to test the accuracy of the classifier. However, I don't know how to include the feature with the free…
Paul K
  • 123
  • 7
4
votes
3 answers

statsmodels OLS giving a TypeError in Python

I am trying to fit a set of features to statsmodels' OLS linear regression model. I am adding a few features at a time. With the first two features, it works fine. But when I keep adding new features it gives me an error. Traceback (most recent call…
akalanka
  • 553
  • 7
  • 21
4
votes
2 answers

Stratified sample with replacement in Python

I have a Pandas DataFrame. I am trying to create a sample DataFrame with replacement and also stratify it. This lets me sample with replacement: df_test = df.sample(n=100, replace=True, random_state=42, axis=0) However, I am not sure how to also stratify. …
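One possible approach to the question above, offered as a sketch and not as the accepted answer: sample inside each group with GroupBy.sample. It assumes pandas 1.1 or later, and the column name "stratum" and the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical data: "stratum" is the column to stratify on (60% "a", 40% "b").
df = pd.DataFrame({
    "stratum": ["a"] * 6 + ["b"] * 4,
    "value": range(10),
})

n_total = 100                 # desired sample size, as in the question
frac = n_total / len(df)      # a fraction above 1 is allowed only with replace=True

# Draw from each stratum separately, with replacement, so that each
# stratum keeps its share of the original frame.
df_test = df.groupby("stratum", group_keys=False).sample(
    frac=frac, replace=True, random_state=42
)

print(df_test["stratum"].value_counts())
```

Because each stratum's size is rounded separately, the total can differ slightly from n_total when the proportions do not divide evenly.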
4
votes
1 answer

How to view cluster centroids for each iteration of n_init using sklearn's KMeans

I am currently trying to view the created centroids (cluster centers) for each iteration of KMeans that is determined from each iteration of n_init. As of now I am able to view the final results but I would like to see these at each iteration so I am…
4
votes
1 answer

Linear fit to pandas.datetime64 values?

I have a dataframe with two columns (age, date) indicating the age of a person and the current date. I want to approximate the date of birth from that data. I thought to fit a linear model and find the interception with the, but it does not work out…
Soerendip
  • 7,684
  • 15
  • 61
  • 128
4
votes
1 answer

How to groupby() aggregate on multiple columns and rename the multi-index in Pandas 0.21+?

Code import pandas as pd df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5), 'C': range(5)}) df1 = df.groupby('A').B.agg({'B': ['count','nunique'],'C': ['sum','median']}) df1.columns = ["_".join(x) for x…
GeorgeOfTheRF
  • 8,244
  • 23
  • 57
  • 80
4
votes
1 answer

Python SciPy Spearman correlations

I am trying to obtain the column names from the dataframe (df) and associate them to the resulting array produced by the spearmanr correlation function. I need to associate both the column names (a-j) back to the correlation value (spearman) and…
Kyle
  • 387
  • 1
  • 5
  • 13
4
votes
1 answer

scikit-learn : ValueError: not enough values to unpack (expected 2, got 1)

There is a check_array function for calculating mean absolute percentage error (MAPE) in the recent version of sklearn but it doesn't seem to work the same way as the previous version. import numpy as np from sklearn.utils import check_array def…
Desta Haileselassie Hagos
  • 23,140
  • 7
  • 48
  • 53
4
votes
2 answers

Constrain the sum of coefficients with a scikit-learn linear model

I am doing a LassoCV with 1000 coefs. Statsmodels did not seem to be able to handle this many coefs, so I am using scikit-learn. Statsmodels allowed for .fit_constrained("coef1 + coef2...=1"). This constrained the sum of the coefs to equal 1. I need to do…
TChi
  • 383
  • 1
  • 6
  • 14