Highest Voted 'sklearn-pandas' Questions

4 votes, 2 answers

Shuffling Multi Column in data frame

I have a DataFrame like this: 'a' 'b' 'c' 'd' 'e' 'f' 'hello.text' 1 2 'hello2.text' 2 10 'hello3.text' 5 8 'hello4.text' 8 15. Now I need to shuffle or…

asked Aug 19 '19 at 08:33

Mahdi Asiyabi

4 votes, 1 answer

Pyspark Pandas_UDF erroring with Invalid argument, not a string or column

I created a Pandas UDF, which will input a dataframe, predict and output a dataframe on Primary_Key and Predictions. schema = StructType([StructField('primary_id', IntegerType()), StructField('prediction',…

pandas pyspark user-defined-functions sklearn-pandas

asked Jul 10 '19 at 00:47

Pawan Kalyan

4 votes, 2 answers

How to encode a pandas.DataFrame column containing lists using Sklearn.preprocessing

I have a pandas df and some of the columns are lists with data in them and I would like to encode the labels within the lists. I get this error: ValueError: Expected 2D array, got 1D array instead: from sklearn.preprocessing import…

python python-3.x dataframe scikit-learn sklearn-pandas

asked Jun 23 '19 at 05:19

raceee
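
One possible approach to the question above, assuming the goal is one indicator column per label, is scikit-learn's `MultiLabelBinarizer`, which accepts a sequence of label lists directly. This is a sketch, not the asker's code: the DataFrame and its `tags` column are made up for illustration, since the excerpt does not show the real data.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical data: each cell of "tags" holds a list of labels.
df = pd.DataFrame({"tags": [["a", "b"], ["b"], ["c", "a"]]})

mlb = MultiLabelBinarizer()
# fit_transform takes an iterable of label collections and returns
# a 2-D 0/1 array with one column per distinct label.
encoded = mlb.fit_transform(df["tags"])

# Attach readable column names taken from the fitted classes.
encoded_df = pd.DataFrame(encoded, columns=mlb.classes_, index=df.index)
print(encoded_df)
```

Single-column encoders such as `LabelEncoder` expect one scalar label per row, so a column of lists usually has to be handled by something that understands multiple labels per row.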

4 votes, 3 answers

How does iloc[:,1:] work? Can anyone explain the [:,1:] params?

What is the meaning of the lines below? I am especially confused about how iloc[:,1:] works, and also data[:,:1]. data = np.asarray(train_df_mv_norm.iloc[:,1:]) X, Y = data[:,1:],data[:,:1] Here train_df_mv_norm is a dataframe --

python-3.x pandas sklearn-pandas

asked May 26 '19 at 07:44

Abhishek
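
The slicing in the question above can be traced on a small frame. This is a minimal sketch: the column names and values are invented stand-ins for `train_df_mv_norm`, whose real layout is not shown in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df_mv_norm.
train_df_mv_norm = pd.DataFrame({
    "id": [10, 11, 12],
    "target": [0.1, 0.2, 0.3],
    "f1": [1.0, 2.0, 3.0],
    "f2": [4.0, 5.0, 6.0],
})

# iloc[rows, cols]: ":" keeps every row, "1:" keeps columns from
# position 1 to the end, so the first column ("id") is dropped.
data = np.asarray(train_df_mv_norm.iloc[:, 1:])

# data[:, 1:] -> all rows, columns from position 1 on (f1, f2).
# data[:, :1] -> all rows, only column 0 (target), still 2-D.
X, Y = data[:, 1:], data[:, :1]

print(data.shape, X.shape, Y.shape)
```

Writing `:1` instead of `0` keeps `Y` as a column with shape `(n, 1)` rather than a flat 1-D array.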

4 votes, 2 answers

Pandas - Counting rows in a df to discover the survival rate each day

Hello, guys! I have a dfA (Table A) containing the number of days that some products have been available (days_survived). I need to count the number of products that were available each day in total (Table B). I mean, I need to count rows in dfA…

python pandas pandas-groupby sklearn-pandas

asked Jan 31 '19 at 06:20

Thaise
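
A rough sketch of the counting described in the question above, under the assumption that a product with `days_survived = k` counts as available on days 1 through k. The `product` column and the sample values are invented; only `dfA` and `days_survived` come from the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical Table A: one row per product.
dfA = pd.DataFrame({
    "product": ["p1", "p2", "p3", "p4"],
    "days_survived": [1, 3, 3, 5],
})

# Every day from 1 up to the longest survival time.
days = np.arange(1, dfA["days_survived"].max() + 1)

# Compare each product (rows) against each day (columns), then
# sum down the rows to count products still available that day.
survived = dfA["days_survived"].to_numpy()
available = (survived[:, None] >= days[None, :]).sum(axis=0)

# Table B: number of products available on each day.
dfB = pd.DataFrame({"day": days, "products_available": available})
print(dfB)
```

Dividing `products_available` by `len(dfA)` would turn the counts into a survival rate per day.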

4 votes, 2 answers

Too many _coef values for LogisticRegression in Pipeline

I'm making use of the sklearn-pandas DataFrameMapper in a sklearn Pipeline. In order to evaluate feature contribution in a feature union pipeline, I would like to measure the coefficients of the estimator (Logistic Regression). For the following code…

scikit-learn logistic-regression sklearn-pandas coefficients

asked Jan 27 '19 at 12:59

Christopher

4 votes, 1 answer

Text classification for logistic regression with pipelines

I am trying to use LogisticRegression for text classification. I am using FeatureUnion for the features of the DataFrame and then cross_val_score to test the accuracy of the classifier. However, I don't know how to include the feature with the free…

python machine-learning sklearn-pandas

asked Nov 25 '18 at 13:40

Paul K

4 votes, 3 answers

statsmodels OLS giving a TypeError in Python

I am trying to fit a set of features to statsmodels' OLS linear regression model. I am adding a few features at a time. With the first two features, it works fine. But when I keep adding new features it gives me an error. Traceback (most recent call…

python python-3.x statsmodels sklearn-pandas

asked Nov 14 '18 at 19:16

akalanka

4 votes, 2 answers

stratified sample with replacement in python

I have a Pandas DataFrame. I am trying to create a sample DataFrame with replacement and also stratify it. This allows me to replace: df_test = df.sample(n=100, replace=True, random_state=42, axis=0) However, I am not sure how to also stratify. …

python random sample sklearn-pandas

asked Oct 05 '18 at 22:50

pythonsandpandas
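
One way to combine replacement with stratification, offered as a sketch rather than a definitive answer to the question above, is `DataFrameGroupBy.sample`, which draws within each group separately. The `label` and `value` columns and the 150/50 class split are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical data: 150 rows of class "a" and 50 rows of class "b".
df = pd.DataFrame({
    "label": ["a"] * 150 + ["b"] * 50,
    "value": range(200),
})

n_total = 100  # desired overall sample size

# Sampling the same fraction inside every group keeps the class
# proportions; replace=True lets a row be drawn more than once.
df_test = df.groupby("label").sample(
    frac=n_total / len(df), replace=True, random_state=42
)

print(df_test["label"].value_counts())
```

Because each group size is rounded separately, the total can be off by a row or two when the fraction does not divide the group sizes evenly.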

4 votes, 1 answer

How to view cluster centroids for each iteration of n_init using sklearn's KMeans

I am currently trying to view the created centroids (cluster centers) for each iteration of KMeans that is determined from each iteration of n_init. As of now I am able to view the final results but I would like to see these at each iteration so I am…

python k-means sklearn-pandas

asked Aug 05 '18 at 19:41

Tired_GradStudent

4 votes, 1 answer

Linear fit to pandas.datetime64 values?

I have a dataframe with two columns (age, date) indicating the age of a person and the current date. I want to approximate the date of birth from that data. I thought to fit a linear model and find the interception with the, but it does not work out…

python pandas scikit-learn seaborn sklearn-pandas

asked Jun 22 '18 at 02:33

Soerendip
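
A minimal sketch of the idea in the question above, assuming the aim is to find the date at which the fitted age line crosses zero. Datetime values are first converted to plain numbers (days since an arbitrary epoch) because a least-squares fit needs numeric input. The sample dates, the epoch, and the use of `np.polyfit` are illustrative choices, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ages observed on known dates for someone
# born on 1990-01-01 (used only to generate the example).
birth = pd.Timestamp("1990-01-01")
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2000-01-01", "2005-01-01", "2010-01-01", "2015-01-01"]
    )
})
df["age"] = (df["date"] - birth).dt.days / 365.25

# Convert datetime64 values to float days since an arbitrary epoch.
epoch = pd.Timestamp("1970-01-01")
x = (df["date"] - epoch) / pd.Timedelta(days=1)

# Fit age = slope * x + intercept (highest degree comes first).
slope, intercept = np.polyfit(x, df["age"], 1)

# Age is zero where x = -intercept / slope; map back to a date.
birth_est = epoch + pd.Timedelta(days=float(-intercept / slope))
print(birth_est)
```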

4 votes, 1 answer

How to groupby() aggregate on multiple columns and rename the multi-index in Pandas 0.21+?

Code import pandas as pd df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5), 'C': range(5)}) df1 = df.groupby('A').B.agg({'B': ['count','nunique'],'C': ['sum','median']}) df1.columns = ["_".join(x) for x…

python pandas pandas-groupby sklearn-pandas

asked Dec 06 '17 at 03:42

GeorgeOfTheRF
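
For the aggregation in the question above, one option in pandas 0.25 and later is named aggregation, which produces flat column names without a multi-index. This is a sketch under that version assumption; the output names (`B_count` and so on) are chosen here to mirror the `"_".join(x)` pattern in the excerpt.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2], "B": range(5), "C": range(5)})

# Each keyword is an output column name; the tuple gives the
# source column and the aggregation to apply to it.
df1 = df.groupby("A").agg(
    B_count=("B", "count"),
    B_nunique=("B", "nunique"),
    C_sum=("C", "sum"),
    C_median=("C", "median"),
)

print(df1)
```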

4 votes, 1 answer

python scipy spearman correlations

I am trying to obtain the column names from the dataframe (df) and associate them to the resulting array produced by the spearmanr correlation function. I need to associate both the column names (a-j) back to the correlation value (spearman) and…

python pandas scipy sklearn-pandas pearson-correlation

asked Nov 25 '17 at 11:02

Kyle
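
One way to keep column names attached to the correlation values asked about above is to compute the Spearman matrix with pandas itself, which returns a labeled DataFrame. Note that this swaps `scipy.stats.spearmanr` for `DataFrame.corr(method="spearman")`; the three columns below are invented, not the asker's a-j columns.

```python
import pandas as pd

# Hypothetical data: "b" rises with "a", "c" falls as "a" rises.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 8, 16, 32],
    "c": [5, 4, 3, 2, 1],
})

# The result is indexed by column name on both axes, so every
# value can be looked up by its pair of names.
corr = df.corr(method="spearman")

print(corr)
print(corr.loc["a", "c"])
```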

4 votes, 1 answer

scikit-learn : ValueError: not enough values to unpack (expected 2, got 1)

There is a check_array function for calculating mean absolute percentage error (MAPE) in the recent version of sklearn but it doesn't seem to work the same way as the previous version. import numpy as np from sklearn.utils import check_array def…

python python-3.x scikit-learn sklearn-pandas

asked Jul 18 '17 at 16:36

Desta Haileselassie Hagos

4 votes, 2 answers

Constrain the sum of coefficients with a scikit-learn linear model

I am doing a LassoCV with 1000 coefs. Statsmodels did not seem to be able to handle this many coefs. So I am using scikit-learn. Statsmodels allowed for .fit_constrained("coef1 + coef2...=1"). This constrained the sum of the coefs to = 1. I need to do…

python machine-learning scikit-learn regression sklearn-pandas

asked Jun 27 '17 at 21:20

TChi
