Highest Voted 'sklearn-pandas' Questions

13 votes, 3 answers

Why shouldn't the sklearn LabelEncoder be used to encode input data?

The docs for sklearn.preprocessing.LabelEncoder start with "This transformer should be used to encode target values, i.e. y, and not the input X." Why is this? I post just one example of this recommendation being ignored in practice, although there seems to be…

python sklearn-pandas feature-engineering

asked Jan 25 '20 at 23:13

hlud6646

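
A minimal sketch of one commonly cited reason, using hypothetical color data. LabelEncoder expects a 1-D array (the shape of a target y) and assigns integer codes in sorted order, so feeding those codes to a model as a feature implies an ordering the categories may not have. OrdinalEncoder is the feature-side counterpart (same ordering issue), and OneHotEncoder avoids the ordering altogether.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

colors = ["red", "green", "blue", "green"]  # hypothetical categorical values

# LabelEncoder takes a 1-D array and codes classes in sorted order:
# blue=0, green=1, red=2.
codes = LabelEncoder().fit_transform(colors)
print(codes)  # [2 1 0 1]

# Feature transformers expect 2-D input of shape (n_samples, n_features).
X = np.array(colors).reshape(-1, 1)

# OrdinalEncoder is meant for X, but still implies red > green > blue.
ordinal = OrdinalEncoder().fit_transform(X)

# OneHotEncoder gives one binary column per category, so no artificial order.
onehot = OneHotEncoder().fit_transform(X).toarray()
print(onehot)
```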

13 votes, 3 answers

difference between LinearRegression and svm.SVR(kernel="linear")

First, there are questions on this forum very similar to this one, but trust me, none of them matches, so please don't mark this as a duplicate. I have encountered two methods of linear regression using scikit-learn and I am failing to understand the difference between the…

machine-learning scikit-learn regression python-3.5 sklearn-pandas

asked Oct 27 '17 at 08:40

Dev_Man

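
A sketch of where the two differ, on hypothetical toy data with one outlier. LinearRegression is ordinary least squares (minimize squared residuals, no penalty), while SVR(kernel="linear") fits a linear function under the epsilon-insensitive loss plus an L2 penalty on the weights controlled by C, so the fitted slopes generally differ. The C and epsilon values here are illustrative, not tuned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# Hypothetical data: y = 2x + 1, with a large outlier on the last point.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1
y[-1] += 50

# OLS: squared loss, so the outlier pulls the slope up strongly.
ols = LinearRegression().fit(X, y)

# Linear SVR: residuals within epsilon cost nothing, larger ones grow
# linearly (not quadratically), and C trades fit against weight size.
svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)

print("OLS slope:", ols.coef_[0])
print("SVR slope:", svr.coef_[0, 0])
```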

13 votes, 4 answers

feature_names must be unique - Xgboost

I am running an xgboost model on a very sparse matrix and getting this error: ValueError: feature_names must be unique. How can I deal with this? This is my code: yprob = bst.predict(xgb.DMatrix(test_df))[:,1]

python pandas xgboost sklearn-pandas

asked Apr 24 '17 at 03:07

user2728024

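
Assuming the error comes from test_df having repeated column names (xgb.DMatrix takes feature names from a DataFrame's columns, so this is a common cause, but the question does not confirm it), a pandas-only sketch for finding and removing the duplicates before building the DMatrix. The frame below is hypothetical.

```python
import pandas as pd

# Hypothetical frame with a repeated column name, e.g. after a concat or merge.
test_df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# Which names occur more than once?
dupes = test_df.columns[test_df.columns.duplicated()].tolist()
print(dupes)  # ['a']

# Option 1: keep only the first occurrence of each column name.
deduped = test_df.loc[:, ~test_df.columns.duplicated()]

# Option 2: keep every column but make the names unique.
renamed = test_df.copy()
renamed.columns = [f"{c}_{i}" for i, c in enumerate(test_df.columns)]
print(list(renamed.columns))  # ['a_0', 'b_1', 'a_2']
```

Which option is right depends on whether the duplicated columns hold the same data; if they differ, renaming keeps the information.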

13 votes, 2 answers

How to do Onehotencoding in Sklearn Pipeline

I am trying to one-hot encode the categorical variables of my pandas DataFrame, which includes both categorical and continuous variables. I realise this can be done easily with the pandas .get_dummies() function, but I need to use a pipeline so I can…

python scikit-learn pipeline sklearn-pandas

asked Feb 13 '17 at 12:38

Desiré De Waele

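
A sketch using ColumnTransformer, which was added in scikit-learn 0.20 (after this question was asked) and lets a Pipeline one-hot encode some columns while scaling others. The data, column names, and the LogisticRegression step are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed-type data.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],
    "size": [1.0, 2.5, 3.1, 0.7, 2.2, 1.9],
})
y = [0, 1, 1, 0, 1, 0]

preprocess = ColumnTransformer([
    # One-hot encode the categorical column; categories unseen during fit
    # become all-zero rows at predict time instead of raising an error.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
    # Standardize the continuous column.
    ("num", StandardScaler(), ["size"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(df, y)
print(model.predict(df))
```

Because the encoder lives inside the pipeline, it is fitted only on the data passed to fit, which is the usual reason to prefer it over calling get_dummies on the whole frame.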

13 votes, 1 answer

Adding pandas columns to a sparse matrix

I have additional derived values for X variables that I want to use in my model. XAll = pd_data[['title','wordcount','sumscores','length']] y = pd_data['sentiment'] X_train, X_test, y_train, y_test = train_test_split(XAll, y, random_state=1) As I…

python pandas scikit-learn sklearn-pandas

asked Jan 30 '17 at 01:11

Bonson

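
A sketch of combining a sparse text matrix with numeric columns via scipy.sparse.hstack. The column names come from the question; the rows are hypothetical, and TfidfVectorizer is an assumed stand-in for whatever vectorizer produced the sparse matrix.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical rows using the column names from the question.
pd_data = pd.DataFrame({
    "title": ["great movie", "terrible plot", "great acting, weak plot"],
    "wordcount": [120, 80, 200],
    "sumscores": [3.5, -2.0, 1.0],
    "length": [11, 13, 23],
})

# Text column -> sparse TF-IDF matrix (n_samples x vocabulary size).
vec = TfidfVectorizer()
X_text = vec.fit_transform(pd_data["title"])

# Numeric columns -> sparse matrix with the same number of rows.
X_num = csr_matrix(pd_data[["wordcount", "sumscores", "length"]].to_numpy(dtype=float))

# Place them side by side; tocsr() gives a format most estimators accept.
X_all = hstack([X_text, X_num]).tocsr()
print(X_all.shape)
```

After train_test_split, the vectorizer would normally be fitted on the training titles only and reused to transform the test titles; numeric columns on very different scales may also need scaling.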

13 votes, 1 answer

How to change particular column value when defined mask is true?

I have a DataFrame with the columns 'team1', 'team2', 'city', 'date'. What I want to do is assign the value 'dubai' to 'city' when a certain condition is met (which I define using a mask). This is exactly what I am doing: …

pandas dataframe sklearn-pandas

asked Dec 29 '16 at 18:28

Pankaj Mishra

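
A minimal sketch using the column names from the question; the rows and the date-range condition used as the mask are hypothetical. The key point is to pass the mask and the column name to a single .loc call.

```python
import pandas as pd

# Hypothetical rows with the question's column names.
df = pd.DataFrame({
    "team1": ["A", "B", "C"],
    "team2": ["D", "E", "F"],
    "city": ["Mumbai", None, "Delhi"],
    "date": pd.to_datetime(["2014-04-16", "2014-04-17", "2014-05-02"]),
})

# Hypothetical condition: matches played in a given date range.
mask = (df["date"] >= pd.Timestamp("2014-04-16")) & (df["date"] <= pd.Timestamp("2014-04-30"))

# Selecting rows and column in one .loc call assigns into df itself;
# chained indexing such as df[mask]["city"] = ... may write to a temporary copy.
df.loc[mask, "city"] = "dubai"
print(df)
```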

12 votes, 4 answers

Standardize some columns in Python Pandas dataframe?

The Python code below only returns an array, but I want the scaled data to replace the original data. from sklearn.preprocessing import StandardScaler df = StandardScaler().fit_transform(df[['cost', 'sales']]) df output array([[ 1.99987622,…

python pandas sklearn-pandas standardized

asked Apr 04 '18 at 02:14

BigData

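
A sketch of writing the scaled values back into the original DataFrame instead of overwriting df with the returned array; the data is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data; only 'cost' and 'sales' should be scaled.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "cost": [10.0, 20.0, 30.0, 40.0],
    "sales": [100.0, 150.0, 120.0, 130.0],
})

cols = ["cost", "sales"]
# fit_transform returns a NumPy array; assigning it to the same column
# selection writes the scaled values into df and leaves other columns alone.
df[cols] = StandardScaler().fit_transform(df[cols])
print(df)
```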

12 votes, 3 answers

Sklearn error : predict(x,y) takes 2 positional arguments but 3 were given

I am building a multivariate regression analysis in sklearn and have looked through the documentation thoroughly. When I run the predict() function I get the error: predict() takes 2 positional arguments but 3 were given. X is a data frame, y…

scikit-learn sklearn-pandas

asked Oct 03 '17 at 18:04

GD_N

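
A sketch of the usual resolution, on hypothetical data. predict() takes only X, because its job is to produce the y values; passing y as well raises the TypeError from the title. To compare predictions against a known y, use score(X, y) or a metric function.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data with two features (here y = x1 + 2*x2 exactly).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([5.0, 4.0, 11.0, 10.0])

model = LinearRegression().fit(X, y)  # training uses both X and y

y_pred = model.predict(X)  # prediction takes only X
r2 = model.score(X, y)     # evaluation against a known y takes both

raised = False
try:
    model.predict(X, y)  # the call pattern from the question
except TypeError as exc:
    raised = True
    print(exc)
```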

12 votes, 1 answer

What's the difference between sklearn Pipeline and DataFrameMapper?

Sklearn Pipeline: http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html DataFrameMapper: https://github.com/paulgb/sklearn-pandas What's the difference between them? It seems to me that sklearn pipeline has more features,…

scikit-learn pipeline sklearn-pandas

asked Oct 31 '16 at 23:45

nkhuyu


12 votes, 3 answers

HOW TO LABEL the FEATURE IMPORTANCE with forests of trees?

I use sklearn to plot the feature importance for forests of trees. The DataFrame is named 'heart'. Here is the code to extract the list of the sorted features: importances = extc.feature_importances_ indices =…

python numpy matplotlib scikit-learn sklearn-pandas

asked Jun 17 '16 at 09:10

ElenaPhys

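
A sketch of attaching column names to feature_importances_ via a pandas Series. make_classification and the column names are hypothetical stand-ins for the 'heart' DataFrame, and extc is assumed to be an ExtraTreesClassifier, as the variable name suggests.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical stand-in for the 'heart' DataFrame: 5 named feature columns.
X_arr, y = make_classification(n_samples=200, n_features=5, random_state=0)
heart = pd.DataFrame(X_arr, columns=["age", "chol", "trestbps", "thalach", "oldpeak"])

extc = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(heart, y)

# Pair each importance with its column name, then sort for display.
importances = pd.Series(extc.feature_importances_, index=heart.columns)
importances = importances.sort_values(ascending=False)
print(importances)

# With matplotlib installed, importances.plot.bar() labels each bar
# with its feature name.
```

Keeping the names in the Series index means the labels travel with the values through sorting, so there is no separate indices array to keep in sync.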

11 votes, 2 answers

Appending arrays to dataframe (python)

I ran a time series model on a small sales data set and forecast sales for the next 12 periods with the following code: mod1=ARIMA(df1, order=(2,1,1)).fit(disp=0,transparams=True) y_future=mod1.forecast(steps=12)[0] where df1 contains the…

python arrays pandas dataframe sklearn-pandas

asked Jan 24 '18 at 10:53

IndigoChild

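
A sketch of appending a 12-step forecast to the history with a matching date index. The contents of df1, its monthly frequency, and the forecast values are hypothetical placeholders; statsmodels is not needed to show the pandas part.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly sales history standing in for df1.
df1 = pd.DataFrame(
    {"sales": [200.0, 220.0, 215.0, 240.0, 260.0, 255.0]},
    index=pd.date_range("2017-01-01", periods=6, freq="MS"),
)

# Placeholder for the 12-step forecast array (y_future in the question).
y_future = np.linspace(265.0, 320.0, 12)

# The 12 month-start dates that follow the last observed month.
future_index = pd.date_range(df1.index[-1], periods=13, freq="MS")[1:]
forecast = pd.DataFrame({"forecast": y_future}, index=future_index)

# Stack history and forecast; each frame's missing column becomes NaN.
combined = pd.concat([df1, forecast])
print(combined.tail())
```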

11 votes, 1 answer

Difference between model score() vs r2_score

I am training a LinearRegression() model and trying to gauge its prediction accuracy: from sklearn.metrics import r2_score from sklearn.linear_model import LinearRegression regr_rf = LinearRegression() regr_rf.fit(df[features],df['label']) y_rf…

scikit-learn sklearn-pandas

asked Aug 06 '17 at 08:03

David

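
A sketch showing that, for a regressor, score(X, y) is the R² of predict(X) against y, so it agrees with r2_score when the arguments are passed as r2_score(y_true, y_pred). The data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)

regr = LinearRegression().fit(X, y)
y_pred = regr.predict(X)

# For regressors, score(X, y) computes R^2 of predict(X) against y ...
score_value = regr.score(X, y)
# ... so it matches r2_score with arguments in (y_true, y_pred) order.
r2_value = r2_score(y, y_pred)
# Swapping the arguments generally gives a different number, because R^2
# is normalized by the variance of the first argument.
swapped = r2_score(y_pred, y)
```

Evaluating on the training data, as here, only shows the equivalence; an accuracy estimate would use held-out data.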

11 votes, 2 answers

how to search a string value within a specific column in pandas dataframe, and if present, give an output of that row present in the dataframe?

I wish to search a database that I have in a .pkl file. I have loaded the .pkl file and stored it in a variable named load_data. Now I need to accept a string input using raw_input and search for the string in one specific column, 'SMILES', of my…

loops pandas search sklearn-pandas

asked Jun 18 '17 at 18:05

Devarshi Sengupta

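
A sketch assuming load_data is a pandas DataFrame with a 'SMILES' column; the rows and the helper find_smiles are hypothetical. In Python 3, raw_input was renamed input.

```python
import pandas as pd

# Hypothetical stand-in for the unpickled data
# (e.g. load_data = pd.read_pickle(path), with path pointing at the .pkl file).
load_data = pd.DataFrame({
    "name": ["ethanol", "benzene", "acetic acid"],
    "SMILES": ["CCO", "c1ccccc1", "CC(=O)O"],
})


def find_smiles(df, query, exact=True):
    """Hypothetical helper: return the rows whose 'SMILES' value matches query."""
    if exact:
        return df[df["SMILES"] == query]
    # regex=False: SMILES strings contain characters such as '(' and '['
    # that would otherwise be read as regular-expression syntax.
    return df[df["SMILES"].str.contains(query, regex=False)]


# Interactive version in Python 3: query = input("SMILES: ")
matches = find_smiles(load_data, "CCO")
print(matches if not matches.empty else "not found")
```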

11 votes, 8 answers

Error when trying to import sklearn modules : ImportError: DLL load failed: The specified module could not be found

I tried the following imports for a machine learning project: from sklearn import preprocessing, cross_validation, svm from sklearn.linear_model import LinearRegression I got this error message: Traceback (most recent call last): File…

python machine-learning dll sklearn-pandas

asked Dec 26 '16 at 15:00

Taha Abdelhalim Nakabi


11 votes, 4 answers

GridSearchCV: "TypeError: 'StratifiedKFold' object is not iterable"

I want to perform GridSearchCV on a RandomForestClassifier, but the data is not balanced, so I use StratifiedKFold: from sklearn.model_selection import StratifiedKFold from sklearn.grid_search import GridSearchCV from sklearn.ensemble import…

pandas scikit-learn grid-search sklearn-pandas

asked Oct 26 '16 at 08:38

user183897

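
A likely cause, inferred from the imports shown: StratifiedKFold comes from the newer sklearn.model_selection module, while GridSearchCV comes from the older sklearn.grid_search module, which expects the old iterable cross-validation objects (sklearn.grid_search was deprecated in 0.18 and removed in 0.20). A sketch with both imported from sklearn.model_selection, on synthetic imbalanced data; the parameter grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced data (roughly 80/20 classes).
X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)

# A model_selection splitter object, passed directly as cv; GridSearchCV
# from the same module calls its split() method.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
param_grid = {"n_estimators": [10, 30], "max_depth": [None, 5]}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=cv)
search.fit(X, y)
print(search.best_params_)
```

For classifiers, GridSearchCV with an integer cv already uses stratified folds; an explicit StratifiedKFold is mainly useful for control over shuffling and the random seed.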
