Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
13
votes
3 answers

Why shouldn't the sklearn LabelEncoder be used to encode input data?

The docs for sklearn.LabelEncoder start with "This transformer should be used to encode target values, i.e. y, and not the input X." Why is this? I post just one example of this recommendation being ignored in practice, although there seems to be…
hlud6646
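A minimal sketch of one commonly cited concern, which is not taken from the question itself: LabelEncoder works on a 1-D array and assigns integer codes in sorted class order, so a model fed those codes as a feature may treat the categories as ordered. The `colors` data is hypothetical, and OneHotEncoder is shown only as one possible alternative for input features.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical categorical feature (not from the question).
colors = np.array(["red", "green", "blue", "green"])

# LabelEncoder takes a 1-D array and maps each class to an integer,
# using the sorted order of the class names.
le = LabelEncoder()
codes = le.fit_transform(colors)
print(le.classes_)
print(codes)

# OneHotEncoder expects a 2-D feature matrix and produces one indicator
# column per category, so no ordering between categories is implied.
ohe = OneHotEncoder()
onehot = ohe.fit_transform(colors.reshape(-1, 1)).toarray()
print(onehot)
```
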
13
votes
3 answers

Difference between LinearRegression and svm.SVR(kernel="linear")

First, there are questions on this forum very similar to this one, but none of them matches, so please do not mark this as a duplicate. I have encountered two methods of linear regression using scikit-learn, and I am failing to understand the difference between the…
13
votes
4 answers

feature_names must be unique - Xgboost

I am running the xgboost model for a very sparse matrix. I am getting this error: ValueError: feature_names must be unique. How can I deal with this? This is my code: yprob = bst.predict(xgb.DMatrix(test_df))[:,1]
user2728024
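A minimal sketch for diagnosing this kind of error, assuming it comes from duplicate column names in the DataFrame passed to `xgb.DMatrix`. The contents of `test_df` here are made up; only the variable name comes from the question. Dropping the later duplicates is just one possible remedy, and renaming may be more appropriate if the columns hold different data.

```python
import pandas as pd

# Hypothetical frame with a repeated column name "a".
test_df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# Boolean mask marking every column whose name already appeared earlier.
dup_mask = test_df.columns.duplicated()
duplicate_names = test_df.columns[dup_mask].tolist()
print(duplicate_names)

# Keep only the first occurrence of each column name.
deduped = test_df.loc[:, ~dup_mask]
print(deduped.columns.tolist())
```
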
13
votes
2 answers

How to do Onehotencoding in Sklearn Pipeline

I am trying to one-hot encode the categorical variables of my pandas DataFrame, which includes both categorical and continuous variables. I realise this can be done easily with the pandas .get_dummies() function, but I need to use a pipeline so I can…
Desiré De Waele
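A minimal sketch of one way to one-hot encode inside a pipeline, assuming a scikit-learn version that provides `sklearn.compose.ColumnTransformer` (0.20 or later). The column names, data, and the choice of LogisticRegression are hypothetical and only serve to make the example runnable.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data with one categorical and one continuous column.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size": [1.0, 2.5, 0.7, 3.1, 2.2, 2.9],
    "label": [0, 1, 0, 1, 1, 1],
})

# Route each column to its own transformer.
preprocess = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
        ("num", StandardScaler(), ["size"]),
    ]
)

# The encoder is fitted as part of the pipeline, so the same categories
# are reused when transforming new data.
model = Pipeline(steps=[("preprocess", preprocess), ("clf", LogisticRegression())])
model.fit(df[["color", "size"]], df["label"])
predictions = model.predict(df[["color", "size"]])
print(predictions)
```
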
13
votes
1 answer

Adding pandas columns to a sparse matrix

I have additional derived values for X variables that I want to use in my model. XAll = pd_data[['title','wordcount','sumscores','length']] y = pd_data['sentiment'] X_train, X_test, y_train, y_test = train_test_split(XAll, y, random_state=1) As I…
Bonson
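A minimal sketch of combining a sparse text matrix with numeric DataFrame columns via `scipy.sparse.hstack`. The column names follow the question, but the data is invented and CountVectorizer is an assumption, since the excerpt does not say which vectorizer is used.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical rows; column names are taken from the question.
pd_data = pd.DataFrame({
    "title": ["great movie", "bad plot", "great cast bad ending"],
    "wordcount": [2, 2, 4],
    "sumscores": [1.5, -0.5, 0.2],
    "length": [11, 8, 21],
})

# Sparse bag-of-words matrix for the text column.
vectorizer = CountVectorizer()
title_matrix = vectorizer.fit_transform(pd_data["title"])

# Convert the numeric columns to a sparse matrix and stack them
# column-wise next to the text features.
extra = csr_matrix(pd_data[["wordcount", "sumscores", "length"]].to_numpy(dtype=float))
X_combined = hstack([title_matrix, extra]).tocsr()
print(X_combined.shape)
```

With a train/test split, the vectorizer would normally be fitted on the training rows only and then applied to the test rows with `transform`.
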
13
votes
1 answer

How to change particular column value when defined mask is true?

I have a dataframe with the columns 'team1', 'team2', 'city', and 'date'. What I want to do is assign the value 'dubai' to 'city' when a certain condition is met (which I define using a mask). This is what I am doing exactly: …
Pankaj Mishra
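A minimal sketch of mask-based assignment with `DataFrame.loc`. The column names come from the question; the rows and the condition used to build `mask` are hypothetical, since the excerpt does not show the real one.

```python
import pandas as pd

# Hypothetical rows; column names are taken from the question.
df = pd.DataFrame({
    "team1": ["A", "B", "C"],
    "team2": ["D", "E", "F"],
    "city": [None, "mumbai", None],
    "date": ["2014-04-16", "2015-04-08", "2014-04-17"],
})

# Assumed condition: the city is missing and the match took place in 2014.
mask = df["city"].isna() & df["date"].str.startswith("2014")

# .loc with a boolean mask and a column label writes into the original
# frame and avoids chained-indexing pitfalls.
df.loc[mask, "city"] = "dubai"
print(df)
```
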
12
votes
4 answers

Standardize some columns in Python Pandas dataframe?

The Python code below only returns an array, but I want the scaled data to replace the original data. from sklearn.preprocessing import StandardScaler df = StandardScaler().fit_transform(df[['cost', 'sales']]) df output array([[ 1.99987622,…
BigData
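A minimal sketch of scaling only some columns while keeping the DataFrame, assuming the goal is to overwrite 'cost' and 'sales' in place. The data and the extra 'region' column are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical frame; 'region' stands in for any column left unscaled.
df = pd.DataFrame({
    "cost": [10.0, 20.0, 30.0],
    "sales": [100.0, 300.0, 500.0],
    "region": ["n", "s", "e"],
})

# Assign the scaled array back to the selected columns instead of
# rebinding df to the NumPy array that fit_transform returns.
cols = ["cost", "sales"]
df[cols] = StandardScaler().fit_transform(df[cols])
print(df)
```
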
12
votes
3 answers

Sklearn error : predict(x,y) takes 2 positional arguments but 3 were given

I am building a multivariate regression analysis with sklearn, and I took a thorough look at the documentation. When I run the predict() function I get the error: predict() takes 2 positional arguments but 3 were given. X is a data frame, y…
GD_N
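A minimal sketch of the usual fit/predict split, assuming the error arises from calling `predict(X, y)`. In scikit-learn estimators `fit` takes both features and targets, while `predict` takes only the features; the two positional arguments in the message are `self` and `X`. The data below is hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data following y = 2*x1 + x2 + 1.
X = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0], "x2": [0.0, 1.0, 0.0, 1.0]})
y = pd.Series([3.0, 6.0, 7.0, 10.0])

model = LinearRegression()
model.fit(X, y)            # targets are passed to fit
y_pred = model.predict(X)  # predict receives the features only
print(y_pred)
```
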
12
votes
1 answer

What's the difference between sklearn Pipeline and DataFrameMapper?

Sklearn Pipeline: http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html DataFrameMapper: https://github.com/paulgb/sklearn-pandas What's the difference between them? It seems to me that sklearn pipeline has more features,…
nkhuyu
12
votes
3 answers

How to label the feature importance with forests of trees?

I use sklearn to plot the feature importance for forests of trees. The dataframe is named 'heart'. Here is the code to extract the list of the sorted features: importances = extc.feature_importances_ indices =…
ElenaPhys
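A minimal sketch of attaching column names to sorted feature importances. The names `heart`, `extc`, `importances`, and `indices` follow the question; the data, the column names, the target, and the use of ExtraTreesClassifier are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical stand-in for the 'heart' dataframe.
rng = np.random.RandomState(0)
heart = pd.DataFrame(rng.rand(60, 3), columns=["age", "chol", "thalach"])
target = (heart["age"] > 0.5).astype(int)

extc = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(heart, target)

importances = extc.feature_importances_
indices = np.argsort(importances)[::-1]  # most important first

# Index the column labels with the same permutation to get names
# in the same order as the sorted importances.
sorted_names = heart.columns[indices]
labeled = pd.Series(importances[indices], index=sorted_names)
print(labeled)
```

The `sorted_names` values can then be passed as tick labels when plotting.
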
11
votes
2 answers

Appending arrays to dataframe (python)

I ran a time series model on a small sales data set and forecasted sales for the next 12 periods, with the following code: mod1=ARIMA(df1, order=(2,1,1)).fit(disp=0,transparams=True) y_future=mod1.forecast(steps=12)[0] where df1 contains the…
IndigoChild
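A minimal sketch of appending a forecast array to an existing frame, assuming `df1` has a monthly period index and a single 'sales' column. Neither detail is confirmed by the excerpt, and `y_future` here is a made-up stand-in for the ARIMA output.

```python
import numpy as np
import pandas as pd

# Hypothetical history with a monthly PeriodIndex.
df1 = pd.DataFrame(
    {"sales": [10.0, 12.0, 13.0]},
    index=pd.period_range("2017-01", periods=3, freq="M"),
)

# Stand-in for the 12 forecast values returned by the model.
y_future = np.linspace(14.0, 25.0, 12)

# Build an index that continues right after the last observed period,
# wrap the array in a DataFrame, and concatenate.
future_index = pd.period_range(df1.index[-1] + 1, periods=len(y_future), freq="M")
forecast = pd.DataFrame({"sales": y_future}, index=future_index)
combined = pd.concat([df1, forecast])
print(combined.tail())
```
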
11
votes
1 answer

Difference between model score() vs r2_score

I am training a LinearRegression() model and trying to gauge its prediction accuracy: from sklearn.metrics import r2_score from sklearn.linear_model import LinearRegression regr_rf = LinearRegression() regr_rf.fit(df[features],df['label']) y_rf…
David
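A minimal sketch comparing the two calls on the same data. For scikit-learn regressors, `score(X, y)` returns the coefficient of determination, so it should match `r2_score(y_true, y_pred)` when the arguments are passed in that order. The variable names follow the question; the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data; 'f1' and 'f2' are hypothetical feature names.
rng = np.random.RandomState(0)
df = pd.DataFrame({"f1": rng.rand(50), "f2": rng.rand(50)})
df["label"] = 3 * df["f1"] - 2 * df["f2"] + rng.normal(scale=0.1, size=50)
features = ["f1", "f2"]

regr_rf = LinearRegression()
regr_rf.fit(df[features], df["label"])
y_rf = regr_rf.predict(df[features])

# score() predicts internally and computes R^2 against the given targets.
score_value = regr_rf.score(df[features], df["label"])
# r2_score expects the true values first and the predictions second.
r2_value = r2_score(df["label"], y_rf)
print(score_value, r2_value)
```

A mismatch between the two usually points to swapped arguments in `r2_score` or to scoring on different data.
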
11
votes
2 answers

how to search a string value within a specific column in pandas dataframe, and if present, give an output of that row present in the dataframe?

I wish to search a database that I have in a .pkl file. I have loaded the .pkl file and stored it in a variable named load_data. Now, I need to accept a string input using raw input and search for the string in one specific column 'SMILES' of my…
Devarshi Sengupta
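A minimal sketch of looking up rows by a string in one column, assuming `load_data` is a pandas DataFrame once unpickled. The rows and the 'name' column are hypothetical, and `query` stands in for the user's input.

```python
import pandas as pd

# Hypothetical stand-in for the unpickled data.
load_data = pd.DataFrame({
    "SMILES": ["CCO", "c1ccccc1", "CC(=O)O"],
    "name": ["ethanol", "benzene", "acetic acid"],
})

query = "CCO"  # would come from user input in the question's setting

# Exact match: boolean mask on the column, then row selection.
exact_rows = load_data[load_data["SMILES"] == query]

# Substring match: regex=False treats characters such as '(' and '='
# literally, which matters for SMILES strings.
partial_rows = load_data[load_data["SMILES"].str.contains("CC", regex=False)]
print(exact_rows)
print(partial_rows)
```
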
11
votes
8 answers

Error when trying to import sklearn modules: ImportError: DLL load failed: The specified module could not be found

I tried the following imports for a machine learning project: from sklearn import preprocessing, cross_validation, svm from sklearn.linear_model import LinearRegression I got this error message: Traceback (most recent call last): File…
11
votes
4 answers

GridSearchCV: "TypeError: 'StratifiedKFold' object is not iterable"

I want to perform GridSearchCV on a RandomForestClassifier, but the data is not balanced, so I use StratifiedKFold: from sklearn.model_selection import StratifiedKFold from sklearn.grid_search import GridSearchCV from sklearn.ensemble import…
user183897
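A minimal sketch, assuming the error comes from mixing `StratifiedKFold` from `sklearn.model_selection` with `GridSearchCV` from the older `sklearn.grid_search` module, which expected an iterable of splits and was removed in later releases. Importing both from `sklearn.model_selection` lets the splitter object be passed directly as `cv`. The data and parameter grid are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced data (roughly 80% / 20%).
X, y = make_classification(
    n_samples=120, n_features=6, weights=[0.8, 0.2], random_state=0
)

# Both classes come from sklearn.model_selection, so the splitter
# instance can be handed to GridSearchCV as-is.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid={"max_depth": [2, 4]},
    cv=cv,
)
search.fit(X, y)
best_params = search.best_params_
print(best_params)
```
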