Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
5
votes
2 answers

How can I get the feature names from sklearn TruncatedSVD object?

I have the following code: import pandas as pd import numpy as np from sklearn.decomposition import TruncatedSVD df = pd.DataFrame(np.random.randn(1000, 25), index=dates, columns=list('ABCDEFGHIJKLMOPQRSTUVWXYZ')) def reduce(dim): svd =…
m.awad
  • 187
  • 2
  • 13
5
votes
2 answers

How to iterate over pandas DataFrameGroupBy and select all entries per grouped variable for specific column?

Let's assume there is a table like this: Id | Type | Guid. I perform the following operation on it: df = df.groupby('Id'). Now I would like to iterate through the first n rows and, for each specific Id, print as a list all the corresponding…
Server Khalilov
  • 408
  • 2
  • 5
  • 20
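As a minimal sketch of the kind of iteration this question describes: a DataFrameGroupBy object can be iterated as (key, sub-frame) pairs, and one column of each sub-frame can be turned into a list. The sample rows, the value of n, and the choice of the Guid column are assumptions made for illustration; the excerpt above does not show them.

```python
from itertools import islice

import pandas as pd

# Hypothetical sample data with the three columns named in the question.
df = pd.DataFrame({
    "Id": [1, 1, 2, 3, 3],
    "Type": ["x", "y", "x", "y", "x"],
    "Guid": ["a", "b", "c", "d", "e"],
})

grouped = df.groupby("Id")

# Take only the first n groups (n is an assumed value) and collect the
# Guid entries of each group as a plain Python list.
n = 2
guids_per_id = {
    int(group_id): group["Guid"].tolist()
    for group_id, group in islice(grouped, n)
}

for group_id, guids in guids_per_id.items():
    print(group_id, guids)
```

Groups come out sorted by key by default, so "first n" here means the n smallest Id values, which may or may not be what the asker intended.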
5
votes
4 answers

Loading sklearn model in Java. Model created with DNNClassifier in Python

The goal is to open in Java a model created and trained in Python with tensorflow.contrib.learn.learn.DNNClassifier. At the moment the main issue is finding the name of the "tensor" to pass to the session runner method in Java. I have this test code…
rjpg
  • 134
  • 1
  • 11
5
votes
1 answer

LabelEncoder().fit_transform vs. pd.get_dummies for categorical coding

It was recently brought to my attention that if you have a dataframe df like this: A B C 0 0 Boat 45 1 1 NaN 12 2 2 Cat 6 3 3 Moose 21 4 4 Boat 43 You can encode the categorical data automatically with…
Jonathan Bechtel
  • 3,497
  • 4
  • 43
  • 73
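To make the comparison in this question concrete, here is a small sketch that applies both encoders to the frame shown in the excerpt. Filling the missing value with the placeholder string "missing" before calling LabelEncoder is an assumption added here so the example runs the same way across scikit-learn versions; it is not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The example frame from the question excerpt.
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 4],
    "B": ["Boat", np.nan, "Cat", "Moose", "Boat"],
    "C": [45, 12, 6, 21, 43],
})

# LabelEncoder maps each category to a single integer code.
# Assumption: NaN is replaced by a placeholder string first.
filled = df["B"].fillna("missing")
encoder = LabelEncoder()
label_codes = encoder.fit_transform(filled)

# get_dummies creates one indicator column per category instead.
# By default the NaN row gets no indicator set (dummy_na=False).
dummies = pd.get_dummies(df, columns=["B"])

print(label_codes)
print(dummies)
```

The integer codes imply an ordering between categories that usually has no meaning, while the indicator columns avoid that at the cost of a wider frame.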
5
votes
1 answer

PySpark user-defined aggregate calculation on columns

I'm preparing input data for a classifier in PySpark. I have been using aggregate functions in Spark SQL to extract features such as average and variance. These are grouped by activity, name and window. Window has been calculated by dividing a…
other15
  • 839
  • 2
  • 11
  • 23
4
votes
1 answer

How to predict on a grouped DataFrame, using a dictionary of models, and return to original test DataFrame?

I have created a dictionary of regression models, d, indexed by the values of group from a training dataset: import numpy as np import pandas as pd from sklearn.linear_model import LinearRegression from sklearn.pipeline import Pipeline d =…
langtang
  • 22,248
  • 1
  • 12
  • 27
4
votes
7 answers

How to resolve AttributeError: module 'graphviz.backend' has no attribute 'ENCODING'

I am not sure why I get an AttributeError: module 'graphviz.backend' has no attribute 'ENCODING' when I try to export a regression tree to graphviz. I tried reinstalling graphviz and sklearn, but that didn't solve the problem. Appreciate any advice…
Rayner
  • 41
  • 1
  • 1
  • 2
4
votes
1 answer

GridSearchCV results heatmap

I am trying to generate a heatmap for the GridSearchCV results from sklearn. The thing I like about sklearn-evaluation is that it is really easy to generate the heatmap. However, I have hit one issue. When I give a parameter as None, for…
spockshr
  • 372
  • 2
  • 14
4
votes
0 answers

Sklearn pipeline not fitted after .fit has been called?

I have a simple pipeline like this: pl = Pipeline(steps=[("preprocessor", ColumnTransformer( transformers=[ ('num', Pipeline(steps=[('StandardScaler', StandardScaler())]),…
L Xandor
  • 1,659
  • 4
  • 24
  • 48
4
votes
3 answers

Create my custom Imputer for categorical variables sklearn

I have a dataset with a lot of categorical values missing and I would like to make a custom imputer which will fill the empty values with a value equal to "no-variable_name". For example, if a column "Workclass" has a NaN value, replace it with "No…
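One possible shape for such an imputer, offered as a sketch rather than the accepted approach: a transformer built on scikit-learn's BaseEstimator and TransformerMixin that fills missing values in each column with a string derived from the column name. The class name ColumnNameImputer, the prefix parameter, and the sample data are hypothetical; the "no-" prefix follows the "no-variable_name" wording in the question.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnNameImputer(BaseEstimator, TransformerMixin):
    """Hypothetical imputer: fill missing values with prefix + column name."""

    def __init__(self, prefix="no-"):
        self.prefix = prefix

    def fit(self, X, y=None):
        # Nothing is learned from the data; the fill value depends only
        # on the column names seen at transform time.
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left unchanged.
        X = X.copy()
        for col in X.columns:
            X[col] = X[col].fillna(f"{self.prefix}{col}")
        return X


# Assumed sample data containing only categorical (string) columns.
frame = pd.DataFrame({
    "Workclass": ["Private", np.nan, "State-gov"],
    "Education": [np.nan, "Bachelors", "Masters"],
})

imputed = ColumnNameImputer().fit_transform(frame)
print(imputed)
```

This sketch expects a DataFrame holding only the categorical columns; numeric columns would need to be excluded first, for example by selecting columns before calling the transformer.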
4
votes
2 answers

Group by and calculate AUC on folds

What I would like to do, based on the dataset below, is to calculate the AUC for each algorithm and also later for each dataset. I have tried something like this but it is not working: from sklearn.metrics import…
glouis
  • 541
  • 1
  • 7
  • 22
4
votes
2 answers

How do I use Decision Tree Regressor on new data? (Python, Pandas, Sklearn)

I've started learning Python and machine learning very recently. I have been doing a basic Decision Tree Regressor example involving house prices. So I have trained the algorithm and found the best number of branches but how do I use this on new…
ARH94
  • 43
  • 6
4
votes
1 answer

Increase performance of Random Forest Regressor in sklearn

There is an optimization problem where I have to call the predict function of a Random Forest Regressor several thousand times. from sklearn.ensemble import RandomForestRegressor rfr = RandomForestRegressor(n_estimators=10) rfr = rfr.fit(X, Y) for…
Bowers
  • 836
  • 8
  • 20
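When the inputs are all known in advance, one common way to cut the per-call overhead described in this question is to pass them to predict as a single array instead of one row at a time. The sketch below assumes that is possible; if each input depends on the previous prediction, as can happen inside an optimizer, batching in this form does not apply. The synthetic data, the random_state values, and the number of candidate rows are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic training data (assumed, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

rfr = RandomForestRegressor(n_estimators=10, random_state=0)
rfr = rfr.fit(X, Y)

# Inputs that would otherwise be evaluated one call at a time.
candidates = rng.normal(size=(200, 4))

# One predict call per row: simple, but pays the call overhead every time.
looped = np.array([rfr.predict(row.reshape(1, -1))[0] for row in candidates])

# One predict call for all rows: same values, far fewer calls.
batched = rfr.predict(candidates)
```

Both approaches return the same predictions; only the number of calls into the model differs.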
4
votes
1 answer

How can I set the font of the caption of a Pandas DataFrame?

I am trying to display two tables side-by-side in a Jupyter notebook. I have some code that does this: header = ["Metric", "Test dataset"] table1 = [["accuracy", accuracy_test], ["precision", precision_test], …
user274610
  • 509
  • 9
  • 18
4
votes
3 answers

statsmodels raises TypeError: ufunc 'isfinite' not supported for the input types

I am applying backward elimination using statsmodels.api and the code gives this error `TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule…
Anjali
  • 187
  • 1
  • 4
  • 12