Python module providing a bridge between scikit-learn's machine learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
5
votes
2 answers
How can I get the feature names from sklearn TruncatedSVD object?
I have the following code
import pandas as pd
import numpy as np
from sklearn.decomposition import TruncatedSVD
df = pd.DataFrame(np.random.randn(1000, 25), index=dates, columns=list('ABCDEFGHIJKLMOPQRSTUVWXYZ'))
def reduce(dim):
svd =…

m.awad
- 187
- 2
- 13
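A hedged sketch for the TruncatedSVD question above: the fitted estimator does not name its output components, but its `components_` array has one column per input feature, so the original column names can be attached to the loadings. The random data, `n_components=5`, and the names `loadings` and `top_features` are illustrative assumptions, not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD

# Illustrative data: 25 columns named as in the excerpt (the letter N is absent there too).
rng = np.random.default_rng(0)
columns = list("ABCDEFGHIJKLMOPQRSTUVWXYZ")
df = pd.DataFrame(rng.standard_normal((1000, 25)), columns=columns)

# n_components=5 is an arbitrary choice for this sketch.
svd = TruncatedSVD(n_components=5, random_state=0)
svd.fit(df)

# components_ has shape (n_components, n_features): label its columns with
# the original feature names to see how each feature loads on each component.
loadings = pd.DataFrame(
    svd.components_,
    columns=df.columns,
    index=[f"component_{i}" for i in range(svd.n_components)],
)

# Feature with the largest absolute loading on each component.
top_features = loadings.abs().idxmax(axis=1)
print(top_features)
```

Each component is a linear combination of all input features, so `top_features` only reports the strongest contributor, not a one-to-one feature name.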
5
votes
2 answers
How to iterate over pandas DataFrameGroupBy and select all entries per grouped variable for specific column?
Let's assume there is a table like this:
Id | Type | Guid
I perform the following operation on this table:
df = df.groupby('Id')
Now I would like to iterate through the first n rows and, for each specific Id, print as a list all the corresponding…

Server Khalilov
- 408
- 2
- 5
- 20
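A minimal sketch for the groupby question above, assuming the goal is to collect the `Guid` values of each `Id` into a list and then look at the first n groups. The sample rows, the choice of `Guid` as the column of interest, and `n = 2` are assumptions; only the column names come from the excerpt.

```python
import pandas as pd

# Hypothetical sample rows; only the column names Id, Type, Guid come from the excerpt.
df = pd.DataFrame({
    "Id": [1, 1, 2, 3, 3, 3],
    "Type": ["x", "y", "x", "z", "x", "y"],
    "Guid": ["a", "b", "c", "d", "e", "f"],
})

grouped = df.groupby("Id")

# Select one column from the grouped object and collapse each group to a list.
guids_per_id = grouped["Guid"].apply(list)

# Print the lists for the first n groups (groups are sorted by Id by default).
n = 2
for id_value, guids in guids_per_id.head(n).items():
    print(id_value, guids)
```

Iterating over `grouped` directly also works: each iteration yields a `(key, sub_frame)` pair, from which any column can be read.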
5
votes
4 answers
Loading sklearn model in Java. Model created with DNNClassifier in python
The goal is to open in Java a model created/trained in python with tensorflow.contrib.learn.learn.DNNClassifier.
At the moment, the main issue is finding the name of the "tensor" to pass in Java to the session runner method.
I have this test code…

rjpg
- 134
- 1
- 11
5
votes
1 answer
LabelEncoder().fit_transform vs. pd.get_dummies for categorical coding
It was recently brought to my attention that if you have a dataframe df like this:
   A      B   C
0  0   Boat  45
1  1    NaN  12
2  2    Cat   6
3  3  Moose  21
4  4   Boat  43
You can encode the categorical data automatically with…

Jonathan Bechtel
- 3,497
- 4
- 43
- 73
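To make the comparison in the question above concrete, here is a hedged sketch that encodes column `B` of the example frame both ways. Filling the missing value with the placeholder string "Missing" before label encoding is an assumption made for this sketch, not something stated in the excerpt.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The example frame from the excerpt.
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 4],
    "B": ["Boat", np.nan, "Cat", "Moose", "Boat"],
    "C": [45, 12, 6, 21, 43],
})

# LabelEncoder maps each category to one integer, which implies an ordering.
# The NaN is replaced by a placeholder first (an assumption of this sketch).
labels = LabelEncoder().fit_transform(df["B"].fillna("Missing"))

# get_dummies creates one indicator column per category and implies no ordering;
# dummy_na=True adds a separate indicator column for missing values.
dummies = pd.get_dummies(df["B"], dummy_na=True)

print(labels)
print(dummies)
```

The integer codes suit tree-based models or target labels, while indicator columns are usually the safer choice for linear models, since they do not suggest that one category is "greater" than another.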
5
votes
1 answer
PySpark user-defined aggregate calculation on columns
I’m preparing data as input for a classifier in PySpark. I have been using aggregate functions in SparkSQL to extract features such as average and variance. These are grouped by activity, name and window. Window has been calculated by dividing a…

other15
- 839
- 2
- 11
- 23
4
votes
1 answer
How to predict on a grouped DataFrame, using a dictionary of models, and return to original test DataFrame?
I have created a dictionary of regression models, indexed by values of group from a training dataset, d
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
d =…

langtang
- 22,248
- 1
- 12
- 27
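One way to approach the question above, sketched under assumptions: loop over the groups of the test frame, look up the matching model in the dictionary `d`, and write the predictions back through the original index. The toy data, the column names `group`, `x`, `y`, and `pred`, and the use of plain `LinearRegression` instead of a `Pipeline` are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data: each group follows its own exact line y = slope * x.
train = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4,
    "x": [0.0, 1.0, 2.0, 3.0] * 2,
})
train["y"] = np.where(train["group"] == "a", 2.0 * train["x"], -1.0 * train["x"])

# Dictionary of fitted models, keyed by group value.
d = {
    name: LinearRegression().fit(part[["x"]], part["y"])
    for name, part in train.groupby("group")
}

# Hypothetical test frame with rows in mixed group order.
test = pd.DataFrame({"group": ["b", "a", "b"], "x": [10.0, 10.0, 4.0]})

# Predict group by group and assign through the original index,
# so every prediction lands on the row it belongs to.
test["pred"] = np.nan
for name, part in test.groupby("group"):
    test.loc[part.index, "pred"] = d[name].predict(part[["x"]])

print(test)
```

Because the assignment uses `part.index`, the row order of the test frame is preserved and no merge step is needed afterwards.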
4
votes
7 answers
How to resolve AttributeError: module 'graphviz.backend' has no attribute 'ENCODING'
I am not sure why I get an AttributeError: module 'graphviz.backend' has no attribute 'ENCODING' when I try to export a regression tree to graphviz. I tried re-installing graphviz and sklearn, but that didn't solve the problem. I'd appreciate any advice…

Rayner
- 41
- 1
- 1
- 2
4
votes
1 answer
GridSearchCV results heatmap
I am trying to generate a heatmap for the GridSearchCV results from sklearn. The thing I like about sklearn-evaluation is that it is really easy to generate the heatmap. However, I have hit one issue. When I give a parameter as None, for…

spockshr
- 372
- 2
- 14
4
votes
0 answers
Sklearn pipeline not fitted after .fit has been called?
I have a simple pipeline like this
pl = Pipeline(steps=[("preprocessor", ColumnTransformer(
transformers=[
('num', Pipeline(steps=[('StandardScaler', StandardScaler())]),…

L Xandor
- 1,659
- 4
- 24
- 48
4
votes
3 answers
Create my custom Imputer for categorical variables sklearn
I have a dataset with a lot of missing categorical values, and I would like to make a custom imputer which will fill the empty values with a value equal to "no-variable_name".
For example, if a column "Workclass" has a NaN value, replace it with "No…

Vasilis Iak
- 79
- 7
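A minimal sketch of such an imputer, assuming the input is a pandas DataFrame and the fill value is the prefix "no-" followed by the column name, as described in the first sentence of the question. The class name `ColumnNameImputer`, the `prefix` parameter, and the "Education" column are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnNameImputer(BaseEstimator, TransformerMixin):
    """Fill missing values in each column with '<prefix><column name>'."""

    def __init__(self, prefix="no-"):
        self.prefix = prefix

    def fit(self, X, y=None):
        # Remember which columns were seen during fitting.
        self.columns_ = list(X.columns)
        return self

    def transform(self, X):
        # Work on a copy so the caller's frame is left untouched.
        X = X.copy()
        for column in self.columns_:
            X[column] = X[column].fillna(f"{self.prefix}{column}")
        return X


# Hypothetical data; only the "Workclass" column name comes from the question.
df = pd.DataFrame({
    "Workclass": ["Private", np.nan],
    "Education": [np.nan, "Masters"],
})

imputed = ColumnNameImputer().fit_transform(df)
print(imputed)
```

Deriving from `BaseEstimator` and `TransformerMixin` lets the class be used as a step inside a scikit-learn `Pipeline`.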
4
votes
2 answers
group by and calculate auc on folds
What I would like to do, based on the dataset below, is to calculate the AUC for each algorithm and also later for each dataset. I have tried something like this but it is not working:
from sklearn.metrics import…

glouis
- 541
- 1
- 7
- 22
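Since the dataset from the question above is not shown, the following is only a sketch with hypothetical column names (`algorithm`, `dataset`, `y_true`, `y_score`) and made-up rows. It computes one AUC per group by looping over the groups and calling `roc_auc_score` on each part; fold handling is left out.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Made-up predictions; every group contains both classes, which roc_auc_score requires.
results = pd.DataFrame({
    "algorithm": ["A"] * 4 + ["B"] * 4,
    "dataset": ["d1", "d1", "d2", "d2"] * 2,
    "y_true": [0, 1, 0, 1, 0, 1, 0, 1],
    "y_score": [0.1, 0.9, 0.2, 0.8, 0.9, 0.1, 0.3, 0.7],
})


def auc_by(frame, key):
    """Return a Series with one AUC value per distinct value of `key`."""
    return pd.Series({
        name: roc_auc_score(part["y_true"], part["y_score"])
        for name, part in frame.groupby(key)
    })


auc_per_algorithm = auc_by(results, "algorithm")
auc_per_dataset = auc_by(results, "dataset")
print(auc_per_algorithm)
print(auc_per_dataset)
```

Passing a list such as `["algorithm", "dataset"]` as `key` would give one AUC per combination, provided each combination still contains both classes.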
4
votes
2 answers
How do I use Decision Tree Regressor on new data? (Python, Pandas, Sklearn)
I've started learning Python and machine learning very recently. I have been doing a basic Decision Tree Regressor example involving house prices. So I have trained the algorithm and found the best number of branches but how do I use this on new…

ARH94
- 43
- 6
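As a hedged illustration for the question above: once the regressor is fitted, new data only needs the same feature columns in the same order, and `predict` returns one value per row. The toy house data, the feature names, and `max_leaf_nodes=4` are assumptions made for this sketch.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data for a house price example.
features = ["rooms", "area"]
train = pd.DataFrame({
    "rooms": [2, 3, 4, 5],
    "area": [50, 80, 120, 200],
    "price": [100, 150, 220, 400],
})

# max_leaf_nodes stands in for whatever tree size was found to work best.
model = DecisionTreeRegressor(max_leaf_nodes=4, random_state=0)
model.fit(train[features], train["price"])

# New, unseen houses: same feature columns, no price column needed.
new_houses = pd.DataFrame({"rooms": [3, 5], "area": [85, 190]})
predicted_prices = model.predict(new_houses[features])
print(predicted_prices)
```

Selecting `new_houses[features]` explicitly guards against the new frame having extra columns or a different column order.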
4
votes
1 answer
Increase performance of Random Forest Regressor in sklearn
There is an optimization problem where I have to call the predict function of a Random Forest Regressor several thousand times.
from sklearn.ensemble import RandomForestRegressor
rfr = RandomForestRegressor(n_estimators=10)
rfr = rfr.fit(X, Y)
for…

Bowers
- 836
- 8
- 20
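A common way to cut the per-call overhead described above, sketched under the assumption that the inputs are all known in advance: stack them into one array and call `predict` once. This does not apply if each input depends on the previous prediction. The synthetic data and the names `candidates`, `looped`, and `batched` are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic training data for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
Y = X[:, 0] + 0.5 * X[:, 1]

rfr = RandomForestRegressor(n_estimators=10, random_state=0)
rfr = rfr.fit(X, Y)

# Inputs to evaluate; assumed to be available all at once.
candidates = rng.standard_normal((200, 3))

# Slow pattern: one predict call per row, paying validation overhead every time.
looped = np.array([rfr.predict(row.reshape(1, -1))[0] for row in candidates])

# Batched pattern: a single predict call over all rows.
batched = rfr.predict(candidates)

print(np.allclose(looped, batched))
```

Both patterns return the same values; the batched call simply amortizes input checking and per-tree dispatch over all rows.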
4
votes
1 answer
How can I set the font of the caption of a Pandas DataFrame?
I am trying to display two tables side-by-side in a Jupyter notebook.
I have some code that does this:
header = ["Metric", "Test dataset"]
table1 = [["accuracy", accuracy_test],
["precision", precision_test],
…

user274610
- 509
- 9
- 18
4
votes
3 answers
statsmodels raises TypeError: ufunc 'isfinite' not supported for the input types
I am applying backward elimination using statsmodels.api and the code gives this error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule…

Anjali
- 187
- 1
- 4
- 12
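One frequent cause of this error, offered here as an assumption because the excerpt is cut off, is that the array passed to the model has dtype `object` (for example after mixing encoded categories with numbers). The sketch below reproduces the error with NumPy alone and shows that casting to `float` removes it; the array contents are made up.

```python
import numpy as np

# An object-dtype array, as can result from stacking mixed-type columns.
X_object = np.array([[1, 2.5], [3, 4.0]], dtype=object)

# np.isfinite has no implementation for object arrays and raises TypeError.
try:
    np.isfinite(X_object)
    raised = False
except TypeError:
    raised = True

# Casting to a numeric dtype makes the same check work.
X_float = X_object.astype(float)
all_finite = bool(np.isfinite(X_float).all())

print(raised, X_float.dtype, all_finite)
```

Checking `X.dtype` before fitting is a quick way to see whether this explanation applies to a given design matrix.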