I want to score a model on a pandas DataFrame and then create a column with that model's predictions in the same DataFrame. The new column should be named so that it references the appropriate model so this can be done multiple times with multiple models. I can do this in R using deparse and substitute like so:

df <- data.frame(a=1:5+rnorm(1:5), b=6:10+rnorm(1:5), y=11:15+rnorm(1:5))
ols <- lm(df$y ~ df$a + df$b)

func <- function(df, model){
  new_col <- paste0(deparse(substitute(model)), '_predictions')
  df[, new_col] <- predict(model, df)
  return(df)
}

func(df, ols)
         a         b        y ols_predictions
1 1.569142  7.735250 11.90998        12.99388
2 0.828704  4.468299 12.16632        12.01042
3 2.270323  8.135620 14.25781        13.51283
4 1.847564  9.602450 13.76106        13.46148
5 5.776140 10.723743 16.08072        16.19727

What would be the equivalent of this in Python?

Gaurav Bansal

1 Answer

This may help.

In your R code you can safely throw away deparse and get the same result:

new_col <- paste0(substitute(model), '_predictions')

Solution for Python:

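import pandas as pd
from sklearn import datasets
from sklearn import linear_model

# Note: load_boston was removed in scikit-learn 1.2, so this runs as written only on older versions.
data = datasets.load_boston()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.DataFrame(data.target, columns=["MEDV"])["MEDV"]

lm = linear_model.LinearRegression()
ols = lm.fit(X, y)  # fit() returns the estimator itself, so lm and ols are the same object

def my_func(df, model):
    # Collect every global name bound to this exact object.
    # In a plain script that list is ['lm', 'ols'], so index 1 picks 'ols'.
    name = [k for k, v in globals().items() if v is model][1]
    new_col = name + "_predictions"
    # Predict on the frame that was passed in, not on the global X.
    df[new_col] = model.predict(df)
    return df

my_func(X, ols)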

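Caution: Python works differently from R here. Several names can be bound to the same object (in the code above, lm and ols refer to one object because fit returns the estimator itself), so looking a name up by object identity can return more than one match, or a name you did not intend.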
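To make that caveat concrete, here is a minimal sketch. TinyModel is a hypothetical stand-in for a real estimator and names_bound_to is a hypothetical helper, neither is part of scikit-learn. The namespace is passed in explicitly as a dict; in real use you would pass globals().

```python
class TinyModel:
    """Hypothetical stand-in for an estimator whose fit() returns self."""

    def fit(self, X, y):
        return self


def names_bound_to(obj, namespace):
    """Return every name in `namespace` (a dict such as globals()) bound to `obj`."""
    return [name for name, value in namespace.items() if value is obj]


lm = TinyModel()
ols = lm.fit([[1.0], [2.0]], [1.0, 2.0])  # same object as lm
other = TinyModel()

namespace = {"lm": lm, "ols": ols, "other": other}
print(names_bound_to(ols, namespace))    # ['lm', 'ols']
print(names_bound_to(other, namespace))  # ['other']
```

With two matches, index 1 happens to select ols. With a single match only index 0 exists, so a hard-coded index is fragile either way.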

Denis Rasulev
    Just a note that using the 0 element instead of 1 like so `[k for k,v in globals().items() if v is model][0]` could be better. Models like `RandomForestRegressor` only have one element so trying `[1]` results in an error. – Gaurav Bansal Apr 08 '18 at 16:45
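One way to sidestep the index question entirely is to pass each model as a keyword argument, so the column name comes from the keyword rather than from a lookup in globals(). This is only a sketch of an alternative, not the approach from the answer: add_predictions and ConstantModel are hypothetical names, and ConstantModel stands in for any fitted estimator with a predict method.

```python
import pandas as pd


class ConstantModel:
    """Hypothetical stand-in for a fitted estimator; anything with .predict(df) works."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value] * len(X)


def add_predictions(df, **models):
    """Return a copy of df with one '<name>_predictions' column per keyword argument."""
    out = df.copy()
    for name, model in models.items():
        # Predict on the original df so earlier prediction columns
        # are not fed to later models as features.
        out[f"{name}_predictions"] = model.predict(df)
    return out


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
scored = add_predictions(df, ols=ConstantModel(0.5), rf=ConstantModel(1.5))
print(list(scored.columns))  # ['a', 'b', 'ols_predictions', 'rf_predictions']
```

The caller types the name once, as in add_predictions(X, ols=ols), which costs a little repetition but does not depend on how many names point at the object.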