
I am trying to attach functions that are defined outside a class to that class as methods. I have written a lot of pandas manipulation functions that work fine as plain functions, but I run into an error when I try to add them to a class. Here is my outline:

MWE

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1,2,3],'B':[10,20,30],
                   'C':[100,200,300],'D':[1000,2000,3000]})

@pd.api.extensions.register_dataframe_accessor("my")
class MyAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def add_1_2(self,col1,col2):
        df = self._obj
        return df[col1] + df[col2]
    
df.my.add_1_2('A','B')

This works (but the functions I have already written take a DataFrame as their first argument and do not use self._obj):

# this works
def add_1_3(self,col1,col2,col3):
    df = self._obj
    return df[col1] + df[col3]

MyAccessor.add_1_3 = add_1_3

df.my.add_1_3('A','B','C')

My attempt

def temp_fn(df,col1,col2,mydict):
    print(mydict)
    return df[col1]+df[col2]

col1,col2,mydict = 'A','B',{'lang':'python'}

temp_fn(df,col1,col2,mydict) # calling directly using function works good

# now I want to include this function inside the Class
def make_class_fn(fn, *args, **kwargs):
    return fn(args[0]._obj, *args[1:], **kwargs)

MyAccessor.temp_fn = make_class_fn(temp_fn,df,col1,col2,mydict)

AttributeError: 'DataFrame' object has no attribute '_obj' 

Note

If I had only one function, I could copy the same function inside the Class and edit it there, but I have a large number of functions and that's not a great idea.


  • One upvote for your effort, and mentioning them clearly in question. :) – imxitiz Aug 03 '21 at 13:11
  • What is `add_1_2_dict`, that you passed to `make_class_fn` any existing function ? – Maurice Meyer Aug 03 '21 at 13:37
  • That's any general function, here it is `temp_fn` with 4 arguments but the function can have any number of arguments args and kwargs. –  Aug 03 '21 at 13:40

1 Answer

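Your make_class_fn calls fn immediately, with the DataFrame itself as args[0], which is why the lookup of _obj fails. Instead, the conversion function simply needs to create and return a new function that calls the original function with self._obj as the first argument.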

from functools import wraps


def temp_fn(df,col1,col2,mydict):
    print(mydict)
    return df[col1]+df[col2]


def make_method(f):
    @wraps(f)
    def _(self, col1, col2, mydict):
        return f(self._obj, col1, col2, mydict)
    return _

MyAccessor.temp_fn = make_method(temp_fn)

More generally, make_method can handle arbitrary arguments, as long as the first argument has a _obj attribute to pass on.

def make_method(f):
    @wraps(f)
    def _(self, *args, **kwargs):
        return f(self._obj, *args, **kwargs)
    return _
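To see the general version working end to end, here is a minimal sketch. The accessor name "demo_a" and the class DemoAccessor are hypothetical stand-ins for the question's "my" and MyAccessor, chosen only so the snippet runs on its own.

```python
from functools import wraps

import pandas as pd


# Hypothetical accessor used only for this sketch.
@pd.api.extensions.register_dataframe_accessor("demo_a")
class DemoAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj


# A plain function that expects a DataFrame as its first argument.
def temp_fn(df, col1, col2, mydict):
    return df[col1] + df[col2]


def make_method(f):
    @wraps(f)
    def _(self, *args, **kwargs):
        # Swap the accessor instance for the DataFrame it wraps.
        return f(self._obj, *args, **kwargs)
    return _


# Attach the wrapped function to the accessor class.
DemoAccessor.temp_fn = make_method(temp_fn)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
result = df.demo_a.temp_fn('A', 'B', {'lang': 'python'})
```

Because the wrapper forwards *args and **kwargs unchanged, keyword calls such as df.demo_a.temp_fn('A', col2='B', mydict={}) should behave the same way.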

You could also let make_method perform the assignment for you.

def make_method(cls, f):
    @wraps(f)
    def _(self, *args, **kwargs):
        return f(self._obj, *args, **kwargs)
    setattr(cls, f.__name__, _)

make_method(MyAccessor, temp_fn)
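Since the question mentions a large number of existing functions, this form lends itself to registering them in a loop. The following is a sketch under assumptions: BulkAccessor, the accessor name "demo_b", and the functions add_cols and scale_col are all hypothetical examples, not part of the original code.

```python
from functools import wraps

import pandas as pd


# Hypothetical accessor used only for this sketch.
@pd.api.extensions.register_dataframe_accessor("demo_b")
class BulkAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj


# Hypothetical existing functions that take a DataFrame first.
def add_cols(df, col1, col2):
    return df[col1] + df[col2]


def scale_col(df, col, factor=2):
    return df[col] * factor


def make_method(cls, f):
    @wraps(f)
    def _(self, *args, **kwargs):
        return f(self._obj, *args, **kwargs)
    # Register the wrapper on the class under the function's own name.
    setattr(cls, f.__name__, _)


# Register every function in one pass.
for fn in (add_cols, scale_col):
    make_method(BulkAccessor, fn)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
total = df.demo_b.add_cols('A', 'B')
scaled = df.demo_b.scale_col('A', factor=10)
```

The plain functions stay untouched, so add_cols(df, 'A', 'B') keeps working alongside df.demo_b.add_cols('A', 'B').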

or even make it a decorator (though if you have the definition to decorate, you can probably define the method directly just as easily).

def make_method(cls):
    def decorator(f):
        @wraps(f)
        def _(self, *args, **kwargs):
            return f(self._obj, *args, **kwargs)
        setattr(cls, f.__name__, _)
        return _
    return decorator


@make_method(MyAccessor)
def temp_fn(df, col1, col2, mydict):
    return df[col1] + df[col2]
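One thing to be aware of with the decorator form: because the decorator returns the wrapper, the module-level name temp_fn is rebound to it, so calling temp_fn(df, ...) directly would then look for df._obj. If you want to keep the plain function usable as well, one possible variant (a sketch, with the hypothetical accessor name "demo_c" and class KeepAccessor) returns the original function instead.

```python
from functools import wraps

import pandas as pd


# Hypothetical accessor used only for this sketch.
@pd.api.extensions.register_dataframe_accessor("demo_c")
class KeepAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj


def make_method(cls):
    def decorator(f):
        @wraps(f)
        def method(self, *args, **kwargs):
            return f(self._obj, *args, **kwargs)
        setattr(cls, f.__name__, method)
        # Return the undecorated function so the module-level name
        # still refers to the plain DataFrame-first version.
        return f
    return decorator


@make_method(KeepAccessor)
def temp_fn(df, col1, col2, mydict):
    return df[col1] + df[col2]


df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
as_function = temp_fn(df, 'A', 'B', {'lang': 'python'})
as_method = df.demo_c.temp_fn('A', 'B', {'lang': 'python'})
```

Both call styles then produce the same Series; whether that is preferable to rebinding the name depends on how the functions are used elsewhere.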
chepner
  • This is great! Now I can add arbitrary functions to Pandas DataFrames. –  Aug 03 '21 at 13:50