1

For example if I create a simple wrapper around a pandas dataframe:

    from pandas import DataFrame

    class MyDataFrame(DataFrame):

        def __init__(self,data):

            DataFrame.__init__(self,data)

        @staticmethod
        def add(a, b):
            return a + b

and then I instantiate it..

    x = MyDataFrame([1,2,3])

            x.add(1,2)
            #    => 3

            type(x)
            #    => __main__.MyDataFrame

it works. But, if I call a a dataframe method on it that returns a dataframe, It is no loger an instance of my wrapper class.

    y = x.reindex([3,4,5])
    type(y)
    #    => pandas.core.frame.DataFrame

How do I get it to return an instance of MyDataFrame for all DataFrame methods? Is this a common issue? Am I approaching this in the wrong way?

Sean Geoffrey Pietz
  • 1,120
  • 2
  • 12
  • 17
  • I thought that was normal behavior. The methods, unless overwritten just point to the place to get it. Saves memory. – jollarvia Mar 19 '14 at 22:09
  • 1
    Basically, you can't. It requires cooperation from the superclass. The superclass has to be written in such a way as to return an instance of "whatever class I am an instance of", but many classes (like DataFrame) are hard-coded to return an instance of only that particular class. – BrenBarn Mar 19 '14 at 22:12
  • 1
    you *can* overwrite the ``constructor`` property in the DataFrame class, but in general sub-classing DataFrames is not useful, try composition instead. – Jeff Mar 19 '14 at 22:22
  • @Jeff: Composition is not really any easier, since you would still have to override every magic method to make your object act like a DataFrame. The problem is that pandas is hard-coded to use only its own types, instead of parameterized. – BrenBarn Mar 20 '14 at 02:21
  • @BrenBarn not since 0.13; can be easily subclassed, just override the constructor property as I said. the point is the use car for actual inheritance is very narrow and most reasons are much better served by doing has-a cases – Jeff Mar 20 '14 at 03:14

2 Answers2

2

The example you have shown is not a wrapper but a subclass in Python. Now python subclasses and method resolution in your case behave by simple rules

  1. Look at the type of the receiver object of the method.
  2. Check the class hierarchy of the class and find the first instance the method is defined. Then look at the signature of that method and execute it accordingly. In your case, class hierarchy is simple subclass-superclass.

So, in your case,

  1. x is defined as an object of class MyDataFrame -- simple. Obviously, type(x) is MyDataFrame, by definition.
  2. During call of add, it looks at receiver object, which is x of class MyDataFrame. And this class in fact defines the method add. So, it simply returns the result of that method, Curiously, try calling DataFrame([1, 2, 3]).add(1, 2). The result will be different since it looks at the add method as defined in pandas.DataFrame class.
  3. Now comes the third part - Let's apply same reasoning. reindex is not defined in MyDataFrame. Where should we look next? Class hierarchy, that means pandas.DataFrame. Now reindex is indeed defined by this class and it returns a pandas.DataFrame object!. (See this: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex) So, no wonder y is a pandas DataFrame.

Now I don't understand what you are doing by extending a pandas DataFrame in first place. Extending like this is not a common practice. If you provide details of what you want to do, maybe we can provide a solution.

Edit: Your original question is concerned with extension methods or extension objects (C# has them, and as you rightly pointed out, JS prototypes give you same functionality. Python does not have extension methods/objects as first class members. There has been discussion on that. e.g python extension methods)

Community
  • 1
  • 1
Sudeep Juvekar
  • 4,898
  • 3
  • 29
  • 35
  • 1
    Thanks, this helps a lot. I guess what I would like to achieve is be something like how in javascript I can add functionality to pre-existing objects by appending methods to the prototype class. Or,in clojure I can extend an existing type to implement a protocol where I have functions that will execute differently based on the data type they are passed. What is the idiomatic way to achieve this type of polymorphism in OO python? – Sean Geoffrey Pietz Mar 19 '14 at 22:22
0

There have been several cases in Pandas where the classes have not been implemented well to form the basis for derived classes. Some of these issues are fixed, e.g., https://github.com/pydata/pandas/pull/4271 and https://github.com/pydata/pandas/issues/60 .

It is possible to implement a parent reindex method so the result is a child subclass:

from pandas import DataFrame

class DF():
    def __init__(self, data):
        print('DF __init__')
        self.data = data
    def reindex(self, index):
        print('DF reindex')
        return self.__class__(self.data)
        # return DF(self.data)  # not like this!

class MyDF(DF):
    def __init__(self, data):
        DF.__init__(self, data)
    @staticmethod
    def add(a, b):
        return a + b


x = MyDF([1,2,3])

x.add(1,2)
#    => 3
type(x)

y = x.reindex([3,4,5])
type(y)

z = DF([1,2,3])
type(z.reindex([1, 2]))

In newer versions of Pandas the `_constructor' is set internally to control the type returned. Setting this class attribute seems to do the trick:

class MyDataFrame(DataFrame):
    def __init__(self, *args, **kwargs):
        DataFrame.__init__(self, *args, **kwargs)
    @staticmethod
    def add(a, b):
        return a + b

MyDataFrame._constructor = MyDataFrame

>>> type(y)
<class '__main__.MyDataFrame'>
Finn Årup Nielsen
  • 6,130
  • 1
  • 33
  • 43