
I have a function that takes an indexed pandas.Series of things and a dataframe of stuff that I want to use the things on group by group. It is common for the dataframe to contain groups for which there is no matching thing, so a simple list comprehension will often throw exceptions.

My Python is pretty rusty. Is code like this considered normal? It strikes me that it might be replicating some library function that I should use instead. Or just that it might be bad practice compared to some other way. My justification for the local function is that it has no use aside from being a helper function for the list comprehension, so why make it global?

import pandas as pd

def use_things(things, stuff_to_use_things_on):
    stuff_grouped = stuff_to_use_things_on.groupby(things.index.names)

    # Find and use the right thing, or else move on.
    # Local closure, since this function has no real use outside this context.
    def use_thing(name, group):
        try:
            return things.loc[name].do_something(group)
        except KeyError:
            # No thing for this group. Note that a KeyError raised inside
            # do_something would be swallowed here as well.
            return None

    # stuff might contain groups that there is no thing for;
    # pd.concat drops the None entries (but raises if every entry is None)
    results = [use_thing(name, group) for (name, group) in stuff_grouped]

    return pd.concat(results)
jtolle

1 Answer


I believe what you are looking for is either the `applymap()` or the `apply()` method of DataFrames; both are covered in the pandas documentation. If you want to apply a function to every element of a DataFrame, use `applymap()` (renamed to `DataFrame.map()` in pandas 2.1). If you only want to apply a function along an axis, that is, to whole rows or whole columns, use `apply()`. Keep in mind that neither method works in place: both return a new object.
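
To make the difference concrete, here is a minimal sketch on a made-up two-column frame (the data and variable names are illustrative, not from the question). Because the element-wise method is `applymap` in older pandas and `map` from 2.1 on, the sketch picks whichever one the installed version provides.

```python
import pandas as pd

# Illustrative data only; not taken from the question.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Element-wise: the function receives one scalar at a time.
# Use DataFrame.map where it exists (pandas >= 2.1), else applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
doubled = elementwise(lambda x: x * 2)

# Axis-wise: the function receives a whole column (axis=0)
# or a whole row (axis=1) as a Series.
column_totals = df.apply(lambda col: col.sum(), axis=0)
row_totals = df.apply(lambda row: row.sum(), axis=1)

# Both calls return new objects; df itself is left unchanged.
```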

  • Thanks for the answer, but wouldn't I wind up with the same problem when using apply - I need to supply a function that can handle exceptions? So I'd still be inclined to use the local function like here, and would have the same question about whether it's considered proper Python. – jtolle May 08 '22 at 02:19
  • But I get the point about apply being more or less what I'm doing: grouping a dataframe, calling a function with each group, and then compiling the results. I used the loop instead because it wasn't clear how to get the group-identifying keys in order to look up the right "thing" to use when calling a function via apply. – jtolle May 08 '22 at 03:06
  • I would check out this [answer](https://stackoverflow.com/a/48838706/19064880). I wouldn't put a function within another function. I would think it to be more proper to use `apply()` instead. – PlasmaRaptor360 May 08 '22 at 03:35
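
One way to sidestep the exception-handling question raised in these comments is to test whether a group's key exists before looking it up, so no `try`/`except` and no helper function are needed. The sketch below is only an illustration under assumptions: `things` has a single-level index named `kind`, and each "thing" is a plain callable rather than an object with a `do_something` method. The sample data is made up. (As far as I know, inside `groupby(...).apply` the current key is also available as `group.name`, but the explicit loop keeps the skipping logic visible.)

```python
import pandas as pd

# Hypothetical stand-ins: each "thing" is a callable keyed by group name.
things = pd.Series({
    "a": lambda g: g.assign(value=g["value"] * 10),
    "b": lambda g: g.assign(value=g["value"] + 1),
})
things.index.name = "kind"

# Group "c" has no matching thing and should simply be skipped.
stuff = pd.DataFrame({"kind": ["a", "a", "b", "c"], "value": [1, 2, 3, 4]})

def use_things(things, stuff):
    key = things.index.name  # assumes a single-level index
    results = [
        things.loc[name](group)
        for name, group in stuff.groupby(key)
        if name in things.index  # skip groups with no matching thing
    ]
    # pd.concat raises on an empty list, so return an empty frame instead.
    return pd.concat(results) if results else stuff.iloc[0:0]

out = use_things(things, stuff)
```

With the membership test in the comprehension, a genuine error raised inside a thing propagates instead of being silently turned into `None`.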