
What I want to do is pass the rules for slicing a pandas DataFrame into a function.

For example:

import pandas as pd

row1 = {'a':5,'b':6,'c':7,'d':'A'}
row2 = {'a':8,'b':9,'c':10,'d':'B'}
row3 = {'a':11,'b':12,'c':13,'d':'C'}
df = pd.DataFrame([row1,row2,row3])

I am slicing the dataframe this way:

print(df.loc[df['a']==5])
print(df.loc[df['b']==12])
print(df.loc[(df['b']==12) | df['d'].isin(['A','C']),'d'])

For my purposes, I need to slice the same dataframe in different ways as part of a function. For example:

def slicing(locationargument):
    df.loc(locationargument)
    do some stuff..
    return something

Alternatively, I was expecting getattr() to work but that tells me DataFrames do not have a .loc[...] attribute. For example:

getattr(df,"loc[df['a']==5]")

Returns:

AttributeError: 'DataFrame' object has no attribute 'loc[df['a']==5]'

Am I missing something here? Any thoughts or alternatives would be greatly appreciated!

fpes
  • Call like `slicing(df['a']==5)` and do `df.loc[locationargument]` inside of the function. Is that what you want? – Ashwini Chaudhary Mar 21 '15 at 16:55
  • Exactly - or of course, some alternative that fulfills similar functionality.. – fpes Mar 21 '15 at 16:58
  • Would it not be easier and make more sense to turn whatever `do something...` into a function that you perform on the slice? So you slice outside the function and pass the slice to the function? – EdChum Mar 21 '15 at 18:12
  • I was considering this as well, but I need to slice several dataframes in the same way, so I would need to do it multiple times. I was looking to have the function take the locationargument and apply it to all dataframes in one call. Thanks! – fpes Mar 21 '15 at 18:23

1 Answer


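In pandas, I believe it's not quite right to think of .loc as a function (or method) on a DataFrame. It is an attribute that returns an indexer object, and you select rows by subscripting that object. So the syntax df.loc(...) is not right for selection. Instead, you need to write df.loc[...] (square brackets, not parentheses).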

So how about simply:

def slicing(locationargument):
    # Square brackets: .loc is subscripted, not called
    subset = df.loc[locationargument]
    # do some stuff with subset...
    return subset

But then the question becomes: what type of object should locationargument be? If it's a boolean iterable (for example a boolean Series or list) whose length is equal to the number of rows in your data frame, you're all set. An alternative could be to make it a string and then write:

def slicing(locationargumentstring):
    # eval turns the string into a boolean mask; only pass strings you trust
    subset = df.loc[eval(locationargumentstring)]
    # do some stuff with subset...
    return subset
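
Since the comments mention applying the same rule to several dataframes, here is a minimal sketch of one way that could look. It assumes the function takes the frame as an extra argument (that signature is my own, not from the question), and it relies on the fact that .loc accepts either a boolean mask or a callable that receives the frame and returns a mask. The second frame, `other`, is made up purely for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {'a': 5, 'b': 6, 'c': 7, 'd': 'A'},
    {'a': 8, 'b': 9, 'c': 10, 'd': 'B'},
    {'a': 11, 'b': 12, 'c': 13, 'd': 'C'},
])
# Hypothetical second frame with the same columns, for illustration only
other = pd.DataFrame([
    {'a': 5, 'b': 12, 'c': 0, 'd': 'Z'},
    {'a': 1, 'b': 2, 'c': 3, 'd': 'A'},
])

def slicing(frame, condition):
    """Return the rows of `frame` selected by `condition`.

    `condition` may be a boolean mask or a callable that takes the
    frame and returns a boolean mask.
    """
    return frame.loc[condition]

# A mask is tied to the frame it was built from
by_mask = slicing(df, df['a'] == 5)

# A callable is evaluated against whichever frame it is applied to,
# so one rule can be reused across several frames
rule = lambda frame: (frame['b'] == 12) | frame['d'].isin(['A', 'C'])
results = [slicing(frame, rule) for frame in (df, other)]
```

The callable form avoids eval entirely, which may be preferable when the rule does not have to arrive as a string.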

If you go the getattr route, remember that the attribute name is just the name: it doesn't include the brackets or the indexing expression. So the following fails:

getattr(df, "loc[df['a']==5]")

but the following would work:

getattr(df, "loc")[eval("df['a']==5")]

and, more directly, so would

getattr(df, "loc")[df['a']==5]
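
To make the difference concrete, here is a small sketch (using the question's data) that checks the getattr form against plain df.loc and shows the failing form raising AttributeError. The variable names are my own.

```python
import pandas as pd

df = pd.DataFrame([
    {'a': 5, 'b': 6, 'c': 7, 'd': 'A'},
    {'a': 8, 'b': 9, 'c': 10, 'd': 'B'},
    {'a': 11, 'b': 12, 'c': 13, 'd': 'C'},
])

mask = df['a'] == 5

# getattr(df, "loc") returns the same indexer as df.loc;
# the subscript is applied afterwards
direct = df.loc[mask]
via_getattr = getattr(df, "loc")[mask]
same_result = direct.equals(via_getattr)

# Putting the subscript inside the name makes getattr look for an
# attribute literally called "loc[df['a']==5]", which does not exist
try:
    getattr(df, "loc[df['a']==5]")
    raised = False
except AttributeError:
    raised = True
```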
8one6
  • This definitely did work. It did make sense that .loc was not considered a method of DataFrame. I was so close. Now that I am seeing this, it seems there's more about the getattr() function than I know now. Time for some digging.. Thanks! – fpes Mar 22 '15 at 17:59
  • I'm definitely not an expert, but think of `x = getattr(foo, bar)` as equivalent to `x = foo.bar`. In particular, `bar` should be a string or something that evaluates to a string. And in that case `x` winds up being whatever kind of thing the `bar` attribute on `foo` was to begin with. So if `foo.bar` is a function, you can do `getattr(foo, bar)(arguments)` and if `foo.bar` is a float you can do `getattr(foo, bar) * 2`, etc. – 8one6 Mar 22 '15 at 18:04
  • And in the case of .loc in pandas (or any other indexer, for that matter), it seems we use [index] instead of (args)? This, I would think, generalizes to any other Python object's attribute that takes a similar form.. – fpes Mar 22 '15 at 18:11
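
As a rough illustration of the points in these comments, here is a toy sketch with made-up classes (`Foo` and `Indexer` are hypothetical, not pandas objects). Whatever getattr returns is used exactly as the attribute itself would be: called if it is a function, multiplied if it is a number, subscripted if it defines `__getitem__`.

```python
class Indexer:
    """Toy stand-in for an indexer: supports obj[key] via __getitem__."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        return self._data[key]


class Foo:
    """Hypothetical class with a method, a float and an indexer attribute."""

    def __init__(self):
        self.ratio = 1.5
        self.picker = Indexer({'x': 10, 'y': 20})

    def shout(self, word):
        return word.upper()


foo = Foo()

called = getattr(foo, 'shout')('hi')    # same as foo.shout('hi')
doubled = getattr(foo, 'ratio') * 2     # same as foo.ratio * 2
picked = getattr(foo, 'picker')['x']    # same as foo.picker['x']
```

So square brackets work on any attribute whose object implements `__getitem__`, which is the general mechanism behind the df.loc[...] syntax.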