
I find Hadley's plyr package for R extremely helpful; it's a great DSL for transforming data. The problem it solves is so common that I run into it in other use cases too, when manipulating data not in R but in other programming languages.

Does anyone know if there is a module that does something similar for Python? Something like:

def ddply(rows, *cols, op=lambda group_rows: group_rows):
    """group rows by cols, then apply the function op to each group
       and return the results aggregating all groups
       rows is a dict or list of values read by csv.reader or csv.DictReader"""
    pass

It shouldn't be too difficult to implement, but it would be great if it already existed. If I were to implement it, I'd use itertools.groupby to group by cols, apply the op function to each group, and then use itertools.chain to chain the results together. Is there a better solution?

rafalotufo

1 Answer


This is the implementation I drafted up:

import itertools

def ddply(rows, cols, op=lambda group_rows: group_rows):
    """group rows by cols, then apply the function op to each group
    each row is a list of values or a dict keyed by col name (like rows
    read from csv.reader or csv.DictReader)"""
    def group_key(row):
        # build a tuple: a generator expression cannot be sorted or compared
        return tuple(row[col] for col in cols)
    # groupby only merges adjacent rows, so sort by the same key first
    rows = sorted(rows, key=group_key)
    return itertools.chain.from_iterable(
        op(group_rows) for k, group_rows in itertools.groupby(rows, key=group_key))

Another step would be to have a set of predefined functions that could be applied as op, like sum and other utility functions.
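As a rough sketch of what such predefined functions could look like: the `summarise` helper below is hypothetical (it is not part of plyr or of any Python library), and the sample rows are made up. It assumes each row is a dict and that `op` receives the group as a list, so this version of `ddply` materializes each group before calling `op`.

```python
import itertools


def ddply(rows, cols, op=lambda group_rows: group_rows):
    """Group rows by cols, apply op to each group, and chain the results."""
    def group_key(row):
        return tuple(row[col] for col in cols)
    rows = sorted(rows, key=group_key)
    return itertools.chain.from_iterable(
        # Assumption: pass a list so op can index and iterate more than once.
        op(list(group_rows))
        for _, group_rows in itertools.groupby(rows, key=group_key))


def summarise(group_cols, value_col, func=sum):
    """Hypothetical helper: build an op that reduces one column per group."""
    def op(group_rows):
        first = group_rows[0]
        result = {col: first[col] for col in group_cols}
        result[value_col] = func(row[value_col] for row in group_rows)
        return [result]  # one output row per group
    return op


# Made-up sample data for illustration.
rows = [
    {"city": "Rio", "year": 2010, "sales": 3},
    {"city": "Lima", "year": 2010, "sales": 5},
    {"city": "Rio", "year": 2010, "sales": 4},
]
totals = list(ddply(rows, ["city"], summarise(["city"], "sales")))
```

Any reducer that accepts an iterable (`sum`, `max`, `min`) can be passed as `func`, which keeps the grouping logic separate from the aggregation.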

IRTFM
rafalotufo
  • The `operator` module may be handy for your premade functions. – Daenyth Jun 22 '11 at 15:36
  • If you could write this on top of the pandas python module then you might win the internet (in my eyes, at least) – Mike Dewar Nov 26 '11 at 13:12
  • @MikeDewar: Is there a pandas implementation of this? –  Mar 09 '13 at 02:18
  • I use Pandas' pivot_table function http://pandas.pydata.org/pandas-docs/stable/reshaping.html – KLDavenport Oct 30 '13 at 17:38
  • I tried this and it did not work. I am just trying to reduce a frame to its groups and create a list from the grouped column. Let's say I have a data frame with the meals I eat on each day of the week over an entire month. I would like to group it by weekday and make a list of all the meals, e.g. Monday: [eggs, milk, pasta]. How could I do this in python? – Eduardo Reis Nov 16 '18 at 18:36
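
For the case in the last comment (one list of meals per weekday), a minimal sketch in plain Python follows. The sample rows and the field names `weekday` and `meal` are assumptions made up for illustration. With a pandas data frame, something like `df.groupby("weekday")["meal"].apply(list)` would be the usual route, but it is not shown here.

```python
import itertools

# Made-up sample data; field names are assumptions for illustration.
meals = [
    {"weekday": "Monday", "meal": "eggs"},
    {"weekday": "Tuesday", "meal": "rice"},
    {"weekday": "Monday", "meal": "milk"},
    {"weekday": "Monday", "meal": "pasta"},
]


def weekday_of(row):
    return row["weekday"]


# groupby only merges adjacent rows, so sort by the grouping key first.
# sorted() is stable, so meals keep their original order within each weekday.
by_day = {
    day: [row["meal"] for row in group]
    for day, group in itertools.groupby(sorted(meals, key=weekday_of),
                                        key=weekday_of)
}
```

The sort is alphabetical, so the keys come out as Monday, Tuesday here rather than in calendar order.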