I find Hadley's plyr package for R extremely helpful; it's a great DSL for transforming data. The problem it solves is so common that I run into it in other use cases too, when I'm manipulating data not in R but in other programming languages.
Does anyone know if a module exists that does something similar for Python? Something like:
def ddply(rows, *cols, op=lambda group_rows: group_rows):
    """group rows by cols, then apply the function op to each group
    and return the results aggregating all groups
    rows is a dict or list of values read by csv.reader or csv.DictReader"""
    pass
It shouldn't be too difficult to implement, but it would be great if it already existed. If I implemented it myself, I'd use itertools.groupby to group by cols, then apply the op function to each group, then use itertools.chain to chain it all up. Is there a better solution?
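For reference, here is a minimal sketch of the groupby/chain approach described above. It rests on a few assumptions that are not part of the original signature: rows are dicts (as produced by csv.DictReader), op receives a list of rows and returns an iterable of result rows, and the input is sorted first because itertools.groupby only merges adjacent items with equal keys.

```python
from itertools import chain, groupby


def ddply(rows, *cols, op=lambda group_rows: group_rows):
    """Group rows by cols, apply op to each group, and return all results
    concatenated into one list.

    Assumes rows are dicts and op returns an iterable of rows.
    """
    def key(row):
        # The grouping key is the tuple of values in the chosen columns.
        return tuple(row[c] for c in cols)

    # groupby only groups adjacent rows, so sort by the same key first.
    ordered = sorted(rows, key=key)
    results = (op(list(group)) for _, group in groupby(ordered, key=key))
    return list(chain.from_iterable(results))


# Hypothetical sample data and aggregation function, for illustration only.
rows = [
    {"city": "Oslo", "sales": 3},
    {"city": "Rome", "sales": 5},
    {"city": "Oslo", "sales": 4},
]


def total(group):
    return [{"city": group[0]["city"], "sales": sum(r["sales"] for r in group)}]


totals = ddply(rows, "city", op=total)
```

With the sample data, totals holds one summed row per city. The sort makes this O(n log n); building the groups in a dict would avoid the sort but would no longer use itertools.groupby.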