For a project I manipulate a few columns of a dataset, join the newly created columns back to the full dataset, and then summarize on the manipulated fields.
The manipulation and merging work fine, but the groupby doesn't return any results, and I'm wondering how I can find out why. The code runs without an error and the result is printed in the Jupyter notebook, but it contains only the columns I requested and 0 rows.
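One check that might narrow this down, sketched on made-up data: by default pandas `groupby` excludes every row that has a missing value (NaN) in any grouping column, so a grouping column that is empty after the merge would leave 0 rows. Whether that is what happens in my real data is an assumption I have not confirmed; the toy DataFrame and its values below are hypothetical, only the column naming follows my code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame.
df = pd.DataFrame({
    "aa": ["x", "x", "y"],
    "ab": [np.nan, np.nan, np.nan],  # a grouping column that is entirely missing
    "z1": [1.0, 2.0, 3.0],
})
groupcolumns = ["aa", "ab"]

# Number of missing values in each grouping column.
missing_per_column = df[groupcolumns].isna().sum()

# Number of rows where every grouping column is filled in.
# Only these rows take part in a default groupby.
complete_rows = int(df[groupcolumns].notna().all(axis=1).sum())

# The default groupby on this toy data, for comparison.
result = df.groupby(groupcolumns)["z1"].sum()

print(missing_per_column)
print(complete_rows)
print(len(result))
```

If `complete_rows` comes out as 0 on the real data, the missing values would explain the empty result, and filling them before grouping (or passing `dropna=False` to `groupby`, available from pandas 1.1) would be something to try.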
Is there a limit on the number of columns when using groupby? I'm using 40 groupby columns and 10 amount fields to summarize.
Are there alternatives I can try? I've come across some methods using numpy, which might be more memory-efficient, but I couldn't see an efficient way to apply them to 40 columns.
I have searched online but couldn't find an answer. I'm new to pandas, so before I do a deep dive into this topic, I want to check whether I'm overlooking something or whether there is an easier way to achieve what I want.
Because the DataFrame has over 40 columns to group by and around 10 value fields, I have put these in two list objects. This was the first hurdle I conquered, thanks to the following Stack Overflow page.
These lists are then used in the groupby.
# Passing the columns as lists, because I ran into what looked like a limit of 9 variables when entering them directly in groupby.
groupcolumns = ['aa', 'ab', 'ac', 'ad']  # etc.
amountcolumns = ['z1', 'z2', 'z3', 'z4']  # etc.
# sum() needs the parentheses; without them df1 is the method itself, not the result.
df1 = df.groupby(groupcolumns)[amountcolumns].sum()
df1.reset_index()
I expected this to return a DataFrame with the amount columns summed for each combination of values in the group columns.
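To make the expected result concrete, here is a minimal sketch on made-up data. The values are hypothetical and I use only two group columns and two amount columns instead of the full lists. On this small example the groupby does behave the way I expect.

```python
import pandas as pd

# Hypothetical toy data with no missing values.
df = pd.DataFrame({
    "aa": ["x", "x", "y"],
    "ab": ["p", "p", "q"],
    "z1": [1, 2, 3],
    "z2": [10, 20, 30],
})
groupcolumns = ["aa", "ab"]
amountcolumns = ["z1", "z2"]

# One row per combination of group columns, with the amount columns summed.
expected = df.groupby(groupcolumns)[amountcolumns].sum().reset_index()
print(expected)
```

Here the two rows with `aa == "x"` and `ab == "p"` collapse into one row, so the result has 2 rows and 4 columns. This is the shape I am looking for with the real data.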
Would be great if anyone can help me out! Thanks in advance.