I have a dataset of 1,800,000 rows and 45 columns. I am trying to group by one column and take the sum of the other columns.
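To make the goal concrete, here is a minimal sketch of the operation on a toy pandas frame. The column names other than MSISDN (`calls`, `data_mb`) and all the values are made up for illustration; the real data has 45 numeric columns.

```python
import pandas as pd

# Toy stand-in for data_df: one key column (MSISDN) plus numeric value columns.
# "calls" and "data_mb" are hypothetical names used only for this example.
toy_df = pd.DataFrame({
    "MSISDN": [111, 222, 111, 333, 222],
    "calls": [1, 2, 3, 4, 5],
    "data_mb": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Group by the single key column and sum every other column per key.
expected = toy_df.groupby("MSISDN", as_index=False).sum()
print(expected)
```

The result has one row per distinct MSISDN, with each remaining column holding that key's total.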
As a first step, with data_df as my data frame (all columns are numeric), I tried:
columns = data_df.column_names
df_result = data_df.groupby(columns, agg='sum')
The result is that the kernel gets restarted.
The system has 32 GB of RAM.
Another approach that I tried:
df = None
for col in colm:
    print("the col is ", col)
    if df is None:
        df = data_df.groupby(data_df.MSISDN, agg=[vaex.agg.sum(col)])
    else:
        dfTemp = data_df.groupby(data_df.MSISDN, agg=[vaex.agg.sum(col)])
        df = df.join(dfTemp, left_on="MSISDN", right_on="MSISDN", how="inner", allow_duplication=True)
        del dfTemp
With this approach I am able to compute the sums for up to 11 columns, and then the kernel restarts again. Is there any other way to get the result using vaex or pandas?