I would like to run splits on csv files in unix and run aggregates on some columns. I want to group by on several columns if possible on each of the split up files using awk.
Does anyone know some unix magic that can do this?
here is a sample file:
customer_id,location,house_hold_type,employed,income
123,Florida,Head,true,100000
124,NJ,NoHead,false,0
125,Florida,NoHead,true,120000
126,Florida,Head,true,72000
127,NJ,Head,false,0
I want to get counts grouping on location, house_hold_type as well as AVG(income) for the same group by conditions.
How can I split a file and run awk with this?
this is the output I expect the format of the output could be different but this is the overall data structure I am expecting. Will humbly accept other ways of presenting the information:
location:[counts:['Florida':3, 'NJ':2], income_avgs:['Florida':97333, 'NJ':0]]
house_hold_type:[counts:['Head':3, 'NoHead':2], income_avgs:['Head':57333, 'NoHead':60000]]
Thank you in advance.