I have ~250,000 rows of firm-specific annual data(2000-2019) with and industry SIC code for each firm. The aim is to sum the value in each variable column for every individual SIC code based on the year. The data looks like this for the first couple of rows:
>head(compustat)
gvkey datadate fyear indfmt consol popsrc datafmt curcd at capx ceq emp ni revt xrd costat sic
1 1004 20000531 1999 INDL C D STD USD 740.998 22.344 339.515 2.9 35.163 1024.333 NA A 5080
2 1004 20010531 2000 INDL C D STD USD 701.854 13.134 340.212 2.5 18.531 874.255 NA A 5080
3 1004 20020531 2001 INDL C D STD USD 710.199 12.112 310.235 2.2 -58.939 638.721 NA A 5080
4 1004 20030531 2002 INDL C D STD USD 686.621 9.930 294.988 2.1 -12.410 606.337 NA A 5080
For the columns "at", "capx", "ceq", "emp", "ni", "revt", "xrd" I want the total sum for all firms with identical SIC codes for each year. So my output would be the total value of all variables within the same industry SIC, for every year between 2000 and 2019.
Could someone help me achieve this?
Thanks,