I would like to sum the values in a specific column across all observations that share an identifier. For example, suppose I have data like this:

A 4 5 6
B 3 3 2
A 3 4 2
C 3 2 0
B 3 7 3
B 2 4 1

Suppose I want to sum the values by the identifier in column 1, so that I have totals for A, B and C for a column of my choice (2, 3 or 4). Then, in a separate data set, I want to output only those identifiers (A, B or C) whose total meets a certain criterion.

For example, if I want the sums of column 4 (grouped by the identifier in column 1) that are above 1, my output data set should return A = 8 and B = 6, and nothing for C, since its sum is zero.

I am open to Proc SQL or the data step; in fact, it would be useful to know both methods.

Brian Tompsett - 汤莱恩
Ab Di
    What have you tried? Read [How to ask a good question](https://stackoverflow.com/help/how-to-ask). – philipxy Aug 17 '14 at 21:26

3 Answers

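This should do the trick. To apply your criterion to the summed values, add a having clause after the group by; a where clause after the from would filter individual rows before they are summed.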

proc sql ;
  create table output as
  select Column1, sum(column2) as ColSum2, sum(column3) as ColSum3,
         sum(column4) as ColSum4
  from your_data_table_here
  group by 1
  order by 1
  ;
quit ;
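
For the specific example in the question (sums of column 4 above 1), a filter on the aggregated value might look like the sketch below. The data set names have and want and the variable names id and col2 to col4 are assumptions made for illustration; the question does not name them.

```sas
/* Sample data from the question; all names here are hypothetical */
data have;
  input id $ col2 col3 col4;
  datalines;
A 4 5 6
B 3 3 2
A 3 4 2
C 3 2 0
B 3 7 3
B 2 4 1
;
run;

proc sql;
  create table want as
  select id, sum(col4) as col4_sum
  from have
  group by id
  having sum(col4) > 1   /* applied to each group's total, after summing */
  order by id;
quit;
```

With the sample rows, want should contain A with a total of 8 and B with a total of 6; C is dropped because its total is 0.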
DTS

You can use proc sql to group by one or more columns and sum over another. Assuming your data has identifying columns col1, col2 and col3 and a value column named value:

proc sql;
        create table SUMS as
        select COL1, COL2, COL3, sum(VALUE) as VALUE_SUM
        from YOUR_TABLE
        group by COL1, COL2, COL3;
quit;

For the Python people among you, this would be analogous to something in Pandas like df.groupby(['col1', 'col2', 'col3'])['value'].sum().rename('value_sum').reset_index().
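
Since the question also asks about the data step, here is a rough equivalent using BY-group processing. It reuses the placeholder names from the query above (YOUR_TABLE, COL1 to COL3, VALUE); the intermediate data set SORTED is an assumption added for this sketch.

```sas
/* BY-group processing needs the input sorted by the grouping variables */
proc sort data=YOUR_TABLE out=SORTED;
  by COL1 COL2 COL3;
run;

data SUMS;
  set SORTED;
  by COL1 COL2 COL3;
  if first.COL3 then VALUE_SUM = 0;  /* reset at the start of each group */
  VALUE_SUM + VALUE;                 /* sum statement: keeps a running total across rows */
  if last.COL3 then output;          /* write one row per group */
  keep COL1 COL2 COL3 VALUE_SUM;
run;
```

To keep only groups that meet a criterion, the output line can carry the condition as well, for example `if last.COL3 and VALUE_SUM > 1 then output;`. One difference from the query: a group whose values are all missing gets a total of 0 here, whereas sum() in proc sql returns a missing value.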

ifly6

You can use SQL: group by the identifier and apply the sum aggregate function.

Kay