I have a dataset unique across 5 variables. The 5th is an identifier variable. Finally, I have a 6th variable, which is dependent on the identifier variable.
The identifier variable can appear in multiple places. The dependent variable will never change for a given value of the identifier variable.
I have code such as the following:
proc sql;
select
...
, count(distinct identifier) as n_ids
from
group by
Which selects the number of unique identifiers per group of 4 independent variables. I'm hoping to add on to this the sum of the 6th variable, which would be something like the following:
sum(case when distinct identifier then dependent_var else 0 end)
Which obviously does not work (and for good reason). Any clean way of finding this sum within the sql step?