2

I have a dataset which has to be rolled up based on the granularity(FIELD1 & FIELD2). Two of the metrics fields(METRIC1 & METRIC2) have to be summed up. Until now it seems to be an easy GROUP BY task. But I have a string field(FLAG) which has to be rolled up too, by concatenating the distinct values.

Input Dataset:

enter image description here

Expected Result:

enter image description here

This operation can be performed in Oracle using the LISTAGG() function. Kindly help me out in achieving the same in SAS Proc SQL.

Jonathan
  • 144
  • 4
  • 13
  • 1
    There is no PROC SQL option with this particular problem, or at least not directly. How important is that requirement? A data step is your best solution, though a transpose and data step or SQL would also work. Please post your data as text, not as an image. – Reeza Apr 16 '19 at 02:17

4 Answers4

2

I don't believe there's a direct way to do this in SAS. CATS (and similar concatenation functions) aren't aggregation functions. It was suggested to add these back a few years ago but nothing came of it that I'm aware of (see this thread.)

If I understand right, what you're doing is GROUP BY field1/field2, SUM metric1/metric2, and make a single FLAG field that concatenates all seen FLAG field values (but doesn't group by them).

The way I would handle this is to first do your aggregation (field1/field2), and then join that to a separate table that was just field1/field2/flag. You could make that most easily in the data step, something like:

data want;
  set have;
  by field1 field2;
  length flag_out $100; *or longer if you need longer;
  flag_out = catx(',',flag_out,flag);
  if last.field2 then output;
  rename flag_out=flag;
  drop flag;
run;

This assumes it's sorted already by field1/field2, otherwise you need to do that first.

Joe
  • 62,789
  • 6
  • 49
  • 67
1

As stated, there is no LISTAGG() function, and there is also no built-in feature for creating a custom aggregate function. However, there are two possibilities that get the output.

Example one

Data step with DOW processing and hashing for tracking distinct flag values while concatenating within a group.

data want;
  if 0 then set have; *prep pdv;
  length flags $200;

  declare hash _flags();
  _flags.defineKey('flag');
  _flags.defineDone();

  do until (last.f2);
    set have;
    by f1 f2;
    m1_sum = sum(m1_sum,m1);
    m2_sum = sum(m2_sum,m2);
    if _flags.find() ne 0 then do;
      _flags.add();
      flags = catx(',',flags,flag);
    end;
  end;

  drop m1 m2 flag;

  _flags.delete();
run;

Example two

Create a FCMP custom function used from within SQL. Because FCMP can not create an aggregate function, the result will be automatically remerged against original data which must then be filtered. The FCMP function also uses a hash for tracking distinct values of flag within a group.

proc fcmp outlib=sasuser.functionsx.package;

  function listagg(f1 $, f2 $, n, item $) $;
    length result $32000 index 8;

    static flag;
    static index;
    declare hash items();

    if flag = . then do;
      flag = 1;
      rc = items.defineKey('item');
      rc = items.defineDone();
    end;

    static items;

    index + 1;
    rc = items.replace();

    if index = n then do;
      declare hiter hi('items');
      result = '';
      do while (hi.next() = 0);
       result = catx(',',result,item);
      end;

      index = 0;
      rc = items.clear();

      return (result);
    end;
    else 
      return ("");

  endsub;
run;

options cmplib=sasuser.functionsx;

proc sql;
  create table wanted as 
  select * from 
  (
    select                      /* subselect is a remerge due to 'listagg' mimic */
      f1, 
      f2,
      listagg(f1, f2, count(*), flag) as flags,
      sum(m1) as m1,
      sum(m2) as m2
    from have
    group by f1, f2
  )
  where flags is not null                   /* filter the subselect */
;
quit;

Ideally a hash of hashes would have been used, but FCMP only provides for hash instance creation in a declare statement, and dynamic hashes can not be instantiated with _new_. SAS Viya users would be able to use the new component object Dictionary in an FCMP function, and might be able to have a Dictionary of Dictionaries to track the distinct flag values in each group.

Richard
  • 25,390
  • 3
  • 25
  • 38
0

Thanks everyone for your valuable inputs. Apparently there is no straightforward solution to this scenario in SAS. With the bigger picture of the requirement in mind, I have decided to tackle the problem at the data layer itself or add another intermediate presentation layer. I'm sure many have pointed out this need to SAS, I have raised this issue with SAS too. Hope they look into it and come up with a similar function as LISTAGG OR GROUP_CONCAT.

Jonathan
  • 144
  • 4
  • 13
  • Not sure why you say that as you got two very good answers for how to concatenate values across multiple observations. – Tom Apr 16 '19 at 23:53
0

Answer from Joe is very good but is missing one critical part. There should be a

retain flag_out;

line after the 'by' line.