
I am trying to create a JSON file using SAS Enterprise Guide (EG) in the following format:

{
"schema": "EMAIL_SHA26",
"data": ["1516e67afa2d9c3874c3e9874bdb41c4", "1a7e5f59b3f0dfe6ea152cb65aedb0d2"]
}

I am pretty close, but my resulting JSON file has a few too many brackets. Here is what I am currently getting:

{
"schema": "EMAIL_SHA26",
"data": [
  [
   "1516e67afa2d9c3874c3e9874bdb41c4"
  ],
  [
   "1a7e5f59b3f0dfe6ea152cb65aedb0d2"
  ]
 ]
}

And here is the code I am using:

PROC JSON 
    OUT = users NOSASTAGS PRETTY;
    WRITE VALUES "schema";
    WRITE VALUES "EMAIL_SHA26";
    WRITE VALUES "data";
    WRITE OPEN ARRAY;
    EXPORT users_ds / NOKEYS;
    WRITE CLOSE;
RUN;

The "users_ds" data set has one column with the 2 data records in it. Is there any way I can prevent it from putting brackets around each value in the data set? Furthermore, is my desired output achievable knowing that the list of hashed emails could be as large as 10,000 records?

Any help would be greatly appreciated.

Nick Nelson
  • What does your `users` data set look like? I'm assuming it has the users_ds item. You want a single file with all the data elements in that list? Hard coding is likely the fastest solution, but I think it can be done via PROC JSON. – Reeza Aug 02 '19 at 20:25

2 Answers


The output you are getting indicates that there are multiple rows, each with a different email, and EXPORT writes each row as its own inner array.

To get the output you desire from a single WRITE statement, you would need to concatenate all of these emails into one long string, and that is problematic since SAS limits character variables to 32,767 bytes.
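Not SAS, but a quick back-of-the-envelope check (sketched in Python) of why the one-long-string route fails at the asker's scale: 10,000 hashes of 32 hex characters each, quoted and comma-separated, comes to roughly 360 KB, far past the 32,767-byte ceiling for a SAS character variable.

```python
# Rough size check (not SAS): one string holding 10,000
# 32-character hashes, each quoted, joined with ", ".
hashes = ["0" * 32] * 10_000              # stand-ins for 32-char hex hashes
joined = ", ".join('"%s"' % h for h in hashes)
print(len(joined))                        # 359998 bytes, >> 32767
```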

Here is a workaround. It uses CALL EXECUTE to generate the PROC JSON code that writes the array one value at a time:

data have;
  data = "1516e67afa2d9c3874c3e9874bdb41c4"; output; 
  data = "1a7e5f59b3f0dfe6ea152cb65aedb0d2"; output;
run;

data _null_;
 set have end=lastrec;

 if _N_ = 1 then do;
   call execute(
   "PROC JSON  OUT = 'want.json' NOSASTAGS PRETTY; 
    WRITE VALUES 'schema';
    WRITE VALUES 'EMAIL_SHA26';
    WRITE VALUES 'data';
    WRITE OPEN ARRAY;
   ");

 end;

 call execute('WRITE VALUES "' || strip(data) || '";');

 if lastrec then call execute("WRITE CLOSE; RUN;");

run;

This produces:

{
  "schema": "EMAIL_SHA26",
  "data": [
    "1516e67afa2d9c3874c3e9874bdb41c4",
    "1a7e5f59b3f0dfe6ea152cb65aedb0d2"
  ]
}
SAS2Python
  • I definitely went down the "one long string" route and decided against it for obvious reasons. Based on your output, this looks like it would work as well, but I accepted the other answer for its simplicity. Not sure which answer would be more efficient. Thanks! – Nick Nelson Aug 03 '19 at 16:23
  • I don't think one is more efficient than the other unless your dataset is very large, in which case transpose can lead to some additional IO. But certainly agree @Richard's solution is more elegant. – SAS2Python Aug 03 '19 at 18:56

EXPORT defaults to outputting a data set as an array of objects of name:value pairs. With NOKEYS the construct is an array of arrays, with each inner array holding one row's values.
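To make the two shapes concrete, here is a small model (sketched in Python; the rows are made-up stand-ins for a SAS data set):

```python
import json

# Two rows of a hypothetical data set
rows = [{"userid": "u1", "age": 20},
        {"userid": "u2", "age": 21}]

# Default EXPORT: an array of objects, one object per row
print(json.dumps(rows))
# -> [{"userid": "u1", "age": 20}, {"userid": "u2", "age": 21}]

# EXPORT ... / NOKEYS: an array of arrays, one inner array per row --
# which is exactly the extra-bracket output the question shows
print(json.dumps([list(r.values()) for r in rows]))
# -> [["u1", 20], ["u2", 21]]
```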

In order to get a single array of values for a column, you can transpose the column into a single-row data set and EXPORT that. You will not need to WRITE OPEN ARRAY before EXPORT.

data have;
  do row = 1 to 10;
    userid = uuidgen();
    age = 19 + row;
    output;
  end;
run;

* transpose a single column into a single row data set;
proc transpose data=have out=have_1_row(drop=_name_);
  var userid;
run;

filename users "C:\temp\users.json" ;

proc json out = users nosastags pretty;
    WRITE VALUES "schema";
    WRITE VALUES "EMAIL_SHA26";
    WRITE VALUES "data";
    EXPORT have_1_row / NOKEYS;
RUN;

This yields the JSON:

{
  "schema": "EMAIL_SHA26",
  "data": [
    "6ebd89fa-b6bc-4c14-b094-43792d202ad7",
    "ec53dd59-1290-47d7-b437-0c754349434c",
    "17332882-58ca-4c09-a599-2048d58460d0",
    "d5b57a19-ff73-4deb-bfc7-62ebc19d719e",
    "9d2758b2-e128-45df-8589-99cd7204c1ab",
    "a13bcba7-742f-4a01-bd56-dc12f4190d3e",
    "5f853bf3-9597-4c94-9b57-a54d3de190c3",
    "0edbd2d8-bd5d-46be-aaa7-ac208df4ba62",
    "07347e73-7efa-4e9c-8242-5a9c85f07b56",
    "03976b1b-513f-41ee-92d5-d23c8d3d4918"
  ]
}

For the case of wanting to EXPORT more than one column as an array of values, consider using DOSUBL to invoke a macro that side-runs the transposition and generates the single-row data set used in a macro-generated EXPORT statement:

%macro transpose_column(data=, column=, out=);
  %* generate code that will transpose a single column into a single row data set;
  proc transpose data=&data out=&out(keep=col:);
    var &column;
  run;
%mend;

%macro export_column_as_array (data=, column=);
  %local rc out;
  %let out = _%sysfunc(monotonic());

  %* Invoke DOSUBL to side-run macro generated proc transpose code;
  %let rc = %sysfunc(
    DOSUBL(
      %transpose_column(data=&data, column=&column, out=&out)
    )
  );

  %* use the output data set created by the side-run code;
  WRITE VALUES "&column";
  EXPORT &out / NOKEYS;
%mend;

data have;
  do row = 1 to 10;
    userid = uuidgen();
    age = 19 + row;
    date = today() - row;
    output;
  end;
  format date yymmdd10.;
run;

filename users "C:\temp\users.json" ;

options mprint mtrace;

proc json out = users nosastags pretty;
    WRITE VALUES "schema";
    WRITE VALUES "EMAIL_SHA26";
    %export_column_as_array(data=have,column=userid);
    %export_column_as_array(data=have,column=age);
    %export_column_as_array(data=have,column=date);
run;
quit;
Richard
  • This was a very simple solution that worked for me. Wondering about efficiency of a data set with 1 row and 10,000 columns vs. the other solution that was suggested (which I haven't tried yet). Accepting this answer for now. Thanks! – Nick Nelson Aug 03 '19 at 16:19
  • It is very efficient. You can see how fast by changing the loop in the data step that creates the sample data -- `do row = 1 to 10000;` – Richard Aug 03 '19 at 16:30