0

I am working in SAS Enterprise guide and want to combine three separate data sets into one data set. All data sets have one column with the same length, data type and column header:

My data looks as follows (three tables, 1 column each):

data1         data2        data3

apples        apples       apples
100           200          300
55            77           80
3422          32432        1234 

I want my output data to look as follows (one table, 3 columns). The column headers should be changed to a name of my choice:

output_data

green_apples  red_apples   yellow_apples
100           200          300
55            77           80
3422          32432        1234 

I've messed around with SAS merge statements and proc_sql, but from what I've seen I always need to add some common identifier (for BY statements or something of the like).

Is there a fast way of doing what I want?

Thanks up front for the help.

Martin Reindl
  • 989
  • 2
  • 15
  • 33

2 Answers2

1
data want;
  merge a(rename=apples=green_apples) 
        b(rename=apples=red_apples) 
        c(rename=apples=yellow_apples)
  ;
run;

That will do as you say, merge the datasets together with no by statement. There is an option (options mergenoby=ERROR|WARN|NOWARN) which will possibly make this statement error or produce a warning (as this is something that often is done by mistake), but you can turn it to NOWARN to avoid that.

There isn't a good way to do this in SQL without creating an artificial row identifier.

Joe
  • 62,789
  • 6
  • 49
  • 67
  • This makes sense, but it doesn't allow me to rename the column headers. – Martin Reindl Feb 28 '17 at 19:32
  • Also, you might want to mention whether this is a one time sort of thing, or something you want to do in a reproducible, modular way. There are more complex solutions that present that are more appropriate for something you need to be flexible, but are overkill for just a one off situation. – Joe Feb 28 '17 at 19:53
  • This is just a one time sort of thing, so this is perfect. Thanks so much! – Martin Reindl Feb 28 '17 at 20:52
0

One quick and dirty solution to match exactly what you've posted would be:

data output_data;
    set data1 (rename=(apples = green_apples));
    set data2 (rename=(apples = red_apples));
    set data3 (rename=(apples = yellow_apples));
run;

Keep in mind that this solution would be limited by the number of observations in the smallest dataset.

However, it seems like you wish to match the records by the order they appear in the datasets? If so, an identifier field that described the order would be helpful (e.g. if the datasets had differing observation counts). One way to do this is to use the _N_ automatic variable:

/*
    Changing observation counts
*/
data data1;
    id = _N_;
    input apples 8.;
    cards;
100
55
3422
;
run;

data data2;
    id = _N_;
    input apples 8.;
    cards;
200
77
32432
1
2
;
run;

data data3;
    id = _N_;
    input apples 8.;
    cards;
300
80
1234 
5
;
run;

data output_data;
    merge data1 (in=d1 rename=(apples = green_apples))
          data2 (in=d2 rename=(apples = red_apples))
          data3 (in=d3 rename=(apples = yellow_apples));
    by id;
    if d1 or d2 or d3;
run;

Here, the dataset option (in=d1... creates a temporary variable d1 and sets it to 1 if a record from data1 matches a record from either data2 or data3 by id.

Hugs
  • 543
  • 2
  • 8
  • `if d1 or d2 or d3` is pointless, that's the default. (How would a record appear in the output_data dataset and not come from one of the three datasets?) – Joe Feb 28 '17 at 19:51
  • And the `set` option is one possible solution, but you should definitely mention the major difference between that and merge - how SAS would handle datasets of different sizes in that. – Joe Feb 28 '17 at 19:51
  • Good points. The `if` statement and `in` option were meant to show how SAS thinks, and based on the question it seemed like the OP needed some background. Also, the `set` solution would be limited by the number of observations in the _first_ dataset listed. – Hugs Feb 28 '17 at 19:56
  • I'd recommend putting that in the answer, then, as opposed to in comments. – Joe Feb 28 '17 at 20:35
  • Actually - you have one thing wrong, I misread your comment. The number of observations is not limited by the *first* dataset. It's limited by the *smallest* dataset. (So if d1 and d2 have 100 but d3 has 50, you'd get 50 out.) When any `set` statement fails to retrieve a row, the data step ends. – Joe Feb 28 '17 at 22:57