i want to merge two datasets, but without using merge statement and Proc sql,can i do this? is there any way to do the same
-
There are other ways, such as the HASH option suggested by @Shorack, an index key merge is another way. My question is why are you asking? You should be using the optimal solution for your task, however this sounds like a homework question rather than a genuine SAS coding problem – Longfish Sep 18 '13 at 08:09
-
Not so strange to ask: to choose the optimal solution, it's better to know all possibilities. – Sep 18 '13 at 08:44
-
Thanks to all , who responded to the query. – ved_null_ Sep 19 '13 at 05:21
3 Answers
Yes there is: a join using hash tables.
See this document for an example: http://www.nesug.org/proceedings/nesug06/dm/da07.pdf
Advantages:
- Faster in some cases
Disadvantages:
- One of the tables needs to fit in memory
- Syntax is very un-SAS-like (being closer to languages such as java)
- Not everybody is familiar with the concept, certainly novice SAS users (can be problem for maintenance)
In my opinion, hash join is only useful in a very limited set of use cases. One example is where following 2 conditions are met:
- You need to join information from a huge table with a small (easily fits in memory) table.
- The large table is not sorted on the join variables and there is no added value in having it sorted.
When the small table gets extremely small (e.g. only 10 key values), i would maybe consider some approach using 2 macro variables and 2 arrays. This because the code would be as performant and easier to recognize as SAS by other people who might come after me and need to maintain it.
Conclusion: judging from the way the question is framed, you should go for SAS data step merge or proc sql join.

- 1,272
- 9
- 11
Although not a merge in the truest sense, if you have one large dataset and one small dataset, you can read the smaller DS in as a format, and then use proc format
to put the values onto the larger one.

- 1,163
- 2
- 9
- 16
-
+1, it's something I use much. But sometimes I also use it together with `newvar=put(var,format.)` to create a new variable. – Sep 18 '13 at 08:47
-
@arbautjc i am quite intereted to know how to use proc format in merging two or more datasets. thanks for the new tip – ved_null_ Sep 19 '13 at 05:22
There are total 5 ways to Merge two datasets:
- Proc SQL
- Data Merge
- Proc Format
- Arrays
- Hash Objects
Selection of the process depends on the size of the datasets and the primary key used.
Proc Format and Hash Object has been proved as the best method to merge large datasets with lesser Run Time.

- 1