There are probably a dozen good ways to do this, and which one is ideal depends on various details of your data, in particular how performance-sensitive this is.
The most SASsy way to do this, I would say, is to use PROC SURVEYSELECT. It generates a random sample of the size you want, which you then merge on. It is not the fastest way, but it is very easy to understand and is fast-ish as long as you aren't working with humongous data sizes.
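```sas
/* Step 1: store the number of rows in the data set we are adding names to */
data _null_;
  set sashelp.cars nobs=nobs_cars;
  call symputx('nobs_cars', nobs_cars);
  stop;
run;

/* Step 2: draw that many names at random, with replacement, in random order */
proc surveyselect data=sashelp.class sampsize=&nobs_cars out=names(keep=name)
                  seed=7 method=urs outhits outorder=random;
run;

/* Step 3: side-by-side merge (intentionally no BY statement) */
data want;
  merge sashelp.cars names;
run;
```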
In this example, we are taking the dataset `sashelp.cars` and appending an owner's name to each car, chosen at random from the dataset `sashelp.class`.
First, we determine how many records we need: the number of observations in the dataset we are merging onto. This step can be skipped if you already know that number, but it takes basically zero time regardless of dataset size, because the `nobs=` value comes from the dataset's header information rather than from counting rows.
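A common variant of this step guards the `set` statement with `if 0 then`, so no row is ever read while the `nobs=` variable is still filled in at compile time. This is a sketch of that idiom; the `%put` line is only there so you can check the value in the log.

```sas
/* IF 0 is never true, so no row is read, but the compiler still */
/* sets NOBS_CARS from the data set's header information.        */
data _null_;
  if 0 then set sashelp.cars nobs=nobs_cars;
  call symputx('nobs_cars', nobs_cars);
  stop;
run;

%put NOTE: sashelp.cars has &nobs_cars observations.;
```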
Second, we use `proc surveyselect` to generate the random list. We use `method=urs` to ask for unrestricted random sampling (simple random sampling with replacement), meaning we make 428 (in this case) separate draws, with every row equally likely to be chosen on each draw. We use `outhits` and `outorder=random` to get a dataset with one row per desired output row, in random order. Without `outhits` it gives one row per selected input row plus a `NumberHits` variable counting how many times that row was drawn, and without `outorder=random` it gives the rows in sorted rather than shuffled order. `sampsize` is set to our macro variable holding the number of observations the final output dataset will have.
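To see what `outhits` changes, you can run the same draw without `outhits` and `outorder=random` and look at `NumberHits` directly. The output name `hits_summary` is just an illustrative choice, and this assumes `&nobs_cars` was already created by the first step.

```sas
/* Same draw, but without OUTHITS / OUTORDER=RANDOM:          */
/* one row per selected student, NumberHits = times drawn.    */
proc surveyselect data=sashelp.class sampsize=&nobs_cars out=hits_summary
                  seed=7 method=urs;
run;

proc print data=hits_summary;
  var name numberhits;
run;

/* The hits add up to the requested sample size */
proc means data=hits_summary sum;
  var numberhits;
run;
```

The sum of `NumberHits` equals `&nobs_cars`; `outhits` simply expands each student into that many rows.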
Third, we do a side-by-side merge (with no `by` statement, intentionally). Note that in some installations the `mergenoby` system option is set to `warn` or `error` to flag this particular usage; if so, you can get identical results with two `set` statements (`set sashelp.cars; set names;`) instead, since both datasets have the same number of rows.
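Written out in full, the two-`set` alternative to the by-less merge looks like this (a sketch that assumes `names` was created by the `proc surveyselect` step):

```sas
/* Alternative to a BY-less MERGE: two SET statements read the */
/* data sets in parallel, one row from each per iteration.     */
data want;
  set sashelp.cars;
  set names;
run;
```

This step stops as soon as either dataset runs out of rows, so it only matches the `merge` version when the row counts are equal, which the first step guarantees here.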