The core of my question is - can I use PROC SQL, PROC DATASETS or some other command to directly modify meta data of one data set, based on data in another dataset? If yes, can you tell me more about what code to use or what search terms to use in trying to figure it out?
Here is the background and detail - I am receiving research results data exports (Excel) from a research partner regularly, and they have also provided me a codebook in Excel. The results data files always contain the same fields, but their data tables need a lot of cleaning. I need to develop a process to transform the results data into a usable SAS dataset, including labeled variables and a syntax file that will create and apply user-defined formats. I want to do this as syntax-efficiently as possible. I would ideally like to ask SAS to change the metadata/variable attributes (the label, specifically) in the results dataset, based on the Variable Name * Variable Label combos found in the codebook dataset. One way of doing this, as I see it, is to import both Excel files into SAS and then use a Concatenate function to write syntax in the form of a new character variable, and then use the text of that new variable in the data step of syntax run on the results dataset. Specifically:
*Use concatenate function to create variable containing syntax text;
data codebook_edit;
set codebook;
format Syntax_Label $2000.;
Syntax_Label=cat("label ", trim(Field_Name), " = '", trim(Field_Label), "';");
run;
This then produces a character variable with rows that looks like this:
label race = 'Race Ethnicity';
label sex = 'Sex assigned at birth';
label gender = 'Gender identity';
I would then run this:
*Get syntax to label vars;
proc print data=codebook_edit noobs;
var Syntax_Label;
run;
*Apply labels;
data results_edit;
set results;
label race = 'Race Ethnicity';
label sex = 'Sex assigned at birth';
label gender = 'Gender identity';
run;
However, I'd like to know if there is a better way of doing this, since I have hundreds of variables and would rather not create such a long syntax file. For example, can I use PROC SQL or PROC DATASETS or some other command to directly modify meta data in the research results data set, based on data from the codebook and if yes, can you offer any tips? Thank you