jrnlfile
is a dataset with journal names and identifiers. Here are the first 6 obs:
id journal issn
56201 ACTA HAEMATOLOGICA 0001-5792
94365 ACTA PHARMACOLOGICA SINICA
10334 ACTA PHARMACOLOGICA SINICA 1671-4083
55123 ADVANCES IN ENZYME REGULATION 0065-2571
90002 AGING
10403 AGING 1945-4589
Compare id
94365 and 10334. These obs name the same journal
. They need the same issn
. An obs with a missing value for issn
almost always has at least one partner obs that contains a matching journal
name and the correct issn
. Wherever this is true, I want to recode the missing issn
so it contains the issn
seen in other instances where the same journal
is mentioned. A revised dataset want
would look like this:
id journal issn
56201 ACTA HAEMATOLOGICA 0001-5792
94365 ACTA PHARMACOLOGICA SINICA 1671-4083
10334 ACTA PHARMACOLOGICA SINICA 1671-4083
55123 ADVANCES IN ENZYME REGULATION 0065-2571
90002 AGING 1945-4589
10403 AGING 1945-4589
I currently use if-else statements in a data step to populate missing issn
values with matching entries for journal
:
data want;
set jrnlfile;
if journal = "ACTA PHARMACOLOGICA SINICA" then issn = "1671-4083";
else if journal = "AGING" then issn = "1945-4589";
/*continue for 7,000 other journals*/
run;
But jrnlfile
contains 50,000 obs and 7,000 unique journals, so this takes a lot of time and is rather error-prone. This answer gets me halfway there, but issn
is not numeric and I can't solve the problem by simply adding values to it.
What is a more efficient and systematic way to get to want
from jrnlfile
?