3

I have a problem at work: I have merged two datasets, and there is a number of variables which have the same content, but where an observation which has an value in the variable from dataset 1 have a missing-value in dataset 2. So I need to transfer the values from the one variable into the other one.

This is my best shot so far:

replace V23=1 if V232==1
replace V23=2 if V232==2
replace V23=3 if V232==3
replace V23=4 if V232==4
replace V23=8 if V232==8
replace V23=.u if V232==10 | V232==9

However, it is a tedious task to do that for 40+ variables - and since some of them are numerical variables, it becomes a a sisyphean task.

Nick Cox
  • 35,529
  • 6
  • 31
  • 47
emilBeBri
  • 625
  • 13
  • 18

1 Answers1

2

Here's a start:

foreach v of varlist v23 {
    local w `v'2
    replace `v' = `w' if missing(`v')
    replace `v' = .u if `w' == 10 | `w' == 9
}

Notice how this solution relies on a lexical relationship among the variable names: it assumes the old variable "v23" is associated with the new variable "v232". You can make a list of such associations and use it, but this is inconvenient. It's probably easier to rename the variables, if necessary, to conform to such a convention, then run the replacement script, and then restore the desired names.

If you're unfamiliar with this kind of automation, read the help pages for macro and foreach.

whuber
  • 2,379
  • 14
  • 23
  • You sir, are brilliant! I still miss some comprehension in order to understand this fully. It worked for one variable, but the following two (one numerical, one categorial like the first worked) didn't go so well. What I don't understand is: 1) What is a lexical relationship? it is most often used in semantics, and I can't transfer that knowledge to programming. 2) I don't understand the local w `v'2 line at all. I think this coincides with how I don't get in which way you tie the desired variable with the values "w" (in the above case V232) to the variable V23 with the values v. – emilBeBri Jul 20 '11 at 08:26
  • @Emil w is obtained by concatenating the character "2" after v. E.g., when v is "xyz", then w will be "xyz2". – whuber Jul 20 '11 at 13:04
  • Came around to tell you I figured that out myself just now! Only took like 3 hours =). I got it now, Thanks a million! – emilBeBri Jul 20 '11 at 14:48
  • It seems like the easiest way to do it, rather than renaming all the variables, is to just write the variabel name inside the loop. So for the next pair of variables, where the local macro, if unchanged, applies itself to the wrong variabel, because the next set of variables is "V24" and "V233", I write "local w V234" instead of "local w `v'2". At least it seems that way to me. And thanks again! – emilBeBri Jul 20 '11 at 15:25
  • @Emil This is what I meant by making a list of associations. It's not as simple as one "local" statement, unfortunately. A good approach is via the 'word' function. You will likely find this to be such a pain that you will come to appreciate the work-around of renaming the variables in a systematic way. (Moreover, when there is no systematic lexical relationship among names of related variables, you greatly increase the risk of making wholesale mistakes in data processing. I shudder at the thought of working with a dataset where "v24" and "v233" are so closely related...) – whuber Jul 20 '11 at 20:03
  • Luckily, there is a lexical relationship -if you by that mean that the scale the is same in the two variables. I ended up renaming all them like you suggested. And now I have an idea how to use loops and macros as well, it's a wonderful new world opening up for me =) – emilBeBri Jul 22 '11 at 09:42