1

I have a dataset with many variables. For a number of observations I want to replace their values (in a large number of variables) with the values of another observation (for the same variables) based on a unique identifier. The unique identifier is not equal to the Stata observation numbers (_n).

This is similar to several other threads, but as far as I can tell, their solutions are not fully transferrable

  1. transfer values from one variable to another in Stata But I have no simple lexical relationship that I could use
  2. https://www.stata.com/statalist/archive/2013-06/msg00056.html But I can't rely on the Stata observation numbers, because the dataset is dynamic and the order and number of observations may change.

Here is a piece of code based on the solution from the second link but adjusted to my problem (I changed the content of the square bracket in the last macro). I am aware that this code does not work because the square bracket here could only contain a Stata obs number (_n). But I think this should illustrate well, what I am trying to achieve:

In this example, I would like to replace the values of the observations whose unique_id is 25 and 38 with the values of the observation whose unique_id is 21. This should be done for all variables in my local varlist.

set obs 50

local vlist v1 v2 v3 v4 v5

foreach v of local vlist { 
    generate `v' = runiform()
}

gen unique_id=_n+20  // this is just to illustrate that the unique_id is not equal to _n


foreach var of local vlist {
replace `var' =`var'[unique_id==21] if unique_id==25 | unique_id==38
}
Timo K
  • 29
  • 6
  • That syntax is legal but not what you want usually. `unique_id == 21` is evaluated as 1 if true and 0 if false. A reference to varname`[1]` will be what you want if and only if the data have been sorted previously so that `unique_id` is 21 in the first observation, so that subscript is correct. Otherwise put, Stata is not R in this respect! The syntax used is not a way to select a scalar – Nick Cox Apr 08 '21 at 10:44

1 Answers1

1

Here is one way to do it:

foreach var of local vlist {
    su `var' if unique_id == 21, meanonly 
    replace `var' = r(mean) if unique_id==25 | unique_id==38
}

And here is another way to do it:

gen reference = unique_id == 21 
sort reference 

foreach var of local vlist {
    replace `var' = `var'[_N] if unique_id==25 | unique_id==38
}

Of course if you looked at the data and worked out that at the moment identifier 21 is in observation 42, say, then using that as subscript that is a direct answer, but not good or robust style.

See also this paper for a review of some techniques in this territory.

I'd sympathise with anyone who thought that Stata style looks awkward for this problem. Also, this kind of question seems terribly ad hoc, but then problems like this are real too.

Nick Cox
  • 35,529
  • 6
  • 31
  • 47
  • Thanks so much, Nick! The second one of your solutions worked like a charm! Your first solution gave me a type mismatch error - I have yet to figure out why (my actual unique identifier variable is a numeric float variable just like in the example above and in my understanding it should not make a difference, what the data type of the vars in vlist v1-v5 is?). Probably you would need more info on my data to answer this one, but since the second alternative worked, I leave it at this. Also, that paper seems very useful, also to some colleagues who work on panel data a lot. Thanks again! – Timo K Apr 08 '21 at 13:00
  • Your question code _creates_ `v1` to `v5` as numeric. But if any of your actual variables is string, `summarize` won't barf but the mean is returned as missing and you can't `replace` any of its values by system missing, which is a numeric value. So, the code for the first solution depends on your question code, while the second solution is more general. Glad it helped. – Nick Cox Apr 08 '21 at 15:15
  • "returned as missing" is not quite right. But after `summarize` of a string variable if you attempt to use `r(mean)` it is evaluated as numeric system missing any way. – Nick Cox Apr 08 '21 at 15:26