My dataset contains multiple variables called avar_1 to bvar_10 that record the history of an individual. For some reason, the history is not always complete and there are some "gaps" (e.g. avar_1 and avar_4 are non-missing, but avar_2 and avar_3 are missing). For each individual, I want to store the first non-missing value in a new variable called var1, the second non-missing value in var2, etc., so that I have a history without missing values.

I've tried the following code:

local x=1
foreach wave in a b {
    forval i=1/10 {
        capture drop var`x' 
        generate var`x'=.
        capture replace var`x'=`wave'var`i' if !mi(`wave'`var'`i')
        if (!mi(var`x')) {
            local x=1+`x'
            }
    }
}

var1 is generated properly, but var2 contains only missing values and the following variables are not generated at all. However, I set trace on and saw that var2 is actually replaced for every variable from avar_1 to bvar_10.

My guess is that the local x is not updated correctly: its value changes for the whole dataset, but it should be different for each observation.

Is that the problem and if so, how can I avoid it?

Nick Cox
Matthuit
  • Some inconsistency in your question on whether underscores `_` are part of your variable names and where they appear. I've assumed in editing that the real dataset is consistent. Your code block appears to need underscores. – Nick Cox Dec 28 '20 at 17:04

1 Answer

A concise concrete data example is worth more than a long explanation. Your description seems consistent with an example like this:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 id float(avar_1 avar_2 avar_3 bvar_1 bvar_2)
"A" 1 . 6 8 10
"B" 2 4 . 9  .
"C" 3 5 7 . 11
end

* 4 is specific to this example. 
rename (bvar_*) (avar_#), renumber(4)

reshape long avar_, i(id) j(which)
(note: j = 1 2 3 4 5)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        3   ->      15
Number of variables                   6   ->       3
j variable (5 values)                     ->   which
xij variables:
               avar_1 avar_2 ... avar_5   ->   avar_
-----------------------------------------------------------------------------

drop if missing(avar_)
bysort id (which) : replace which = _n
list, sepby(id)

     +--------------------+
     | id   which   avar_ |
     |--------------------|
  1. |  A       1       1 |
  2. |  A       2       6 |
  3. |  A       3       8 |
  4. |  A       4      10 |
     |--------------------|
  5. |  B       1       2 |
  6. |  B       2       4 |
  7. |  B       3       9 |
     |--------------------|
  8. |  C       1       3 |
  9. |  C       2       5 |
 10. |  C       3       7 |
 11. |  C       4      11 |
     +--------------------+

Positive points:

Your data layout cries out for some structure, given by a rename and especially by a reshape long. I don't give code here for a reshape wide because, for the great majority of Stata purposes, you'd be better off with this long layout.
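If the wide layout asked for in the question (var1, var2, and so on) really is needed, one possible route from the long layout is a rename followed by reshape wide. This is only a minimal sketch: it repeats the example data so that it runs on its own, and the stub name var is taken from the question rather than from the code above.

```stata
clear
input str1 id float(avar_1 avar_2 avar_3 bvar_1 bvar_2)
"A" 1 . 6 8 10
"B" 2 4 . 9  .
"C" 3 5 7 . 11
end

* same steps as above: one stub, long layout, gaps removed
* (4 is specific to this example)
rename (bvar_*) (avar_#), renumber(4)
reshape long avar_, i(id) j(which)
drop if missing(avar_)
bysort id (which) : replace which = _n

* back to wide: var1 holds the first non-missing value, var2 the second, ...
rename avar_ var
reshape wide var, i(id) j(which)
list
```

Individuals with fewer non-missing values than others simply end up with missing values in the last var variables.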

Negative points:

!mi(var`x')

returns whether the first value of a variable is not missing. If foo were a variable in the dataset, !mi(foo) is evaluated as !mi(foo[1]). That is not what you want here. See https://www.stata.com/support/faqs/programming/if-command-versus-if-qualifier/ for the full story.
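The difference between the two uses of if can be seen in a small sketch. The variable names foo and ok are hypothetical and used only for illustration.

```stata
clear
input float foo
1
.
3
end

* if command: the condition is evaluated once, as !mi(foo[1])
if !mi(foo) {
    display "branch taken once, because foo[1] is not missing"
}

* if qualifier: the condition is evaluated separately for each observation
generate byte ok = 1 if !mi(foo)
list
```

The if command decides once whether a block of code runs at all, so it cannot make a local macro such as x take different values in different observations. The if qualifier is what selects observations within a single command.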

I'd recommend more evocative variable names.

Nick Cox