I have appended multiple files into a single Stata dataset. It now has 335 variables. Some variable names differ only in case, such as almirah and ALMIRAH, which store the same information from different datasets.

I am replacing these variables one by one, like this:

count if mi(almirah)
local first=r(N)

count if mi(ALMIRAH)
local sec=r(N)

if first<sec {
    replace almirah=ALMIRAH if mi(almirah)
}
else {

}

How do I program this for all variables that are in essence the same variable but whose names differ only in upper and lower case like this?

Ataullah
  • Please read [How to create high quality reproducible examples in Stata](https://meta.stackoverflow.com/questions/377015/). We cannot be of much help if we cannot reproduce the problem. –  Dec 04 '18 at 10:03

3 Answers

Suppose you have frog toad newt and FROG TOAD NEWT. Let's decide that the variable with lower case name is definitive. So, a loop with some or all of this may be helpful.

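foreach v in frog toad newt {
    // name of the upper-case counterpart, e.g. frog -> FROG
    local V = upper("`v'")
    // use the upper-case variable's value wherever the lower-case one is missing
    generate `v'2 = cond(missing(`v'), `V', `v')
    // print a blank line between iterations
    display
}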

I have created a new variable there because there may be other problems. If there are, overwriting your data may obscure what they are.

Note: In your code segment you need at least

 if `first' < `sec'

to make it legal, as references to first and sec will be interpreted as references to variables or scalars otherwise. But it's really not clear why the numbers of missing values are material at all. If I have 42 observations, then append 66 more, the result should be the same as the other way round.
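
For illustration, here is a minimal sketch of the question's snippet with the macro references corrected. The toy data are made up, and the comparison of counts is kept only to show the syntax, not because it is needed.

```stata
* Hypothetical toy data: almirah has one missing value, ALMIRAH has two
clear
input almirah ALMIRAH
1 .
5 .
. 3
end

count if missing(almirah)
local first = r(N)

count if missing(ALMIRAH)
local sec = r(N)

* `first' and `sec' are local macros, so they must be dereferenced with quotes
if `first' < `sec' {
    replace almirah = ALMIRAH if missing(almirah)
}
```

Without the quotes, Stata would look for variables or scalars named first and sec and stop with an error if it found none.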

Nick Cox
  • The variable names are not just in upper case. They occur in lower case, upper case, or any combination, for example schoolname SchoolName latitude Latitude almirah ALMIRAH. – Ataullah Dec 04 '18 at 09:58
  • So, you need more complicated code in which you pair off the names yourself. – Nick Cox Dec 04 '18 at 10:00
  • Thanks for the help though. The code above is a good start. Will further work on it. Appreciate the help – Ataullah Dec 04 '18 at 10:01
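
One way to pair off mixed-case names, sketched under assumptions that are not in the thread: the all-lower-case name is treated as definitive, variables whose names differ only in case share the same type (numeric or string), and the toy variables frog, FROG and Toad are hypothetical. It is a starting point rather than a definitive solution, and it overwrites data, so run it on a copy.

```stata
* Hypothetical toy data
clear
input frog FROG Toad
1 . 3
. 2 4
end

foreach v of varlist * {
    local low = lower("`v'")
    if "`v'" != "`low'" {
        * does the all-lower-case name already exist (no abbreviation allowed)?
        capture confirm variable `low', exact
        if _rc {
            * no counterpart yet: renaming is enough
            rename `v' `low'
        }
        else {
            * counterpart exists: fill its missing values, then drop the duplicate
            replace `low' = `v' if missing(`low')
            drop `v'
        }
    }
}
```

Because each variant is folded into the lower-case name in turn, several spellings of the same name (say FROG and Frog) end up in one variable. Where two variants both hold nonmissing values in the same observation, the value already in the lower-case variable wins.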

From your description, I guess a good choice for you is to convert all variable names to lower case before appending the data. If that guess is correct, the code below might give you a hint.

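// start from an empty dataset that will accumulate the appended files
clear
save output, emptyok replace

foreach file in file1 file2 file3 file4 {
    use `file', clear
    // convert every variable name to lower case before appending
    ren *, lower
    append using output
    save output, replace
}

Romalpa Akzo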

I had a similar situation; this is how I resolved it.

quietly ds
local dbvars = "`r(varlist)'"

foreach v in `dbvars' {
    local V = lower("`v'")

    // test if var exists
    capture confirm variable `V'_synced
    if !_rc {
        di in red "`V'_synced  exists"
        replace `V'_synced = cond(missing(`v'), `V', `v') if missing(`V'_synced)
    }
    else {
        di in red "`V'_synced  does not exist"
        gen `V'_synced = cond(missing(`v'), `V', `v')
    }
}

// keep combined variables 
keep *_synced 

24thDan
  • Your first three lines could just be `foreach v of var * {`. – Nick Cox Jul 18 '22 at 14:40
  • This code is dangerous insofar as e.g. `frog` is the result of `lower()` on many possible variable names, namely `FROG FROg FRog Frog fROG fROg fRog` and so on. So, it's looking for one-to-one correspondences when Stata's rules permit many-to-one correspondences, at least for variable names two or more characters long. – Nick Cox Jul 18 '22 at 14:52
  • thanks for the code suggestion, although how would the many-to-one correspondence be implemented? – 24thDan Jul 18 '22 at 19:18
  • It's a consequence of your code, not a goal usually desirable in itself. – Nick Cox Jul 19 '22 at 06:42
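
To see whether the many-to-one problem raised in the comments applies to a given dataset, a check along these lines may help. It is only a sketch: the toy variables are hypothetical, and the macro names seen and clash are made up for the example.

```stata
* Hypothetical toy data: three spellings of frog, one of toad
clear
set obs 1
generate frog = 1
generate FROG = 2
generate Frog = 3
generate toad = 4

local seen
local clash
foreach v of varlist * {
    local low = lower("`v'")
    * has this lower-case form already been produced by an earlier variable?
    if `: list low in seen' {
        local clash : list clash | low
    }
    local seen `seen' `low'
}

display "lower-case names shared by more than one variable: `clash'"
```

Any name listed in clash maps to two or more variables, so those are the cases that need pairing off by hand before any automatic merging.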