0

In Stata I want to calculate the sums of some columns and divide the results by the sum of another column.

My data is as follows:

Spi pop1 pop2 pop3 pop4 total pop
Spi1 15 10 0 5 30
Spi2 15 15 13 7 50
Spi3 16 17 0 0 33
Spi4 0 15 13 6 34

The results I want are the percentages of each population:

  • percentage of pop1
  • percentage of pop2
  • percentage of pop3
  • percentage of pop4

The solution I tried is the command

total(pop1 pop2 pop3 pop4)
Nick Cox
  • 35,529
  • 6
  • 31
  • 47
Saidbf
  • 11
  • 3

2 Answers2

1

It's a bit unclear what format of the data set you want in the end. For example, if you want the percentages merged back to the original data or not. But this example does the different things I think you might want to do.

* Example generated by -dataex-. For more info, type help dataex
clear
input str4 spi byte(pop1 pop2 pop3 pop4 totalpop)
"Spi1" 15 10  0 5 30
"Spi2" 15 15 13 7 50
"Spi3" 16 17  0 0 33
"Spi4"  0 15 13 6 34
end

local popvars pop1 pop2 pop3 pop4

preserve
    
    * Calculate the sums
    collapse (sum) `popvars' totalpop
    
    * Generate the percentages
    foreach popvar of local popvars {
        gen `popvar'_percent = `popvar' / totalpop
    }
    
    * Keep only percentages var
    keep *_percent
    
    * Generate a var in using data to merge m:1 on
    gen mergevar = 1
    
    tempfile pop_percentages
    save    `pop_percentages' 
    
restore

* Generate a var in master data to merge m:1 on, the drop that var after merge
gen mergevar = 1
merge m:1 mergevar using `pop_percentages' , nogen
drop mergevar


Nick Cox
  • 35,529
  • 6
  • 31
  • 47
TheIceBear
  • 2,912
  • 9
  • 23
0

I agree with @TheIceBear. Tje question is ambiguous. Here is another interpretation.

clear
input str4 spi byte(pop1 pop2 pop3 pop4 totalpop)
"Spi1" 15 10  0 5 30
"Spi2" 15 15 13 7 50
"Spi3" 16 17  0 0 33
"Spi4"  0 15 13 6 34
end

forval j = 1/4 { 
    egen pc_pop`j' = pc(pop`j')
}

format pc* %2.1f 

list pc* 

     +---------------------------------------+
     | pc_pop1   pc_pop2   pc_pop3   pc_pop4 |
     |---------------------------------------|
  1. |    32.6      17.5       0.0      27.8 |
  2. |    32.6      26.3      50.0      38.9 |
  3. |    34.8      29.8       0.0       0.0 |
  4. |     0.0      26.3      50.0      33.3 |
     +---------------------------------------+

And here's a simpler way to get what @TheIceBear's code gives you, except that I show percents, not proportions.

su totalpop, meanonly

scalar grand_total = r(sum)

forval j = 1/4 {
    su pop`j', meanonly
    gen pop_pc`j'= 100  *  r(sum) / grand_total
}

format pop_pc? %2.1f

list pop_pc?

     +---------------------------------------+
     | pop_pc1   pop_pc2   pop_pc3   pop_pc4 |
     |---------------------------------------|
  1. |    31.3      38.8      17.7      12.2 |
  2. |    31.3      38.8      17.7      12.2 |
  3. |    31.3      38.8      17.7      12.2 |
  4. |    31.3      38.8      17.7      12.2 |
     +---------------------------------------+
Nick Cox
  • 35,529
  • 6
  • 31
  • 47