-1

I have a dataset that looks like the following:

  year      Gender        VP
                
  2023      Total           3                  
  2023       F              2            
  2023       M              1            
  2022      Total           6                 
  2022       F              4                  
  2022       M              2               
  2021      Total           2                
  ...

What I want to do is generate a variable VP_per that gives me the female percentage of VPs for every year.

Basically something that looks like

bysort year gender : gen VP_per = VP if gender==1 / VP if gender==2

where 1 is the numeric value for Female and 2 is the numeric value for Total.

greybeard
  • 2,249
  • 8
  • 30
  • 66
Shubhi
  • 1
  • 1

1 Answers1

0

Your data layout -- in which subtotals are held in separate observations -- is not especially suitable for most Stata purposes, but it can be accommodated.

Here is some technique, which can be modified for numeric variables with value labels.

clear 
input int Year str5 Gender byte VP
2023 "Total" 3
2023 "F"     2
2023 "M"     1
2022 "Total" 6
2022 "F"     4
2022 "M"     2
end

* this hinges on sorting F M Total 
bysort Year (Gender) : gen wanted = 100 * VP[1] / VP[3] 

* need to modify for numeric variables 
egen num = total(cond(Gender == "F", VP, .)), by(Year)
egen den = total(cond(Gender == "Total", VP, .)), by(Year) 
gen wanted2 = 100 * num/den 

If Gender were numeric with 1 for Female and 2 for Total

egen num = total(cond(Gender == 1, VP, .)), by(Year)
egen den = total(cond(Gender == 2, VP, .)), by(Year) 
Nick Cox
  • 35,529
  • 6
  • 31
  • 47