I have a dataset in the following format stored in a large tibble in R:
Yr | Gender | Number | Disease | Age | Population |
---|---|---|---|---|---|
1901 | M | 100 | A | 1-5 | 1500 |
1901 | F | 105 | A | 1-5 | 1400 |
1901 | M | 23 | A | 6-10 | 1300 |
1901 | F | 87 | A | 6-10 | 1400 |
1901 | M | 102 | B | 1-5 | 1500 |
1901 | F | 103 | B | 1-5 | 1400 |
1901 | M | 42 | B | 6-10 | 1300 |
1901 | F | 67 | B | 6-10 | 1400 |
The population is linked to each age bracket and gender within each year. I need to group by year and gender and then calculate (number/population) for each sex for each year (disease type isn't important).
I'm using this code:
group_by(Yr,Gender) %>% summarise(rate=sum(Number/sum(unique(Population))))
However I know this is giving me incorrect results as when I summate the M and F figures produced, they come to more than they should do each year (by almost double).
Can anyone point me in the right direction here, I think it's because there are differing population figures for genders and ages but I can't figure out how to overcome this issue?
EDIT
Thanks for the replies - In the output I would like the rate to be calculated using the sum of numbers per gender but over the total population for the year rather than the grouped gender population i.e total number (per gender)/total population(for that year).