I have a national survey with many variables, like this one (for the sake of simplicity I omitted some variables):

year  id  y.b   sex   income  married   pens   weight
2002  1   1950   F    100000     1       0      1.12
2002  2   1943   M    55000      1       1      0.55
2004  1   1950   F    88000      1       1      1.1
2004  2   1943   M    66000      1       1      0.6
2006  3   1966   M    12000      0       1      0.23
2008  3   1966   M    24000      0       1      0.23
2008  4   1972   F    33000      1       0      0.66
2010  4   1972   F    35000      1       0      0.67

Here id identifies the person interviewed, y.b is the year of birth, married is a dummy (1 = married, 0 = single), pens is a dummy equal to 1 if the person invests in a complementary pension scheme, and weight holds the survey weights.

The original survey has up to 40k observations from 2002 to 2014 (I filtered it to keep only individuals who appear more than once). I use this command to create a survey object:

d.s <- svydesign(ids=~1, data=df, weights=~weight)

Now that the data are weighted, I want to find, for example, the percentage of women, or the percentage of married people, who invest in a complementary pension. I searched the R help and the web for a command that returns this percentage, but I didn't find the right one.

Laura R.
  • So that percentage is `number of women that invest in complementary pension/total number of women`, right? The same for married people. What code do you have so far? – blacksite Oct 06 '16 at 14:12
  • Right @not_a_robot. I used `svytable(~woman+obs, d.s)`, where obs is the total number of observations (I created a variable obs holding a sequence of numbers from 1 to the last row); I also used `svymean(~woman, d.s)` and `svyratio(~donna, ~obs, d.s)`, but I didn't get what I needed. – Laura R. Oct 06 '16 at 14:37

2 Answers

# same setup
library(survey)

df <- data.frame(sex = c('F', 'M', 'F', 'M', 'M', 'M', 'F', 'F'),
                married = c(1,1,1,1,0,0,1,1),
                pens = c(0, 1, 1, 1, 1, 1, 0, 0),
                weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))

d.s <- svydesign(ids=~1, data=df, weights=~weight)

# subset to women only then calculate the share with a pension
svymean( ~ pens , subset( d.s , sex == 'F' ) )
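
The `svymean()` call above returns a proportion with its standard error. As a minimal sketch, assuming the same toy data, the lines below turn it into a percentage, add a confidence interval, and repeat the idea for married people and for both sexes at once. The object name `est_women` is only illustrative.

```r
library(survey)

# same toy data as in the answer
df <- data.frame(sex = c('F', 'M', 'F', 'M', 'M', 'M', 'F', 'F'),
                 married = c(1, 1, 1, 1, 0, 0, 1, 1),
                 pens = c(0, 1, 1, 1, 1, 1, 0, 0),
                 weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))
d.s <- svydesign(ids = ~1, data = df, weights = ~weight)

# weighted share of women who have a pension (illustrative name)
est_women <- svymean(~pens, subset(d.s, sex == 'F'))

# the same estimate as a percentage, with a 95% confidence interval
100 * coef(est_women)
100 * confint(est_women)

# share of married people who have a pension
svymean(~pens, subset(d.s, married == 1))

# one row per sex, without subsetting by hand
svyby(~pens, ~sex, d.s, svymean)
```

With only eight rows the intervals are not meaningful; the point is only which calls to chain.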
Anthony Damico
  • This is why I love SO, gem of an answer helping me years later – Vaibhav Singh Jul 13 '19 at 05:15
  • @Anthony, I have a doubt, though: if I wrap this in a function, it fails to work. Please check below: `table1 <- function(table_1,var1,var2) { female <- svymean( make.formula(var1) , subset( table_1 , var2 == "F" ) ) male <- svymean( make.formula(var1) , subset( table_1 ,var2 == "M" ) ) total <- svymean( make.formula(var1) , table_1) return(cbind(female,male,total)) } ; table1(d.s, "pens", "sex")` – Vaibhav Singh Jul 13 '19 at 05:30
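
On the function in the last comment: the likely cause is that `var2` holds the string "sex", so `var2 == "F"` compares that string with "F" instead of looking at the sex column. A minimal sketch of one way around it, assuming the same toy data, is to skip `subset()` and let `svyby()` do the grouping from formulas built with `make.formula()`. This swaps in a different technique from the one in the comment, and the returned vector layout is an illustrative choice.

```r
library(survey)

df <- data.frame(sex = c('F', 'M', 'F', 'M', 'M', 'M', 'F', 'F'),
                 married = c(1, 1, 1, 1, 0, 0, 1, 1),
                 pens = c(0, 1, 1, 1, 1, 1, 0, 0),
                 weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))
d.s <- svydesign(ids = ~1, data = df, weights = ~weight)

# var1: name of the variable to average, var2: name of the grouping variable
table1 <- function(design, var1, var2) {
  # one estimate per level of the grouping variable
  by_group <- svyby(make.formula(var1), make.formula(var2), design, svymean)
  # estimate over the whole design
  total <- svymean(make.formula(var1), design)
  # point estimates only; use SE() or confint() on the pieces for uncertainty
  c(coef(by_group), total = unname(coef(total)))
}

res <- table1(d.s, "pens", "sex")
res
```

Because the grouping variable is passed as a formula, the function works for any column name without non-standard evaluation.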

I don't exactly know what you want to do with weight, but here is a very simple solution for the proportion of women with a pension in dplyr:

library(survey)
library(dplyr)

df <- data.frame(sex = c('F', 'M', 'F', 'M', 'M', 'M', 'F', 'F'),
                married = c(1,1,1,1,0,0,1,1),
                pens = c(0, 1, 1, 1, 1, 1, 0, 0),
                weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))

d.s <- svydesign(ids=~1, data=df, weights=~weight)

# data frame of women with a pension
women_with_pension <- d.s$variables %>%
  filter(sex == 'F' & pens == 1)

# number of rows (i.e. number of women with a pension) in that df
n_women_with_pension <- nrow(women_with_pension)

# data frame of all women
all_women <- d.s$variables %>%
  filter(sex == 'F')

# number of rows (i.e. number of women) in that df
n_women <- nrow(all_women)

# divide the number of women with a pension by the total number of women
proportion_women_with_pension <- n_women_with_pension/n_women

That will give you the basic (unweighted) proportion of women with a pension. Apply the same logic to obtain the percentage of married people who have a pension.

As far as the weight variable goes, are you trying to do a weighted proportion of some sort? In that case, you would sum the weight values for women in each class (with pension and all women), like this:

# one-row data frame holding the summed weight of women with a pension
women_with_pension <- d.s$variables %>%
  filter(sex == 'F' & pens == 1) %>%
  summarise(total_weight = sum(weight))

# extract that summed weight as a number
women_with_pension_weight <- women_with_pension[[1]]

# one-row data frame holding the summed weight of all women
all_women <- d.s$variables %>%
  filter(sex == 'F') %>%
  summarise(total_weight = sum(weight))

# extract that summed weight as a number
all_women_weight <- all_women[[1]]

# divide the weight of women with a pension by the total weight of women
# 0.3098592 for this sample data
prop_weight_women_with_pension <- women_with_pension_weight/all_women_weight
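
The ratio of summed weights above is the weighted mean of the 0/1 pens indicator among women. As a small cross-check, assuming the same toy data, the sketch below computes it with base R's `weighted.mean()` and compares it with the point estimate from `svymean()` on the survey design, which additionally reports a standard error. The names `women`, `manual` and `design_based` are only illustrative.

```r
library(survey)

df <- data.frame(sex = c('F', 'M', 'F', 'M', 'M', 'M', 'F', 'F'),
                 married = c(1, 1, 1, 1, 0, 0, 1, 1),
                 pens = c(0, 1, 1, 1, 1, 1, 0, 0),
                 weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))

# manual route: weighted mean of the pension dummy among women
women <- df[df$sex == 'F', ]
manual <- weighted.mean(women$pens, women$weight)

# design-based route: same point estimate, plus a standard error
d.s <- svydesign(ids = ~1, data = df, weights = ~weight)
design_based <- svymean(~pens, subset(d.s, sex == 'F'))

# the two point estimates agree
all.equal(manual, unname(coef(design_based)))

# the standard error is only available from the design-based route
SE(design_based)
```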
blacksite
  • Thank you, your answer is the one I was looking for. I wanted to use weight in order to get a correct representation of the sample (since the survey is conducted on a sample, using survey weights should give a better representation of the whole population). – Laura R. Oct 06 '16 at 15:34
  • @LauraR. I am downvoting because this strategy of breaking into the survey object is absurd, and it doesn't allow users to calculate confidence intervals. See my answer. – Anthony Damico Oct 07 '16 at 07:02