R survey weights standard-error

Question

I have some problems calculating SE in my survey. Here is an exemple of what I want to do and I have tried to use the survey package in R. (fpc in the example below equals number of observations in each strata)

Code to generate data:

id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) 
strata = c(6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8, 8) 
weight = c(60, 75, 85, 140, 170, 175, 270, 310, 325, 785, 1450, 3920) 
fpc = c(8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6)
answer = c("2", "2", "3", "1", "2", NA, NA, 2, "3", NA, "1", NA)
df = data.frame(id, strata, weight, fpc, answer)
df <- df[complete.cases(df), ]

I then try to calculate the mean and SE using the survey package:

dstrat<-svydesign(id=~1,strata=~strata, weights=~weight, data=df, fpc=~fpc)
svymean(~answer, dstrat)

        mean    SE
answer1 0.60803 0.2573
answer2 0.23518 0.1755
answer3 0.15679 0.1479

My first question is: How can i take in to account the weights of the observations that did not answer in my study? In my example above I remove my NA observations before running the function but I would like to include this information. I assume that the SE will be bigger or smaller depending on if I have answers for the observations with the largest weights or not?

My second question is: How can I calculate a SE for a "net value"? Assume:

answer1 = good  
answer2 = neutral  
answer3 = bad

I can calculate the "net value" as answer1 - answer3 = 0.60803 - 0.15679 = 0.45124. How can I get the SE for this "net value"?

I'm voting to close this question as off-topic because it belongs to [stats.SE]. — double-beep, May 21 '19 at 12:27

score 4 · Accepted Answer · answered Jan 09 '16 at 19:08

4

your first question belongs on stats.stackexchange --but i think the answer is you cannot compute the SE when the data are missing. but here's how to solve for the SE of your second question:

library(survey)
id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) 
strata <- c(6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8, 8) 
weight <- c(60, 75, 85, 140, 170, 175, 270, 310, 325, 785, 1450, 3920) 
fpc <- c(8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6)
answer <- c("2", "2", "3", "1", "2", NA, NA, 2, "3", NA, "1", NA)
df <- data.frame(id=id, strata=strata, weight=weight, fpc=fpc, answer=answer)


# this is probably a mistake
df <- df[complete.cases(df), ]
# in most data sets, you should be using na.rm=TRUE later
# and not tossing out statements before the `svydesign` gets run

df$ones <- as.numeric( df$answer %in% 1 )

df$threes <- as.numeric( df$answer %in% 3 )

dstrat<-svydesign(id=~1,strata=~strata, weights=~weight, data=df, fpc=~fpc)

a <- svymean( ~ ones + threes , dstrat , na.rm = TRUE )

svycontrast(a, list(avg=c(0,0), diff=c(1,-1)))

answered Jan 09 '16 at 19:08

Anthony Damico

5,779
7
46
77

1

Thank you for an excellent response. Your method for calculating the SE for my "net-value" with the svycontrast function works perfectly. For your comment on my NA-removal you are totaly right. I first tried to put my na.rm = TRUE in the svydesign function which did not work out. As you pointed out my na.rm = TRUE should be in the svymean function. For my question about taking weights of non-respons in to account when calculating SE I think it should be possible. I do not know how and maybe I am gonna take your advice and post this specific question on stats.stackexchange. – CaptHaddock Jan 11 '16 at 10:11
Is there a reason why it is a mistake to remove them prior to svydesign? Do you have a citation or explanation for this? What is the difference? – Kevin Carriere Feb 27 '19 at 16:24
@KevinCarriere https://gist.github.com/ajdamico/9b3232a1d986b3460baaa90f5fed3402 – Anthony Damico Feb 28 '19 at 17:12

R survey weights standard-error

Code to generate data:

1 Answers1