I need to weight the observations in a sample based on the marginal distributions of four demographic characteristics from a broader population. I'm currently using the package anesrake to do so.

The population info is stored in targets, a list of 4 elements: one named numeric vector for each respondent attribute I want to weight my sample on. The names of each vector are the categories of that attribute. I create targets here:

quota_age    <- c(0.30, 0.33, 0.37)
quota_race   <- c(0.62, 0.12, 0.17, 0.5, 0.3)
quota_gender <- c(0.52, 0.48)
quota_ed     <- c(0.41, 0.29, 0.19, 0.11)

names(quota_age)    <- c("18 to 34", "35 to 54", "55+")
names(quota_race)   <- c("White non-Hispanic", "Black non-Hispanic", "Hispanic", "Asian", "Other")
names(quota_gender) <- c("Female", "Male")
names(quota_ed)     <- c("HS or less", "Some college", "Bachelors", "Advanced")

targets <- list(quota_age, quota_race, quota_gender, quota_ed)

The survey file (m1b) is a data frame containing demographic info and a unique ID for each respondent (link to Google Sheet here). Here are the first few observations:

> head(m1b)
         ResponseId     quota_ed quota_age quota_gender         quota_race
1 R_3McITJbfcFuwc9x Some college  18 to 34       Female White non-Hispanic
2 R_2q3oeAbZgCZ5YcZ    Bachelors       55+       Female White non-Hispanic
3 R_YSVccSQ1xJ6zuDv     Advanced  35 to 54       Female White non-Hispanic
4 R_DubbKu7uJicbpQd Some college  35 to 54         Male White non-Hispanic
5 R_5zj5CNu598lCwRX    Bachelors       55+         Male              Other
6 R_21mPGFS7kHX2ELm     Advanced       55+       Female White non-Hispanic

Using the anesrake package, I want to construct a new variable called weight that I can use to account for differences between the population and sample marginal distributions in later analyses.

But when I call the anesrake function like so (the pctlim argument is extremely small to exaggerate my point):

library(anesrake)

raking <- anesrake(inputter     = targets,
                   dataframe    = m1b,
                   caseid       = m1b$ResponseId,
                   choosemethod = "total",
                   type         = "pctlim",
                   pctlim       = 0.0000001)

I get the following error:

    Error in selecthighestpcts(discrep1, inputter, pctlim) : 
      No variables are off by more than 0.00001 percent using the method you have chosen, either weighting is unnecessary or a smaller pre-raking limit should be chosen.

This is clearly not the case. Consider the quota_ed target, for example:

> targets[[4]]
  HS or less Some college    Bachelors     Advanced 
        0.41         0.29         0.19         0.11 
> wpct(m1b$quota_ed)
    Advanced    Bachelors   HS or less Some college 
   0.1614583    0.3645833    0.1666667    0.3072917

Any thoughts on what I'm doing wrong would be greatly appreciated. See this link to an RBloggers post for the routine I'm trying to emulate.

J.Q
  • What if you try removing the choosemethod, type, and pctlim arguments from the call and using the defaults? I also use anesrake regularly and remember hitting this error at some point. – deschen Jan 25 '21 at 19:41
  • @deschen I revised my call so it looks like this: `raking <- anesrake(inputter = targets, dataframe = m1b, caseid = m1b$ResponseId)`. The error persists, except that it now says "No variables are off by more than 5 percent", since the default value of the `pctlim` argument is `0.05`. – J.Q Jan 25 '21 at 19:46
  • I checked my own weighting script, so here are a few more suggestions: 1. Try converting the weighting variables in your data to factors (they look like character variables), and make sure to drop empty levels from your data. 2. You currently only have names for the elements within each weighting vector, e.g. "18 to 34". You should also name the "outer" elements of your list, i.e. `names(targets) <- c("quota_age" ...)` – deschen Jan 25 '21 at 20:24
  • @deschen - I'll give both of these a shot! To make sure I understand #2, you think I should name the elements of the list `targets` so they're the same name as the variables I'm mapping them to when I call `anesrake()` right? – J.Q Jan 26 '21 at 01:16
  • @deschen - this worked! If you'd like to reply with your answer (convert the four `m1b` demo vars to factors with ordinal levels, and rename the objects within the list `targets`) I'll gladly accept the answer. – J.Q Jan 26 '21 at 01:49

1 Answer

For the anesrake function to work, the following steps might be necessary:

  1. Convert your weighting variables to factors. Make sure that they don't contain empty levels.
  2. Exclude empty levels from your targets as well. For example, suppose nobody aged 55+ is in your data. Then you should drop that level from both a) the quota_age target vector and b) the corresponding variable in your m1b data.
  3. The top-level elements of your list also need to be named after the columns that are to be weighted, i.e. after your commands add: names(targets) <- c("quota_age", "quota_race", "quota_gender", "quota_ed").
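
The three steps above could be sketched roughly as follows, in base R only. This is a minimal illustration, not a tested recipe: align_for_raking is a hypothetical helper name (it is not part of anesrake), the toy data frame just reuses a few rows and two columns from head(m1b) in the question, and the anesrake call is left commented out so the block runs without the package installed.

```r
# Hypothetical helper (NOT part of anesrake): makes the data and targets agree.
align_for_raking <- function(df, targets) {
  # Step 3: the list itself must be named after the columns to be weighted.
  stopifnot(!is.null(names(targets)), all(names(targets) %in% names(df)))
  for (v in names(targets)) {
    values <- as.character(df[[v]])
    extra  <- setdiff(unique(na.omit(values)), names(targets[[v]]))
    if (length(extra) > 0) {
      stop("Values in '", v, "' without a target: ", paste(extra, collapse = ", "))
    }
    # Step 1: convert to a factor whose levels follow the target order,
    # then drop levels that never occur in the data.
    f <- droplevels(factor(values, levels = names(targets[[v]])))
    df[[v]] <- f
    # Step 2: drop the same empty categories from the target vector.
    # The remaining proportions are left as-is here (not rescaled).
    targets[[v]] <- targets[[v]][levels(f)]
  }
  list(data = df, targets = targets)
}

# Toy data (assumption): six rows and two of the four columns from head(m1b).
toy <- data.frame(
  ResponseId   = paste0("R_", 1:6),
  quota_age    = c("18 to 34", "55+", "35 to 54", "35 to 54", "55+", "55+"),
  quota_gender = c("Female", "Female", "Female", "Male", "Male", "Female"),
  stringsAsFactors = FALSE
)

# Named list: the outer names match the column names in the data.
targets <- list(
  quota_age    = c("18 to 34" = 0.30, "35 to 54" = 0.33, "55+" = 0.37),
  quota_gender = c("Female" = 0.52, "Male" = 0.48)
)

aligned <- align_for_raking(toy, targets)
str(aligned$data)

# With anesrake installed, the call from the comments would then become:
# raking <- anesrake(inputter  = aligned$targets,
#                    dataframe = aligned$data,
#                    caseid    = aligned$data$ResponseId)
```

Building the factor levels from the target names keeps the categories in the same order on both sides, which makes a mismatch easy to spot before raking.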
deschen