I am working with survey data with 250 columns. A sample of my data looks like this:

q1 <- factor(c("yes",NA,"no","yes",NA,"yes","no","yes"))
q2 <- factor(c("Albania","USA","Albania","Albania","UK",NA,"UK","Albania"))
q3 <- factor(c(0,1,NA,0,1,1,NA,0))
q4 <- factor(c(0,NA,NA,NA,1,NA,0,0))
q5 <- factor(c("Dont know","Prefer not to answer","Agree","Disagree",NA,"Agree","Agree",NA))
q6 <- factor(c(1,NA,3,5,800,NA,900,2))
sector <- factor(c("Energy","Water","Energy","Other","Other","Water","Transportation","Energy"))
weights <- factor(c(0.13,0.25,0.13,0.22,0.22,0.25,0.4,0.13))

data <- data.frame(q1,q2,q3,q4,q5,q6,sector,weights)

With help from Stack Overflow, I have created the following function to loop through the columns and create bar charts in which the x axis shows the percentage of responses, the y axis shows the underlying column, and the fill is the sector.

library(dplyr)
library(ggplot2)

plot_fun <- function(variable) {
  total <- sum(!is.na(data[[variable]]))
  
  data <- data |> 
    filter(!is.na(.data[[variable]])) |> 
    group_by(across(all_of(c("sector", variable)))) |> 
    summarise(n = n(), .groups = "drop_last") |> 
    mutate(pct = n / sum(n)) |> 
    ungroup()
  
  ggplot(
    data = data,
    mapping = aes(fill = sector, x = pct, y = .data[[variable]])
  ) +
    geom_col(position = "dodge") +
    labs(
      y = variable, x = "Percentage of responses", fill = "Sector legend",
      caption = paste("Total =", total)
    ) +
    geom_text(
      aes(
        label = scales::percent(pct, accuracy = 0.1)
      ),
      position = position_dodge(.9), vjust = 0.5
    ) +
    scale_x_continuous(labels=function(x) paste0(x*100))+
    scale_fill_brewer(palette = "Accent")+
    theme_bw() +
    theme(panel.grid.major.y = element_blank()) 
}

Now I want to apply survey weights so that the bar charts show weighted response percentages. I tried adding weight = data$weights to the mapping, but it didn't work. I also tried applying the weights when calculating the percentages with summarise(n = sum(weights)), but that didn't work either.

Is there a way to modify my code so that the weights are applied? Thank you in advance.

dryl
  • Mapping is for aesthetic mappings - mapping a column of data to something on the plot like color, x position, and so forth. `weight` isn't a plot aesthetic, so adding `weight =` inside `aes()` won't do anything. I'd suggest modifying the `dplyr` bit of your code. Perhaps include `weights` in your `summarise()` and then instead of `pct = n / sum(n)` use the weights there however you'd like. – Gregor Thomas Feb 02 '23 at 21:26
  • Note that when you `summarise` the only variables kept are the grouping variables and the variables created in `summarise`, so with your current code the `weights` column is dropped in the `summarise` step. Maybe add `sum_wts = sum(weights)` in your summarise. But I'm not sure exactly what calculation you want... – Gregor Thomas Feb 02 '23 at 21:28
  • Right now my plots show the percentage of responses; in other words, I calculate the frequency of each column category and divide it by the total frequency, by sector. Now I want to show weighted percentages of responses. I have tried your suggestion but couldn't make it work. Could you please elaborate a little? Thank you for your response, though. – dryl Feb 02 '23 at 22:17
  • You have two different weights for Water and Other. You are grouping sector and the selected variable. If we use q3 as the variable, for instance, you are left with 4 records for Energy 0, Other 0, Other 1, and Water 1. How do you know which of the two weights for Water to apply to Water 1? – stomper Feb 03 '23 at 02:47
  • I've adjusted weights, thanks for mentioning it @stomper – dryl Feb 05 '23 at 00:25

1 Answer

It's still not clear how you want to apply the weights. I've assumed here that you want to multiply the percentage by the weight. Note that you need to fix your data: `weights` should not be a factor if you want to use it as a numeric value in calculations. I've added `weights` to the `group_by()` so that the column carries through `summarise()`, and then used it in `mutate()` to create a weighted percentage.

    plot_fun <- function(variable) {
    total <- sum(!is.na(data[[variable]]))
    
    data <- data |> 
        filter(!is.na(.data[[variable]])) |> 
        group_by(across(all_of(c("sector", "weights", variable)))) |> 
        summarise(n = n(), .groups = "drop_last") |> 
        mutate(pct = n / sum(n), wpct  = pct*weights) |> 
        ungroup()
    
    ggplot(
        data = data,
        mapping = aes(fill = sector, x = wpct, y = .data[[variable]])
    ) +
        geom_col(position = "dodge") +
        labs(
            y = variable, x = "Percentage of responses", fill = "Sector legend",
            caption = paste("Total =", total)
        ) +
        geom_text(
            aes(
                label = scales::percent(wpct, accuracy = 0.1)
            ),
            position = position_dodge(.9), vjust = 0.5
        ) +
        scale_x_continuous(labels=function(x) paste0(x*100))+
        scale_fill_brewer(palette = "Accent")+
        theme_bw() +
        theme(panel.grid.major.y = element_blank()) 
    }
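
On the point that `weights` should not be a factor: a minimal sketch of one way to convert it, assuming the values were entered as in the question's sample data. Calling `as.numeric()` directly on a factor returns the internal level codes rather than the values, so the conversion goes through `as.character()` first. The names `codes` and `weights_num` are only illustrative.

```r
# Weights stored as a factor, as in the question's sample data
weights <- factor(c(0.13, 0.25, 0.13, 0.22, 0.22, 0.25, 0.4, 0.13))

# as.numeric() on a factor gives the integer level codes, not the values
codes <- as.numeric(weights)

# Going through character first recovers the original numbers
weights_num <- as.numeric(as.character(weights))
```

In the question's setup this would amount to something like `data$weights <- as.numeric(as.character(data$weights))` before calling `plot_fun()`, or simply building the column with `c(...)` instead of `factor(c(...))`.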

If this doesn't do the trick, please clarify how you want to use the weights and what the final values should be.
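
For comparison, here is a sketch of another possible reading of "weighted response percentages", in line with the comment suggesting `sum(weights)` inside `summarise()`: the share of total weight that each response holds within its sector. This is only an assumption about the intended calculation, not a confirmed requirement. The helper name `weighted_pct` and the columns `wt` and `wpct` are hypothetical, and `weights` is assumed to be numeric.

```r
library(dplyr)

# Part of the question's sample data, with weights kept numeric (not a factor)
data <- data.frame(
  q3 = factor(c(0, 1, NA, 0, 1, 1, NA, 0)),
  sector = factor(c("Energy", "Water", "Energy", "Other",
                    "Other", "Water", "Transportation", "Energy")),
  weights = c(0.13, 0.25, 0.13, 0.22, 0.22, 0.25, 0.4, 0.13)
)

# Hypothetical helper: weighted share of each response within a sector
weighted_pct <- function(data, variable) {
  data |>
    filter(!is.na(.data[[variable]])) |>
    group_by(across(all_of(c("sector", variable)))) |>
    # Sum the weights per sector/response instead of only counting rows
    summarise(n = n(), wt = sum(weights), .groups = "drop_last") |>
    # Still grouped by sector here, so sum(wt) is the sector's total weight
    mutate(wpct = wt / sum(wt)) |>
    ungroup()
}

res <- weighted_pct(data, "q3")
```

Mapping `x = wpct` in the `ggplot()` call would then plot these shares. Note that when every row in a sector carries the same weight, as in the adjusted sample data, this weighted share equals the unweighted one; it only differs when weights vary within a sector.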

stomper