I am trying to build a function that calculates percentages for certain variables, but I am struggling to refer to an argument as a character string, which I need in order to use it inside a filter() call. I have the dataset below.

e1_done <- structure(list(
  koen_new = c("Kvinde", "Kvinde", "Mand", "Kvinde", 
               "Mand", "Mand", "Kvinde", "Kvinde", "Mand", "Mand", "Kvinde", 
               "Kvinde", "Kvinde", "Mand", "Mand", "Mand", "Kvinde", "Kvinde", 
               "Mand", "Kvinde", "Mand", "Mand", "Kvinde", "Kvinde", "Mand", 
               "Mand", "Kvinde", "Mand", "Kvinde", "Kvinde", "Mand", "Kvinde", 
               "Kvinde", "Mand", "Mand", "Kvinde", "Kvinde", "Mand", "Mand", 
               "Mand", "Mand", "Mand", "Mand", "Mand", "Mand", "Kvinde", "Mand", 
               "Kvinde", "Kvinde", "Kvinde"), 
  frvlg_1 = structure(c(0, 0, 0, 
                        0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 
                        0, 0, 0, 0, 0))), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"))

    # A tibble: 50 × 2
       koen_new frvlg_1
       <chr>      <dbl>
     1 Kvinde         0
     2 Kvinde         0
     3 Mand           0
     4 Kvinde         0
     5 Mand           0
     6 Mand           0
     7 Kvinde         1
     8 Kvinde         0
     9 Mand           0
    10 Mand           0
    # … with 40 more rows

I have built the following function:

per.gender <- function(x) {
  e1_done %>% 
    group_by(koen_new) %>% 
    mutate(total_n_gender = n()) %>% 
    group_by(koen_new,{{x}}) %>% 
    mutate(n_frvl = n()) %>% 
    dplyr::select(n_frvl, total_n_gender) %>% 
    mutate(procentandel = n_frvl/total_n_gender) %>% 
    distinct(koen_new, {{x}}, procentandel,.keep_all = TRUE) %>% 
    filter({{x}} == 1) %>% 
    ungroup() %>% 
    select(koen_new, procentandel) 
}

Which produces what I want:

per.gender(frvlg_1) 

# A tibble: 2 × 2
  koen_new procentandel
  <chr>           <dbl>
1 Kvinde         0.0417
2 Mand           0.115 

However, I also want to rename the column procentandel to a label specific to each variable the function is run on. The label should be looked up in a codebook, which is stored in another tibble, shown below:

codebook_e1 <- structure(list(Label = c("Frvlg: Kultur (Fx Museer, Lokalhistoriske Arkiver, Sangkor, Teater)", 
"Frvlg: Idræt (Fx Sportsklubber, Danseforeninger, Svømmehaller)", 
"Frvlg: Fritid i Øvrigt (Fx Hobbyforeninger, Slægtsforskning, Spejder)"
), Variable = c("frvlg_1", "frvlg_2", "frvlg_3")), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"))


# A tibble: 3 × 2
  Label                                                                 Variable
  <chr>                                                                 <chr>   
1 Frvlg: Kultur (Fx Museer, Lokalhistoriske Arkiver, Sangkor, Teater)   frvlg_1 
2 Frvlg: Idræt (Fx Sportsklubber, Danseforeninger, Svømmehaller)        frvlg_2 
3 Frvlg: Fritid i Øvrigt (Fx Hobbyforeninger, Slægtsforskning, Spejder) frvlg_3 

I can look up the label with the following code; the result is the character value I want to rename the column procentandel to:

codebook_e1 %>% filter(Variable == "frvlg_1") %>% select(Label) %>% pull()
[1] "Frvlg: Kultur (Fx Museer, Lokalhistoriske Arkiver, Sangkor, Teater)"

However, I don't know how to refer to x as a character value in the filter verb inside a function in order to refer to the codebook. I have tried various eval functions and such - however, it doesn't seem to work for me in any way.

It works if I add a second argument that is x in quotation marks. However, I want the function to take only one argument.

I hope this question is clear enough!

zephryl
T. C. Nobel

1 Answer

Use rlang::ensym() to capture x as a symbol, which you can then convert using as.character():

library(tidyverse)

per.gender <- function(x) {
  new_name <- codebook_e1 %>% 
    filter(Variable == as.character(ensym(x))) %>% 
    select(Label) %>% 
    pull()

  e1_done %>% 
    group_by(koen_new) %>% 
    mutate(total_n_gender = n()) %>% 
    group_by(koen_new,{{x}}) %>% 
    mutate(n_frvl = n()) %>% 
    select(n_frvl, total_n_gender) %>% 
    mutate(procentandel = n_frvl/total_n_gender) %>% 
    distinct(koen_new, {{x}}, procentandel,.keep_all = TRUE) %>% 
    filter({{x}} == 1) %>% 
    ungroup() %>% 
    select(koen_new, !!new_name := procentandel) 
}

per.gender(frvlg_1) 

Result:

# A tibble: 2 x 2
  koen_new `Frvlg: Kultur (Fx Museer, Lokalhistoriske Arkiver, Sangkor, Teater)`
  <chr>                                                                    <dbl>
1 Kvinde                                                                  0.0417
2 Mand                                                                    0.115 

Also note the use of the !! and := operators to inject the value stored in new_name as the column name in the final select() statement. Without them, the column would simply be named "new_name".
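The difference between a literal name and an injected one can be seen in isolation. This is a minimal sketch, assuming a two-row tibble and a label that are both hypothetical (neither comes from the question), and using rename() instead of select() so that only the naming behaviour is on display:

```r
library(dplyr)

# Hypothetical stand-in for the function's intermediate result
df <- tibble(koen_new = c("Kvinde", "Mand"), procentandel = c(0.25, 0.5))

# Hypothetical label, standing in for the codebook lookup
new_name <- "Share of volunteers"

# Plain `=`: the left-hand side is taken literally as the column name
literal <- rename(df, new_name = procentandel)

# `!!` with `:=`: the string stored in new_name becomes the column name
injected <- rename(df, !!new_name := procentandel)

names(literal)
names(injected)
```

The `:=` operator is needed because R's parser does not accept `!!new_name` on the left-hand side of a plain `=`.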

zephryl
  • I think you're missing a `!!`, it should be: `!!as.character(ensym(x))`. But the recommended practice is `.data[[x]]`, this way there is no need to inject with `!!`, and no need to create a symbol. See https://rlang.r-lib.org/reference/dot-data.html and https://rlang.r-lib.org/reference/topic-data-mask-programming.html#names-patterns – Lionel Henry Mar 22 '22 at 17:07
  • Ah now I see the input is not a string but a bare column name, never mind regarding `.data` then. But you still need `!!` in the `filter()` statement, because currently you're comparing all values to a column name, rather than the column itself (this appears to work because of silent coercion rules in base R, but produces wrong results). – Lionel Henry Mar 22 '22 at 17:11
  • @LionelHenry "you're comparing all values to a column name" - I believe this is the behavior OP wants. In their lookup table `codebook_e1`, the `Variable` column consists of column names from the primary `e1_done` table. If for example the function is called with `x = frvlg_1`, I believe the goal is to use the values in `e1_done$frvlg_1` in some places -- e.g., `group_by(koen_new,{{x}})` -- but to use the column name `"frvlg_1"` itself in other places -- e.g., to filter `codebook_e1` to cases where `Variable == "frvlg_1"`. Does this make sense? – zephryl Mar 22 '22 at 21:22
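One way to sidestep the ambiguity discussed in these comments is to convert the bare column name to a string once, before any data-masked verb runs, and to use that plain string for the codebook lookup. The following is only a sketch under stated assumptions: `e1_small`, `codebook_small`, the English labels, and `share_by_gender()` are hypothetical miniatures, not the asker's objects, and the percentage is computed with `summarise()` and `mean()` as a swapped-in simplification of the original mutate/distinct/filter chain.

```r
library(dplyr)

# Hypothetical miniature versions of the question's tables
e1_small <- tibble(
  koen_new = c("Kvinde", "Kvinde", "Mand", "Mand"),
  frvlg_1  = c(1, 0, 1, 1)
)
codebook_small <- tibble(
  Label    = c("Culture", "Sport"),
  Variable = c("frvlg_1", "frvlg_2")
)

share_by_gender <- function(x) {
  # Capture the bare column name as a string, outside any data mask
  var_name <- rlang::as_string(rlang::ensym(x))

  # Ordinary base R lookup: compares the Variable column to the string
  new_name <- codebook_small$Label[codebook_small$Variable == var_name]

  e1_small %>%
    group_by(koen_new) %>%
    # {{ x }} refers to the column's values; !!new_name := sets the name
    summarise(!!new_name := mean({{ x }} == 1), .groups = "drop")
}

res <- share_by_gender(frvlg_1)
res
```

Unlike the original pipeline, this version keeps a gender with no rows equal to 1 and reports a share of 0 for it, and it assumes the variable has exactly one matching row in the codebook.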