
I've seen a few articles on how to use if statements or conditionals with piping, but I'm not sure how to apply them to my situation. Along with a specific answer to my problem, I'm also hoping for a more general explanation of how to add an if statement to a pipe, so that I can handle most situations.

I tried to follow the answer below (from "use if() to use select() within a dplyr pipe chain"), but I don't understand why "." is supplied as an argument on the third line of the pipe, or when I should do so.

  library(dplyr)
  cond <- TRUE  # any single TRUE/FALSE value

  mtcars %>%
    group_by(cyl) %>%
    { if (cond) filter(., am == 1) else . } %>%
    summarise(m = mean(wt))
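As I understand magrittr's rules, `x %>% f(y)` becomes `f(x, y)`, but a braced right-hand side `x %>% { ... }` is evaluated as-is with `.` bound to `x`, and nothing is inserted automatically. Inside the braces, `.` is therefore the only handle on the data: `filter(., am == 1)` says where the data goes, and `else .` passes it through unchanged. Without the braces, magrittr would treat `if` like any other call and try to pass the data into it, which fails. A minimal sketch, assuming dplyr is installed; `summarise_wt()` is a hypothetical helper that just makes `cond` easy to flip:

```r
library(dplyr)  # dplyr re-exports magrittr's %>%

# Hypothetical helper: summarise mtcars, optionally keeping only manual cars.
summarise_wt <- function(cond) {
  mtcars %>%
    group_by(cyl) %>%
    # A braced right-hand side is evaluated as-is: the incoming data is NOT
    # inserted as a first argument, so `.` must be used to refer to it.
    # `if` returns either the filtered data or the untouched data.
    { if (cond) filter(., am == 1) else . } %>%
    summarise(m = mean(wt), n = n())
}

with_filter <- summarise_wt(TRUE)      # only rows with am == 1
without_filter <- summarise_wt(FALSE)  # all rows
```

This pattern only works when `cond` is a single TRUE/FALSE for the whole data frame; the same branch is taken for every row.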

Here's a sample of my data:

df_parse <- structure(
  list(
    value = c("HURESPLI\t2\tLINE NUMBER OF THE RESPONDENT\tCURRENT\t22 - 23",
              "FILLER\t2\t\t27 - 28",
              "HUBUSL1\t2\tENTER LINE NUMBER\t81 - 82",
              "GEDIV\t1\tDIVISION\t91 - 91",
              "GESTFIPS\t2\tFEDERAL INFORMATION\t93 - 94"),
    starts_with_position = c(TRUE, TRUE, TRUE, TRUE, TRUE),
    missing_vars = c("HUFINAL\t FINAL OUTCOME CODE\t 24 - 26",
                     "HETENURE\t ARE YOUR LIVING QUARTERS... (READ ANSWER CATEGORIES)\t 29 - 30",
                     "FOR HUBUS = 1 VALID ENTRIES  83 - 84",
                     "  92 - 92",
                     "  95 - 95")
  ),
  row.names = c(NA, 5L),
  class = "data.frame"
)

I'm trying to separate out the missing_vars column using extract (tidyr) and gsub as shown below:

library(dplyr)
library(tidyr)

df_parse <- df_parse %>%
  mutate(dup_value2 = missing_vars) %>%
  # position2: the trailing "NN - NN" range
  extract(col = dup_value2, into = "position2", regex = "(\\d+\\s*-\\s*\\d+)$") %>%
  # id2: everything before the first tab
  mutate(id2 = gsub(pattern = "\\t.*", replacement = "", x = missing_vars)) %>%
  # desc2: drop an "ID<tab>N<tab>" prefix, then (working on desc2) the trailing range
  mutate(desc2 = gsub(".*\\t\\d+\\t", replacement = "", x = missing_vars)) %>%
  mutate(desc2 = gsub("(\\d+\\s*-\\s*\\d+)$", replacement = "", x = desc2))

This works, but I want to add a condition at the start of this pipe so that these steps are applied only where df_parse$starts_with_position == TRUE.

Something like this (I know it doesn't work):

df_parse %>% if(starts_with_position==TRUE){
  mutate(dup_value2 = missing_vars) %>% 
    extract(col = dup_value2, into = "position2", regex = "(\\d+\\s*-\\s*\\d+)$") %>%
    mutate(id2 = gsub(pattern = "\\t.*", replacement = "", x = missing_vars)) %>%
    mutate(desc2 = gsub(".*\\\t\\d+\\\t", replacement = "", x = missing_vars)) %>%  
    mutate(desc2 = gsub("(\\d+\\s*-\\s*\\d+)$", replacement = "", x = missing_vars))
}else ""
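The attempt above fails because `if` tests one TRUE/FALSE value, while `starts_with_position` holds one value per row. One sketch of a per-row version, assuming dplyr and tidyr are installed: wrap the parsing steps in a hypothetical function `parse_missing()`, run it only on the TRUE rows, and bind the FALSE rows back unchanged. The data frame `df` is a hypothetical variant of `df_parse` with one FALSE row, since the posted sample has none, and only two of the parsing steps are kept for brevity.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample: like df_parse, but with one FALSE row.
df <- data.frame(
  starts_with_position = c(TRUE, FALSE, TRUE),
  missing_vars = c("HUFINAL\t FINAL OUTCOME CODE\t 24 - 26",
                   "FOR HUBUS = 1 VALID ENTRIES  83 - 84",
                   "  92 - 92"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the parsing steps, applied to whatever rows it is given.
parse_missing <- function(d) {
  d %>%
    mutate(dup_value2 = missing_vars) %>%
    extract(col = dup_value2, into = "position2",
            regex = "(\\d+\\s*-\\s*\\d+)$") %>%
    mutate(id2 = gsub("\\t.*", "", missing_vars))
}

result <- df %>%
  mutate(.row = row_number()) %>%  # remember the original row order
  {
    # Inside the braces `.` is the data coming down the pipe.
    bind_rows(
      parse_missing(filter(., starts_with_position)),  # parse TRUE rows only
      filter(., !starts_with_position)                 # pass FALSE rows through
    )
  } %>%
  arrange(.row) %>%
  select(-.row)
```

`bind_rows()` fills the new columns with `NA` for the rows that were passed through, and the temporary `.row` column restores the original order.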
  
Mark
richardgasquet
  • It appears that you want conditional behaviour for each row based on the value of one of the columns. You can't use `if` here, which is a single logical test: you need a logical test for each row. In this case you can use `if_else` inside your mutate statements. Look at `?if_else` for help on using this. – Allan Cameron Oct 15 '21 at 15:20
  • How about if I want to apply the `extract` function conditionally instead of `mutate`? – richardgasquet Oct 15 '21 at 15:30
  • What would you want to be in the `position2` column in the rows where `starts_with_position` is `FALSE`? Remember, you can always `mutate` the `position2` column afterwards using an `if_else` based on `starts_with_position` – Allan Cameron Oct 15 '21 at 15:37
  • Your sample data has too little variability to drive your point home: there are no `FALSE` values. It would help if you provide variable data and the expected output given that input data. With that, it might be that your request for an `if` conditional-processing can be replaced by something more efficient and/or canonical. – r2evans Oct 15 '21 at 15:40
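Following the `if_else` suggestion in the comments, a sketch of the row-wise approach that also covers `extract`: run `extract()` on every row, then use `if_else()` to keep the parsed value only where `starts_with_position` is TRUE. It assumes dplyr and tidyr are installed; `df` is the same hypothetical variant of `df_parse` with one FALSE row, and `NA` is just a placeholder for whatever the FALSE rows should contain.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample: like df_parse, but with one FALSE row.
df <- data.frame(
  starts_with_position = c(TRUE, FALSE, TRUE),
  missing_vars = c("HUFINAL\t FINAL OUTCOME CODE\t 24 - 26",
                   "FOR HUBUS = 1 VALID ENTRIES  83 - 84",
                   "  92 - 92"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(dup_value2 = missing_vars) %>%
  # extract() parses every row, including the FALSE ones ...
  extract(col = dup_value2, into = "position2",
          regex = "(\\d+\\s*-\\s*\\d+)$") %>%
  # ... then if_else() tests the flag once per row and keeps the parsed
  # value only where it is TRUE. Both branches must have the same type,
  # hence NA_character_ rather than plain NA.
  mutate(
    position2 = if_else(starts_with_position, position2, NA_character_),
    id2 = if_else(starts_with_position,
                  gsub("\\t.*", "", missing_vars), NA_character_)
  )
```

Unlike the split-and-bind approach, this keeps the rows in place and needs no helper, at the cost of parsing rows whose result is then thrown away.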
