
I am taking my first baby steps with non-standard evaluation (NSE) in dplyr. Consider the following snippet: it takes a tibble, sorts it in descending order by the values in a column, and replaces all but the k largest values with "Other".

library(dplyr)

df <- cars %>% as_tibble()

k <- 3

df2 <- df %>%
  arrange(desc(dist)) %>%
  mutate(dist2 = factor(c(dist[1:k],
                          rep("Other", n() - k)),
                        levels = c(dist[1:k], "Other")))

What I would like is a function such that:

df2bis<-df %>% sort_keep(old_column, new_column, levels_to_keep)

produces the same result, where old_column is "dist" (the column I use to sort the data set), new_column is "dist2" (the column I generate), and levels_to_keep is k (the number of values I explicitly retain). I am getting lost in enquo, quo_name, etc.

Any suggestion is appreciated.

tmfmnk
larry77
  • Do you want to keep the `k` highest levels or any levels corresponding to the top `k` values in the vector? For example, for the vector `c(10, 10, 10, 10, 9, 8, 7, 6, 5)`, would you like to keep the levels `10`, `9` and `8` or only `10`? – Vlad C. Sep 10 '18 at 15:57
  • Have you checked out the [forcats](https://blog.rstudio.com/2016/08/31/forcats-0-1-0/) package? It's the tidyverse package that's for working with factors. – MrFlick Sep 10 '18 at 15:59
  • Hi! In c(10, 10, 10, 10, 9, 8, 7, 6, 5) I would like to keep 10, 9 and 8. In my data I have continuous numbers and repetitions do not occur, which is why I did not think about this. I really would like to translate that code into a dplyr function (to be able to reuse it). – larry77 Sep 10 '18 at 19:22
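
For the case with ties raised in the comments above, a minimal base R sketch could look like the following. The helper name `keep_top_values` is made up for illustration and is not part of dplyr or forcats; it keeps the `k` largest distinct values and lumps everything else into "Other", which is the behaviour described for `c(10, 10, 10, 10, 9, 8, 7, 6, 5)`.

```r
# Hypothetical helper (not from any package): keep the k largest
# distinct values of x as factor levels, everything else becomes "Other".
keep_top_values <- function(x, k) {
  # k largest distinct values, in decreasing order
  top <- head(sort(unique(x), decreasing = TRUE), k)
  factor(ifelse(x %in% top, as.character(x), "Other"),
         levels = c(as.character(top), "Other"))
}

x <- c(10, 10, 10, 10, 9, 8, 7, 6, 5)
f <- keep_top_values(x, 3)
levels(f)
```

Because it works on distinct values, this avoids the duplicated factor levels that `dist[1:k]` would produce when the top `k` entries contain repeats.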

2 Answers


You can do:

library(dplyr)

sort_keep <- function(df, old_column, new_column, levels_to_keep) {
  old_column <- enquo(old_column)                      # capture the bare column name as a quosure
  new_column <- as.character(substitute(new_column))   # turn the bare new name into a string
  df %>%
    arrange(desc(!!old_column)) %>%
    mutate(use = !!old_column,                         # temporary copy of the sort column
           !!new_column := factor(c(use[1:levels_to_keep],
                                    rep("Other", n() - levels_to_keep)),
                                  levels = c(use[1:levels_to_keep], "Other")),
           use = NULL)                                 # drop the temporary column again
}


df %>% sort_keep(dist, dist2, 3)
Onyambu
  • Thanks! I was on the right track, but I got lost at some point. I will study your reply and post again if I have questions. – larry77 Sep 11 '18 at 08:07
  • If I may, can I ask about the mutate statement in the function? You explicitly define use = !!old_column, and if I just replace use with !!old_column, the function fails. I also do not understand the use=NULL at the end. Many thanks! – larry77 Sep 11 '18 at 08:23
  • @larry77 I did not want to keep writing `(!!old_column)` each time, so I created a temporary column called `use` and used it throughout; `use=NULL` at the end deletes that column. To get rid of `use`, write `(!!old_column)`, not `!!old_column`: the parentheses ensure the unquoting happens before the subsetting, i.e. `factor(c((!!old_column)[1:levels_to_keep], ...), levels = c((!!old_column)[1:levels_to_keep], ...))` will work. – Onyambu Sep 11 '18 at 16:21
  • Thanks for the explanation. Now I also understand what was the problem with my original attempt. – larry77 Sep 12 '18 at 19:02
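
As a side note to the discussion above, here is a minimal sketch of the same function without the temporary `use` column. It assumes a version of rlang/dplyr that supports the embrace operator `{{ }}` (rlang 0.4.0 or later, which is newer than this thread); the name `sort_keep2` is made up for illustration.

```r
library(dplyr)

# Hypothetical variant of sort_keep(); assumes rlang >= 0.4.0 for `{{ }}`.
sort_keep2 <- function(df, old_column, new_column, levels_to_keep) {
  df %>%
    arrange(desc({{ old_column }})) %>%
    mutate({{ new_column }} := factor(
      # first k values of the sorted column, then "Other" for the rest
      c(head({{ old_column }}, levels_to_keep),
        rep("Other", n() - levels_to_keep)),
      levels = c(head({{ old_column }}, levels_to_keep), "Other")))
}

res <- as_tibble(cars) %>% sort_keep2(dist, dist2, 3)
levels(res$dist2)
```

Since `{{ old_column }}` is a self-contained expression, the precedence problem with `!!old_column[1:levels_to_keep]` does not come up here.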

Something like this?

old_column = "dist"
new_column = "dist2"
levels_to_keep = 3

command = "df2bis<-df %>% sort_keep(old_column, new_column, levels_to_keep)"
command = gsub('old_column', old_column, command)
command = gsub('new_column', new_column, command)
command = gsub('levels_to_keep', levels_to_keep, command)
eval(parse(text=command))
broken.eggshell
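
If the column names really are held in character variables, as in the answer above, one possible alternative to pasting and parsing a command string is the `.data` pronoun. The sketch below assumes a dplyr version that provides `.data` and `:=` (0.7.0 or later); `sort_keep_chr` is a hypothetical name, not an existing function.

```r
library(dplyr)

# Hypothetical string-based variant: old_column and new_column are
# character strings, levels_to_keep is a number.
sort_keep_chr <- function(df, old_column, new_column, levels_to_keep) {
  df %>%
    arrange(desc(.data[[old_column]])) %>%
    mutate(!!new_column := factor(
      # first k values of the sorted column, then "Other" for the rest
      c(head(.data[[old_column]], levels_to_keep),
        rep("Other", n() - levels_to_keep)),
      levels = c(head(.data[[old_column]], levels_to_keep), "Other")))
}

old_column <- "dist"
new_column <- "dist2"
levels_to_keep <- 3

df2bis <- as_tibble(cars) %>%
  sort_keep_chr(old_column, new_column, levels_to_keep)
```

This keeps everything as ordinary function arguments, so no code has to be built up as text and evaluated.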