I'm looking for a simple way of simplifying code.

Example

The sqrt function can easily be applied to the subset of columns below.

library(magrittr)
mtcars[,-which(colnames(mtcars) %in% 
                 c("mpg", "cyl", "drat", "wt", "carb",
                   "hp", "qsec", "vs", "am", "gear"))] %<>%
  sqrt

Problem

I'm interested in applying other transformations to the subset without the need to type the whole subsetting sequence again.

For instance, the code:

mtcars[,-which(colnames(mtcars) %in% 
                 c("mpg", "cyl", "drat", "wt", "carb",
                   "hp", "qsec", "vs", "am", "gear"))] %<>%
  .data * 1000

returns the error:

Error in function_list[[k]](value) : could not find function ".data"

The same happens with syntax using `.`. My question is: syntax-wise, how can I get the same effect as with the sqrt function, but applying a longer expression to the passed subset?

Tensibai
Konrad
  • Replace `.data * 1000` with `\`*\`(1000)`, to call it as a function and not as an operator? – Tensibai Jul 12 '16 at 15:56
  • @Tensibai It's an interesting approach, but how can I efficiently pass a number of operations? Multiplication is only an example of potential usage. In practice I would like to do a couple of things with this data. – Konrad Jul 12 '16 at 16:25
  • 1
    Write a custom function which return compatible output ? (vector for vector, etc) or just an anonymous function as @Uwe showcased in his/her answer. I vote for the 1, so you can test the function alone too. I.e: `%<>% { . * 10 - 1000 }` for example – Tensibai Jul 12 '16 at 16:27
  • @Tensibai Thanks very much. I'm using both of your suggestions, and I will be happy to accept if you care to write an answer. – Konrad Jul 12 '16 at 20:57

1 Answer

What about this?

sel_cols <- setdiff(colnames(mtcars), 
                    c("mpg", "cyl", "drat", "wt", "carb",
                      "hp", "qsec", "vs", "am", "gear"))
mtcars[, sel_cols] %<>% {sqrt(.) %>% `*`(1000)}
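
The braces are what make this work. Because `%<>%` binds more tightly than `*`, the original `... %<>% .data * 1000` is most likely parsed as `(... %<>% .data) * 1000`, so magrittr looks for a function called `.data`. A minimal sketch of the two variants discussed in the comments, assuming a working copy named `my_cars` and a helper named `rescale` (both hypothetical names introduced here for illustration):

```r
library(magrittr)

# Work on copies so the built-in dataset is left untouched.
my_cars  <- mtcars
my_cars2 <- mtcars

# Columns to transform: everything except the excluded names.
sel_cols <- setdiff(colnames(mtcars),
                    c("mpg", "cyl", "drat", "wt", "carb",
                      "hp", "qsec", "vs", "am", "gear"))

# Hypothetical helper: several steps bundled into one testable function.
rescale <- function(x) sqrt(x) * 1000 - 10

# Variant 1: a named function on the right-hand side of the assignment pipe.
my_cars[, sel_cols] %<>% rescale

# Variant 2: a braced expression; `.` stands for the piped-in subset.
my_cars2[, sel_cols] %<>% { sqrt(.) * 1000 - 10 }
```

The named function can be tested on its own (for example `rescale(c(4, 9))`), while the braced form keeps everything inline.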

Or a data.table approach?

library(data.table)
sel_cols <- setdiff(colnames(mtcars), 
                    c("mpg", "cyl", "drat", "wt", "carb",
                      "hp", "qsec", "vs", "am", "gear"))

dt <- as.data.table(mtcars)
dt[, (sel_cols) := lapply(.SD, sqrt), .SDcols = sel_cols][]

And combined with the pipe:

dt <- as.data.table(mtcars)
dt[, (sel_cols) := lapply(.SD, function(x) {sqrt(x) %>% `*`(1000)}), .SDcols = sel_cols][]
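
The anonymous function can also be pulled out into a named one, which keeps the `:=` line short. This is only a sketch: `rescale` is a hypothetical helper introduced here, not part of data.table.

```r
library(data.table)

# Columns to transform: everything except the excluded names.
sel_cols <- setdiff(colnames(mtcars),
                    c("mpg", "cyl", "drat", "wt", "carb",
                      "hp", "qsec", "vs", "am", "gear"))

# Hypothetical helper bundling the steps applied to each selected column.
rescale <- function(x) sqrt(x) * 1000

# as.data.table() makes a copy, so the built-in mtcars is not modified.
dt <- as.data.table(mtcars)

# := updates the selected columns of dt in place; nothing is printed here.
dt[, (sel_cols) := lapply(.SD, rescale), .SDcols = sel_cols]

# Print explicitly to inspect the transformed columns.
print(head(dt[, ..sel_cols]))
```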
Uwe
  • Thanks for the input. I think the first suggestion is quite similar to what Tensibai proposed in his comment. As for the data.table approach, it's very interesting but slightly longish. – Konrad Jul 12 '16 at 16:30
  • The last `[]` is unnecessary in the data.table approaches, since the original dt is updated. The advantage is that data.table updates in place without a copy (on huge datasets it avoids creating a new vector to replace the original one). Using lapply is needed in case more than one column is returned; the same goes for magrittr, by the way. cc @konrad – Tensibai Jul 12 '16 at 16:42
  • @Uwe you should prefer `setDT(mtcars)` over `as.data.table`, as setDT converts to a data.table by reference without copying; it can also set a key at the same time through its `key` argument. – Tensibai Jul 12 '16 at 16:49
  • 1
    @Tensibai I did this purposefully because `setDT(mtcars)` results in an error message `Can not convert 'mtcars' to data.table by reference because binding is locked. It is very likely that 'mtcars' resides within a package (or an environment) that is locked to prevent modifying its variable bindings. Try copying the object to your current environment, ex: var <- copy(var) and then using setDT again.` Instead of `setDT(copy(mtcars))` I used `dt <- as.data.table(mtcars)` to avoid confusion. – Uwe Jul 12 '16 at 17:11
  • @Konrad Yes, you're right. In production code the last `[]` would be superfluous. However, I didn't want people who are less experienced with `data.table` to get confused when they run the code snippet and nothing gets printed. So, I used it as a shorthand for `print(dt)`. – Uwe Jul 12 '16 at 17:18
  • @Uwe the 'back pipe' (`%<>%`) of magrittr doesn't print either, as it modifies the source (just for information). – Tensibai Jul 12 '16 at 21:21