
Is it possible to evaluate a code block consisting of multiple lines of code with microbenchmark? If so, how?

Example: We have some numeric data in character columns:

testdata <- tibble::tibble(col1 = runif(1000), col2 = as.character(runif(1000)), col3 = as.character(runif(1000)))

Now we can try different ways of converting these. We can directly call as.numeric on the columns:

testdata$col2 <- as.numeric(testdata$col2)
testdata$col3 <- as.numeric(testdata$col3)

We could try doing it inside a dplyr mutate:

testdata <- dplyr::mutate(testdata, col2 = as.numeric(col2),
                          col3 = as.numeric(col3))

Or, if we know that all character columns should be numeric, we can try something less explicit that selects the columns by checking their type:

testdata <- dplyr::mutate_if(testdata, .predicate = is.character, .funs = as.numeric)

Now we want to compare the performance of these 3 options.

The latter two options are single calls, so they can easily be tested in microbenchmark, but the first option consists of two separate calls. We could wrap the two calls in a function and benchmark that, but this adds the (slight) overhead of a function call, so it isn't technically timing the code we actually have. We could also benchmark the two calls separately and add the results up afterwards. That works fine for the mean, but for statistics such as the min or the max the sum doesn't necessarily give sensible results, because the fastest (or slowest) run of one call need not coincide with that of the other.
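The two workarounds described above can be sketched as follows. This is a minimal sketch, assuming the microbenchmark, tibble and dplyr packages are installed; `convert_separately` is a hypothetical helper name used only for illustration, and the timings will differ between machines.

```r
library(microbenchmark)

testdata <- tibble::tibble(col1 = runif(1000),
                           col2 = as.character(runif(1000)),
                           col3 = as.character(runif(1000)))

# Workaround 1: wrap the two calls in a (hypothetical) helper function.
# It converts a copy and returns it; the caller's testdata is left unchanged.
convert_separately <- function(df) {
  df$col2 <- as.numeric(df$col2)
  df$col3 <- as.numeric(df$col3)
  df
}

# Workaround 2: time each call on its own and combine the statistics later.
bench_parts <- microbenchmark(
  wrapped = convert_separately(testdata),
  col2    = as.numeric(testdata$col2),
  col3    = as.numeric(testdata$col3),
  times   = 100L
)

s <- summary(bench_parts)
# Adding the two means estimates the mean time of running both calls.
# Adding the two minima would NOT be meaningful in the same way: the fastest
# col2 run and the fastest col3 run may come from different iterations.
combined_mean <- s$mean[s$expr == "col2"] + s$mean[s$expr == "col3"]
combined_mean  # in the unit that summary() picked for the whole table
```

Comparing `combined_mean` with the mean of `wrapped` gives a rough idea of how much the function-call overhead actually matters here.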

The examples in the docs for microbenchmark mostly use simple individual expressions and often use a simple function to wrap code.

Is it possible to directly input multiple lines of code into microbenchmark to be evaluated together?

Marijn Stevering

1 Answer


By wrapping multiple lines of code in {} (separated by ; when they are written on one line, or simply by newlines), they can be evaluated as one block in microbenchmark:

library(microbenchmark)

bench <- microbenchmark(separate = {as.numeric(testdata$col2); as.numeric(testdata$col3)},
                        mutate = dplyr::mutate(testdata, col2 = as.numeric(col2),
                                               col3 = as.numeric(col3)),
                        mutateif = dplyr::mutate_if(testdata, .predicate = is.character, .funs = as.numeric))
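The same idea works with the statements on separate lines inside the braces, in which case no `;` is needed. microbenchmark also accepts a `list` argument of unevaluated expressions, so the blocks can be built with `alist()` and passed in one go. A minimal sketch, assuming microbenchmark, tibble and dplyr are installed. The copy `d` is an assumption added here so that the question's assignments (`testdata$col2 <- ...`) are timed as well without converting `testdata` itself; otherwise, after the first iteration there would be no character columns left for `mutate_if` to convert.

```r
library(microbenchmark)

testdata <- tibble::tibble(col1 = runif(1000),
                           col2 = as.character(runif(1000)),
                           col3 = as.character(runif(1000)))

# alist() keeps each entry unevaluated; the names become the 'expr' labels.
exprs <- alist(
  # Multi-line block: newlines separate the statements, so no ';' is needed.
  # Working on a copy keeps testdata's columns as character for every run.
  separate = {
    d <- testdata
    d$col2 <- as.numeric(d$col2)
    d$col3 <- as.numeric(d$col3)
  },
  mutate = dplyr::mutate(testdata, col2 = as.numeric(col2),
                         col3 = as.numeric(col3)),
  mutateif = dplyr::mutate_if(testdata, .predicate = is.character,
                              .funs = as.numeric)
)

bench2 <- microbenchmark(list = exprs, times = 100L)
print(bench2)
```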

Which gives the following results:

> bench
Unit: microseconds
     expr      min       lq      mean    median        uq        max neval
 separate  477.014  529.708  594.8982  576.4275  611.6275   1109.762   100
   mutate 3410.351 3633.070 4465.0583 3876.6975 4446.0845  34298.910   100
 mutateif 5118.725 5365.126 7241.5727 5637.5520 6290.7795 118874.982   100
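The absolute numbers depend on the machine and package versions, so it can help to compare medians relative to the fastest expression; the medians are also much less affected than the means by the few very slow runs visible in the max column. From the table above, the `mutate` median is roughly 6.7 times and the `mutate_if` median roughly 9.8 times that of the `separate` block. A minimal sketch that recomputes such ratios from a fresh run, assuming microbenchmark, tibble and dplyr are installed:

```r
library(microbenchmark)

testdata <- tibble::tibble(col1 = runif(1000),
                           col2 = as.character(runif(1000)),
                           col3 = as.character(runif(1000)))

bench <- microbenchmark(
  separate = {as.numeric(testdata$col2); as.numeric(testdata$col3)},
  mutate   = dplyr::mutate(testdata, col2 = as.numeric(col2),
                           col3 = as.numeric(col3)),
  mutateif = dplyr::mutate_if(testdata, .predicate = is.character,
                              .funs = as.numeric)
)

# Summarise in microseconds, then divide each median by the fastest median.
s <- summary(bench, unit = "us")
ratios <- data.frame(expr = s$expr,
                     median_us = s$median,
                     vs_fastest = s$median / min(s$median))
ratios
```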
Marijn Stevering
The `{}` approach was the first thing I tried, but it didn't work in my actual use case. I then spent some time searching for examples of multi-line code blocks in microbenchmark and couldn't find any. When creating the example for this question, I wanted to include this solution as an example of what I had tried, and it ended up working. Since I hadn't found any existing examples online, I figured it would still be worth posting. – Marijn Stevering Dec 27 '17 at 13:34