
I would like to convert an R Markdown notebook that contains both R and Python chunks into an R script for execution on a backend server. We use a Python pipeline to prepare the data, and R code continues the analysis. The R Markdown notebook comes from someone else and might be updated in the future, so it would be nice if we could convert it to an R script automatically. We don't necessarily need the notebook output; we are more interested in the data processing done in the R chunks. An R script is also a little easier to debug.

Input notebook analysis.Rmd

---
title: "The Ultimate Question"
---

```{r setup}
library(reticulate)
```
    
```{python}
import pandas
df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})
```
    
```{r}
str(py$df)
prod(py$df$x)
```

I tried converting it to .R with

knitr::purl("analysis.Rmd")

But the resulting analysis.R file simply comments out the Python lines:

## ----setup--------------------------------------------------------------------
library(reticulate)
    
## import pandas
## df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})
    
## -----------------------------------------------------------------------------
str(py$df)
prod(py$df$x)

Expected result

## ----setup--------------------------------------------------------------------
library(reticulate)
    
py_run_string("import pandas")
py_run_string("df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})")
    
## -----------------------------------------------------------------------------
str(py$df)
prod(py$df$x)
Paul Rougieux
  • I copied this question in [knitr issue 2193](https://github.com/yihui/knitr/issues/2193). The author Yihui Xie thinks it is a "reasonable feature request", but a few considerations need to be discussed. – Paul Rougieux Nov 14 '22 at 11:13

1 Answer


My answer is adapted from this one. The idea is to overwrite process_tangle.block(), which knitr uses to extract the content of code chunks. I removed the if condition at the beginning of the original answer and added one that wraps each line in py_run_string() if the code chunk is in Python.

It's probably possible to make this more robust, but it works. (Note that you need to restart R each time before you run assignInNamespace() again.)

library(knitr)

# New processing functions
process_tangle <- function (x) { 
  UseMethod("process_tangle", x)
}

process_tangle.block <- function (x) {
  params = opts_chunk$merge(x$params)
  
  if (isFALSE(params$purl)) 
    return("")
  label = params$label
  ev = params$eval
  code = if (!isFALSE(ev) && !is.null(params$child)) {
    # sc_split() and one_string() are internal to knitr, hence the `:::`
    cmds = lapply(knitr:::sc_split(params$child), knit_child)
    knitr:::one_string(unlist(cmds))
  }
  else knit_code$get(label)
  if (!isFALSE(ev) && length(code) && any(grepl("read_chunk\\(.+\\)", 
                                                code))) {
    # parse_only() is also internal to knitr
    eval(knitr:::parse_only(unlist(stringr::str_extract_all(code, 
                                                    "read_chunk\\(([^)]+)\\)"))))
  }
  code = knitr:::parse_chunk(code)
  if (isFALSE(ev)) 
    code = knitr:::comment_out(code, params$comment, newline = FALSE)
  
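  # New lines <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  if (params$engine == "python") {
    # encodeString() escapes backslashes and double quotes in each Python
    # line, so the generated R string literal stays valid
    code <- paste0("py_run_string(", encodeString(code, quote = "\""), ")")
  }
  # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>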

  # Output only the code, no documentation
  return(knitr:::one_string(code))
}



# Reassign functions
assignInNamespace("process_tangle.block",
                  process_tangle.block,
                  ns="knitr")

# Purl
purl("analysis.Rmd", output="analysis.R")

Output:

library(reticulate)

py_run_string("import pandas")
py_run_string("df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})")

str(py$df)
prod(py$df$x)
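
The per-line wrapping above passes each line of a Python chunk to its own py_run_string() call, so an indented block such as a for loop or a function definition would probably fail. Below is a minimal sketch of one possible refinement, assuming the whole chunk can be sent to Python as a single string. wrap_python_chunk() is a hypothetical helper, not part of knitr or reticulate.

```r
# Hypothetical helper (not part of knitr or reticulate): wrap a whole
# Python chunk in a single py_run_string() call.
wrap_python_chunk <- function(code) {
  # Join the chunk's lines so that indented blocks reach Python intact
  src <- paste(code, collapse = "\n")
  # encodeString() turns the text into an R string literal, escaping
  # double quotes, backslashes, and newlines
  paste0("py_run_string(", encodeString(src, quote = "\""), ")")
}

# Example: a two-line Python loop that contains double quotes
chunk <- c("for i in range(3):", "    print(\"i =\", i)")
cat(wrap_python_chunk(chunk), "\n")
```

Inside process_tangle.block(), the paste0() line in the Python branch could then be replaced with code <- wrap_python_chunk(code). The purled script would contain one py_run_string() call per chunk instead of one per line.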
bretauv