How can I programmatically parse the names of functions, arguments, and their return values?

I am interested in generating workplan dataframes for automating R data analysis workflows with the drake package. One can generate such workplan dataframes with the workplan function.

I have an R script with functions that I would like to use. For example:

funA <- function(x){
  y <- x + 2
  y
}

funB <- function(y){
  z <- y^2
  z
}

And I would like to programmatically generate a data frame like the one below. How can I parse function names, arguments, and return values, and create a data.frame like this, either with drake::workplan or with another function?

  target command
1     y  funA(5)    
2     z  funB(3)      

One would do this by hand like this:

my_plan <- drake::workplan(y = funA(5), z = funB(3))

And then run the workflow with:

drake::make(my_plan)

Thank you.

ropolo
  • You want to parse a raw script text file without executing it? Or are these objects you are loading into an active R session? – MrFlick Nov 08 '17 at 20:27
  • @MrFlick: I want to parse objects that I am loading into an active R session, by doing something like `source("functions.R")`. – ropolo Nov 08 '17 at 20:30
  • You might look into `parse("functions.R")` if you really want to do some source code analysis of a specific file. I still think what you are trying to do is a bit wonky, but I don't know enough about drake to give you any real hints :/ – Stefan F Nov 08 '17 at 20:39

2 Answers

You can get the arguments with formals()

funA <- function(x, b = "default"){
  y <- x + 2
  y
}    
formals(funA)

You can also extract the body and environment of a function with body() and environment().
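As a minimal sketch of what these three accessors return, using the same funA as above (the variable name fargs is only illustrative):

```r
funA <- function(x, b = "default") {
  y <- x + 2
  y
}

fargs <- formals(funA)    # pairlist with one entry per argument
arg_names <- names(fargs) # argument names, in declaration order
default_b <- fargs$b      # default value of b; x has no default

fbody <- body(funA)       # the braced body, as an unevaluated call to `{`
fenv <- environment(funA) # the environment in which funA was defined
```

Here arg_names holds the two argument names and default_b holds the string "default". The body is a language object, so it can be inspected element by element rather than as text.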

There is no way to get the name of a function. A function doesn't really have a name; rather, a name refers to a function (how would you even know what to refer to if you didn't know the name?).
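To illustrate the point, here is a small sketch in which two names are bound to the same function. What you can do is go the other way and list the names in an environment that currently refer to functions. The environment fenv and the names alias and not_a_function are made up for this example.

```r
# A scratch environment standing in for wherever the functions were loaded
fenv <- new.env()
assign("funA", function(x) x + 2, envir = fenv)
assign("alias", get("funA", envir = fenv), envir = fenv) # second name, same function
assign("not_a_function", 42, envir = fenv)

# Both names refer to the same function object
same_function <- identical(fenv$alias, fenv$funA)

# Names in fenv that are bound to functions (ls() returns them sorted)
fun_names <- Filter(
  function(nm) is.function(get(nm, envir = fenv)),
  ls(fenv)
)
```

Since alias and funA are equally valid names for the one function, asking the function itself for "its" name has no single answer.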

There is also no way to get the return value. In your example, you could get the z and y by parsing the body() of the function manually, but this is a really bad idea and would only work if you write your function source in a specific way. Even if you did this, it makes no sense. z and y get destroyed when the function returns.

Maybe you could elaborate on why exactly you need the return value and function name. I am sure there is another way to achieve what you are trying to do.

Stefan F
  • You can use `lsf.str()` to list all variables that point to functions. – MrFlick Nov 08 '17 at 20:32
  • Thanks for the `lsf.str()` tip, @MrFlick. – ropolo Nov 08 '17 at 22:03
  • @Stefan F: Why is parsing the body of the function manually a bad idea? – ropolo Nov 09 '17 at 20:38
  • I meant your strategy to get the name of the value of the function from the name of the returned variable inside the function sounds like a really bad idea. I tried to explain it in my answer, I don't know how to state it better. Also note that the source you get via `body()` can differ from the original source when the function was written (formatting mainly). I think your best bet is getting acquainted with `parse()` as hrbrmstr suggested. – Stefan F Nov 10 '17 at 04:35
  • Thank you for your answer, @StefanF – ropolo Nov 10 '17 at 16:49

Assuming you have a source file like this:

funA <- function(x, y){
  y <- x + 2
  y

}

funB <- function(y){
  z <- y^2
  z

}

named test.r, you can do something like this:

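library(purrr)
library(dplyr)

# Evaluate the function definitions in a scratch environment, not the global one
fenv <- new.env()
parse("test.r") %>% 
  keep(is.language)  %>%                          # keep only language objects
  keep(~grepl(", function", toString(.x))) %>%    # keep `name <- function(...)` assignments
  map(eval, envir=fenv) %>%                       # define each function in fenv
  map_df(~{
    params <- list(names(formals(.x)))            # argument names as a list column
    bdy <- deparse(body(.x))
    bdy <- bdy[length(bdy)-1]                     # last line before the closing brace
    tibble(target = trimws(bdy), params = params) # tibble() replaces the deprecated data_frame()
  }) %>% 
  mutate(fname = ls(fenv))                        # ls() sorts alphabetically; assumes that matches file order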

which produces:

## # A tibble: 2 x 3
##   target    params fname
##    <chr>    <list> <chr>
## 1      y <chr [2]>  funA
## 2      z <chr [1]>  funB

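That's fragile, but not too fragile, since it keeps only language objects that look like function definitions before the eval and the assignment into a temporary environment. Note that ls(fenv) returns names in alphabetical order, so the fname column lines up with the rows only when the functions appear in that order in the file.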

I'm assuming you can extract the parameter names from the params column and then generate what you need.
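As a rough sketch of that last step, the following base R code builds a data frame with the target and command columns shown in the question. Several things here are assumptions rather than part of the original code: the inputs list supplying one argument value per function, the helper last_symbol(), and the inline text standing in for the source file. It only handles bodies of the form `{ ...; name }`, and whether drake accepts a hand-built data frame like this is not verified here.

```r
# Stand-in for the functions file; in practice something like
# eval(parse("test.r"), envir = fenv) would load the real definitions
fenv <- new.env()
eval(parse(text = "
funA <- function(x){
  y <- x + 2
  y
}

funB <- function(y){
  z <- y^2
  z
}
"), envir = fenv)

# Hypothetical input values, one per function (not derivable from the source)
inputs <- list(funA = 5, funB = 3)

# Hypothetical helper: name of the variable on the last line of a braced body,
# or NA if the body does not end in a bare symbol
last_symbol <- function(f) {
  b <- body(f)
  if (is.call(b) && identical(b[[1]], as.name("{")) && length(b) > 1) {
    last <- b[[length(b)]]
    if (is.name(last)) return(as.character(last))
  }
  NA_character_
}

fun_names <- names(inputs)
plan <- data.frame(
  target = vapply(fun_names,
                  function(nm) last_symbol(get(nm, envir = fenv)),
                  character(1), USE.NAMES = FALSE),
  command = vapply(fun_names,
                   function(nm) deparse(as.call(list(as.name(nm), inputs[[nm]]))),
                   character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
```

Taking the function names from inputs, rather than from ls(fenv), keeps each name paired with its own function regardless of the order in the file. Working on the body as a language object also avoids depending on how deparse() formats the source.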

hrbrmstr