In the spirit of purrr, broom, and modelr, I am trying to create a "meta" data.frame in which each row denotes the dataset (d) and the model parameters (yvar, xvars, FEvars). For instance:

iris2 <- iris %>% mutate(Sepal.Length=Sepal.Length^2)
meta <- data.frame(n=1:4,
           yvar = c('Sepal.Length','Sepal.Length','Sepal.Length','Sepal.Length'),
           xvars= I(list(c('Sepal.Width'),
                         c('Sepal.Width','Petal.Length'),
                         c('Sepal.Width'),
                         c('Sepal.Width','Petal.Length'))),
           data= I(list(iris,iris,iris2,iris2)) )

Now I would like to run a model for each row of "meta" and then add a list column "model" holding the model output object. To run the model I use an auxiliary function that takes a dataset, a y variable and a vector of x variables:

OLS_help <- function(d,y,xvars){
  paste(y, paste(xvars, collapse=" + "), sep=" ~ ") %>% as.formula %>% 
    lm(d)
}
y <- 'Sepal.Length'
xvars <- c('Sepal.Width','Petal.Length')
OLS_help(iris,y,xvars)

How can I execute OLS_help for every row of meta and add its output as a list column in meta? I tried the following code, but it did not work:

meta %>% mutate(model = map2(d,yvar,xvars,OLS_help) )
Error: Can't convert a `AsIs` object to function
Call `rlang::last_error()` to see a backtrace

OBS: The solution for the case where only the "data" (nested) list column varies (covered in Hadley's book here) is:

by_country <- gapminder %>% group_by(country, continent) %>% nest()
country_model <- function(df) {  lm(lifeExp ~ year, data = df) }
by_country <- by_country %>% mutate(model = map(data, country_model)) 
Maurits Evers
LucasMation
  • Why work within the `tidyverse` but use a `data.table`? I recommend sticking to one ecosystem. Furthermore, I'm unclear about what you're trying to do. You mention a model but I don't see any code pertaining to fitting any form of model to your data. Is `meta` your input data? If so, can you provide your expected output based on your sample data. That will help us understand what you're trying to do. – Maurits Evers Nov 07 '19 at 21:13
  • Your sample code is not reproducible; it produces an error "[...] object 'iv' not found". It sounds to me you're after `map2` or `pmap`. Happy to show an example if you update your input data. – Maurits Evers Nov 07 '19 at 21:45
  • I edited the question, trying to add a reproducible example as best I could – LucasMation Nov 07 '19 at 23:14

1 Answer

We can use pmap in the following way:

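library(dplyr)
library(purrr)
df <- meta %>%
    as_tibble() %>%
    # reformulate() needs character input, so convert any factor columns
    mutate_if(is.factor, as.character) %>%
    # iterate over yvar, xvars and data in parallel, fitting one lm per row
    mutate(fit = pmap(
        list(yvar, xvars, data),
        function(y, x, df) lm(reformulate(x, response = y), data = df)))
df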
## A tibble: 4 x 5
#      n yvar         xvars     data               fit
#  <int> <chr>        <I<list>> <I<list>>          <list>
#1     1 Sepal.Length <chr [1]> <df[,5] [150 × 5]> <lm>
#2     2 Sepal.Length <chr [2]> <df[,5] [150 × 5]> <lm>
#3     3 Sepal.Length <chr [1]> <df[,5] [150 × 5]> <lm>
#4     4 Sepal.Length <chr [2]> <df[,5] [150 × 5]> <lm>

Explanation: pmap iterates over multiple arguments simultaneously (similar to base R's Map); here we simultaneously loop through the entries in columns yvar, xvars and data, then use reformulate to construct the formula to be used within lm. We store the lm fit object in column fit.
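
To make the two building blocks concrete, here is a minimal base R sketch of the same idea, using Map in place of pmap. The vectors `yvar`, `xvars` and `dats` are small stand-ins made up for illustration, not the `meta` object from the question.

```r
# reformulate() builds a formula from character vectors
f <- reformulate(c("Sepal.Width", "Petal.Length"), response = "Sepal.Length")
# f is now the formula Sepal.Length ~ Sepal.Width + Petal.Length

# Illustrative inputs (assumed for this sketch): one entry per model
yvar  <- c("Sepal.Length", "Sepal.Length")
xvars <- list("Sepal.Width", c("Sepal.Width", "Petal.Length"))
dats  <- list(iris, iris)

# Map() walks the three inputs in parallel, like pmap() over a list of three
fits <- Map(
  function(y, x, df) lm(reformulate(x, response = y), data = df),
  yvar, xvars, dats
)

# Each element of fits is an lm object; inspect the second model's coefficients
coef(fits[[2]])
```

The function passed to Map receives one element from each input per iteration, which is exactly what pmap does with `list(yvar, xvars, data)` inside mutate.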

Maurits Evers
  • nice, tks! Could you clarify the role of `mutate_if(is.factor, as.character)` in your solution? – LucasMation Nov 07 '19 at 23:49
  • @LucasMation `reformulate` requires input objects to be `character` vectors/scalars; since in your `meta` data `yvar` is a `factor` we need to convert the column to a `character` vector. That's what the `mutate_if` call does (it actually converts *all* `factor` columns to `character` columns). A canonical `tidyverse` approach uses `character` vectors whenever possible (to be precise, in the `tidyverse` implicit `character` to `factor` conversions are avoided, since many problems stem from such conversions that happen e.g. in base R's `data.frame`, `read.table` etc.). – Maurits Evers Nov 07 '19 at 23:55
  • I still get errors if I try to predefine the function before with `OLSa <- function(y, x, df){ lm(reformulate(x, response = y), data = df) }` and then run `... mutate(fit = pmap(list(yvar, xvars, data),OLSa(y,x,df)))` – LucasMation Nov 08 '19 at 00:20
  • @LucasMation I cannot reproduce; defining the function outside doesn't make a difference. The following still works: `f <- function(y, x, df) lm(reformulate(x, response = y), data = df); meta %>% as_tibble() %>% mutate_if(is.factor, as.character) %>% mutate(fit = pmap(list(yvar, xvars, data), f))`. So the error must lie elsewhere with your code and/or data. – Maurits Evers Nov 08 '19 at 00:33
  • Ah I see the issue. You're using `pmap` incorrectly. It should be `... mutate(fit = pmap(list(yvar, xvars, data), OLSa))`. – Maurits Evers Nov 08 '19 at 00:35
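
A short sketch of the difference discussed in the comments above, assuming purrr is installed. `args` is a made-up input list for illustration; `OLSa` is the function from the comment. Passing `OLSa` hands pmap the function itself, whereas `OLSa(y, x, df)` is an immediate call that R tries to evaluate before pmap can iterate.

```r
library(purrr)

OLSa <- function(y, x, df) lm(reformulate(x, response = y), data = df)

# Illustrative inputs (assumed): two models on the built-in iris data
args <- list(
  c("Sepal.Length", "Petal.Length"),
  list("Sepal.Width", c("Sepal.Width", "Petal.Width")),
  list(iris, iris)
)

# Correct: pass the function object; pmap supplies y, x, df by position
fits <- pmap(args, OLSa)

# Incorrect: OLSa(y, x, df) is a call, not a function. In a fresh session
# it fails because no object named x exists to build the formula from.
res <- try(pmap(args, OLSa(y, x, df)), silent = TRUE)
inherits(res, "try-error")
```

Because `args` is an unnamed list, its elements are matched to the arguments of `OLSa` by position, so the order in the list has to follow the order of `y`, `x` and `df`.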