I have a data frame like this

   df <- data.frame(tiny = rep(letters[1:3], 20), 
                  block = rnorm(60), tray = runif(60, min=0.4, max=2),
                  indent = sample(0.5:2.0, 60, replace = TRUE))

I nested this data frame

nm <- df %>%
  group_by(tiny) %>%
  nest()

then wrote these functions

library(dplyr)
library(purrr)
library(tidyr)

model <- function(dfr, x, y){
             lm(y~x, data = dfr)
         }

model1 <- function(dfr){
           lm(block~tray, data = dfr)
          }

I want to run this model for all tiny classes, so I did

 nm%>%
   mutate(
     mod = data %>% map(model1)
   )

The above code works fine, but if I want to supply the variables as arguments, as in the `model` function, I get errors. This is what I do:

 nm%>%
    mutate(mod = data %>% map(model(x=tray, y=block)))

I keep getting this error: `Error in mode(x = tray, y = block) : unused argument (y = block)`.

Also I tried plotting these using ggplot2

plot <- function(dfr, i){
  dfr %>%
    ggplot(., aes(x=tray, y=block))+
    geom_point()+
    xlab("Soil Properties")+ylab("Slope Coefficient")+
    ggtitle(nm$tiny[i])
}

nm%>%
 mutate(put = data %>% map(plot))

The idea is that I want ggplot to give the plots the titles a, b, and c, one for each plot produced. Any help would be greatly appreciated. Thanks

Kay

2 Answers


Use the base function `split` to split the data into a list of groups, one per level of `tiny`.

library( purrr )
library( ggplot2 )
df %>% 
  split( .$tiny) %>%
  map(~ lm( block ~ tray, data = .))

df %>% 
  split( .$tiny) %>%
  map(~ ggplot( data = ., aes( x = tray, y = block ) ) +
        geom_point( ) +
        xlab("Soil Properties") + 
        ylab("Slope Coefficient") +
        ggtitle( as.character( unique(.$tiny) ) ) )
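Because `split` returns a list named by group, the names themselves can serve as plot titles. The following is only a sketch of that idea using `purrr::imap`, which passes each element as `.x` and its name as `.y`; the seed and the object names `groups` and `plots` are illustrative assumptions, not part of the original answer.

```r
library(purrr)
library(ggplot2)

set.seed(1)  # illustrative seed so the example is reproducible
df <- data.frame(tiny  = rep(letters[1:3], 20),
                 block = rnorm(60),
                 tray  = runif(60, min = 0.4, max = 2))

# split() names each list element after its group: "a", "b", "c"
groups <- split(df, df$tiny)

# imap() calls the formula with .x = the group's data and .y = its name
plots <- imap(groups, ~ ggplot(.x, aes(x = tray, y = block)) +
                geom_point() +
                xlab("Soil Properties") +
                ylab("Slope Coefficient") +
                ggtitle(.y))
```

Each plot can then be retrieved by group name, for example `plots$a`.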

Using Functions:

lm_model <- function( data ) 
{
  return( lm( block ~ tray, data = data ) )
}

plot_fun <- function( data )
{
  p <- ggplot( data = data, aes( x = tray, y = block ) ) +
    geom_point( ) +
    xlab("Soil Properties") + 
    ylab("Slope Coefficient") +
    ggtitle( as.character( unique(data$tiny) ) )

  return( p )
}

df %>% 
  split( .$tiny) %>%
  map(~ lm_model( data = . ) )

df %>% 
  split( .$tiny) %>%
  map(~ plot_fun( data = . ) )

Creating formula inside function

lm_model <- function( data, x, y ) 
{
  form <- reformulate( termlabels = x, response = y )  # builds y ~ x

  return( lm( formula = form, data = data ) )
}

df %>% 
  split( .$tiny) %>%
  map(~ lm_model( data = ., x = 'tray', y = 'block' ) )
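To see what `reformulate` produces, it may help to call it on its own. This is a small base R sketch; the object names `form` and `form2` are illustrative. Note the argument order: term labels (the predictors) come first and the response second.

```r
# reformulate(termlabels, response): predictors first, response second
form <- reformulate("tray", response = "block")
deparse(form)
# "block ~ tray"

# Several predictors are joined with `+`
form2 <- reformulate(c("tray", "indent"), response = "block")
deparse(form2)
# "block ~ tray + indent"
```

Since the column names are plain strings, the same function can fit any pair of columns without being rewritten.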

Your solution would have worked if the function had built the formula from the column names, passed as strings (for example `x = 'tray', y = 'block'`):

model <- function(dfr, x, y){
  # paste the names into "y ~ x" and convert that string to a formula
  lm( formula = as.formula( paste( y, '~', x ) ), data = dfr)
}
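One reason the `mutate` attempts fail is that `map(model(x = tray, y = block))` calls `model` immediately instead of handing `map` a function. A possible way around this, sketched below, is to pass the function itself and let `map` forward the extra named arguments. The helper name `fit_xy`, the seed, and the object name `nested` are assumptions made for illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)  # illustrative seed so the example is reproducible
df <- data.frame(tiny  = rep(letters[1:3], 20),
                 block = rnorm(60),
                 tray  = runif(60, min = 0.4, max = 2))

# Hypothetical helper: x and y are column names given as strings
fit_xy <- function(dfr, x, y) {
  lm(reformulate(x, response = y), data = dfr)
}

nested <- df %>%
  group_by(tiny) %>%
  nest() %>%
  ungroup() %>%
  # map() supplies each element of `data` as the first argument (dfr);
  # the named arguments after the function are forwarded unchanged
  mutate(mod = map(data, fit_xy, x = "tray", y = "block"))
```

The result keeps one row per group, with the fitted models stored in the list-column `mod`.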
Sathish
  • this works like the model example I gave in my question. I get to learn another way of doing it which is awesome. however, I want to be able to use a function and supply any arguments I want to the function in the using `map`. – Kay Feb 28 '17 at 21:05
  • You did answer part of it. In your `lm_model` function, can I change `block` and `tray` and supply them as arguments in the function? How do I do it? – Kay Feb 28 '17 at 21:18
  • Yes. so in this case, I would be passing the Data frame, the `x` variable and the `y` variables to the `lm_model` function and evaluate using the `map` function – Kay Feb 28 '17 at 21:22
  • Your solution works perfectly. I don't get why `df%>%group_by(tiny)%>%mutate(mod = data %>%map(lm_model(x='tray', y='block', data=.)))` doesn't work. – Kay Feb 28 '17 at 21:31
  • inside the `lm_model` function, in the second line, put `print(form)`. You will see the magic. For more info, please read `?reformulate` – Sathish Feb 28 '17 at 21:33
  • see at the end, I added modifications to your function described in the question. Hope this helps – Sathish Feb 28 '17 at 21:48
  • use `mutate` so it adds the model output as a column to the existing df and that would answer my question. Good job though. – Kay Feb 28 '17 at 21:50
  • I would love that – Kay Feb 28 '17 at 21:54
  • it is better to keep it as a list, so you can get summary and other statistics – Sathish Feb 28 '17 at 21:56
  • If you want to attach it to your data frame, in `R` that kind of operation is called parameterization of a data frame. Essentially you will be creating pframes. Instead of attaching the output, attach the functions you created, so you can call the functions on your data to get output whenever needed. – Sathish Feb 28 '17 at 21:58
  • but this out of scope of this question – Sathish Feb 28 '17 at 21:59
  • Which of these solutions are you referring to `df%>%group_by(tiny)%>%mutate(mod = data%>%map(model(x='tray', y='block', dfr=.)))` or how did you use the mutate function please – Kay Feb 28 '17 at 22:11
  • see, after applying a model, map will return a list. You just need one cell to store it, instead of entire column of a data frame. – Sathish Feb 28 '17 at 22:12
  • what do you want to do with the output? – Sathish Feb 28 '17 at 22:13
  • I want to maintain the output as a dataframe, add the model as a new column, extract the estimates and then use them in my plots – Kay Feb 28 '17 at 22:52

If you want to use mutate with map, you'll need to also use tidyr for nest. You'll be using tibbles to store the output (or data frames with list-columns of data frames).

I used the functions from @Sathish's detailed answer (with some modifications).

library(purrr)
library(dplyr)
library(tidyr)
library(ggplot2)  # needed by plot_fun below

df <- data.frame(tiny = rep(letters[1:3], 20), 
                 block = rnorm(60), tray = runif(60, min=0.4, max=2),
                 indent = sample(0.5:2.0, 60, replace = TRUE))

lm_model <- function( data ) 
{
  return( lm( block ~ tray, data = data ) )
}

# Altered function to include title parameter with purrr::map2
plot_fun <- function( data, title )
{
  p <- ggplot( data = data, aes( x = tray, y = block ) ) +
    geom_point( ) +
    xlab("Soil Properties") + 
    ylab("Slope Coefficient") +
    ggtitle( as.character( title ) )

  return( p )
}


results <- df %>% 
  group_by(tiny) %>% 
  nest() %>% 
  mutate(model = map(data, lm_model),
         plot = map2(data, tiny, plot_fun))

You end up with:

> results

# A tibble: 3 × 4
    tiny              data    model     plot
  <fctr>            <list>   <list>   <list>
1      a <tibble [20 × 3]> <S3: lm> <S3: gg>
2      b <tibble [20 × 3]> <S3: lm> <S3: gg>
3      c <tibble [20 × 3]> <S3: lm> <S3: gg>

And you can access what you need using `unnest` or via extraction (`[` and `[[`):

> results$model[[1]]

Call:
lm(formula = block ~ tray, data = data)

Coefficients:
(Intercept)         tray  
    -0.3461       0.3998  
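If the goal is to keep everything in one data frame and pull the estimates out as ordinary columns, one option is `map_dbl` over the model column. This is a sketch under assumptions: the seed and the column names `intercept` and `slope` are illustrative, and the model is fitted inline rather than through `lm_model`.

```r
library(dplyr)
library(purrr)
library(tidyr)

set.seed(42)  # illustrative seed so the example is reproducible
df <- data.frame(tiny  = rep(letters[1:3], 20),
                 block = rnorm(60),
                 tray  = runif(60, min = 0.4, max = 2))

results <- df %>%
  group_by(tiny) %>%
  nest() %>%
  ungroup() %>%
  mutate(model     = map(data, ~ lm(block ~ tray, data = .x)),
         # coef() returns a named vector; pick each estimate by name
         intercept = map_dbl(model, ~ coef(.x)[["(Intercept)"]]),
         slope     = map_dbl(model, ~ coef(.x)[["tray"]]))

# One row per group, with the estimates as plain numeric columns
estimates <- results %>% select(tiny, intercept, slope)
```

The numeric columns can then be used directly in later plotting code, while the full model objects stay available in `results$model`.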
Jake Kaupp