
I am writing a function that goes through a data frame and spreads a factor variable into new dummy variables, since some machine learning algorithms cannot handle factors. To do that, I call spread() inside the cleaning function.

When I try to pass a name of a column I need to spread, however, it throws an error:

Error: Invalid column specification

Here is the code:

library(tidyr)
library(dplyr)    
library(C50) # this is one source for the churn data
data(churn)


f <- function(df, name)  {
  df$dummy <- c(1:nrow(df))       # create dummy variable with unique values

  df <- spread(df, key <- as.character(substitute(name)), "dummy", fill = 0 )
}

churnTrain = f(churnTrain, name = "state")
str(churnTrain)

Of course, if I replace key = as.character(substitute(name)) with key = "state", it works just fine, but the function is no longer reusable.

How can I pass a column name to the inner function without getting an error?

divibisan
Jad Gift
  • The problem is that you're using `<-` instead of `=`. Both work the same way for assigning to a variable, but here you're trying to pass a value to an argument, which *requires* the `=` operator. By using `<-` you're saving `as.character(...)` to an object named `key`, not passing it to the `key` argument of `spread` – divibisan Mar 06 '19 at 16:28
  • Also, in the code you presented, there's no need for `substitute`. `spread` expects an object of type `character` for its `key` argument, so you can just pass in the name argument directly: `spread(df, key = name, ...)` – divibisan Mar 06 '19 at 16:31
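
A minimal sketch of what the two comments above suggest: pass the argument with `=` and drop `substitute()`. The toy data frame and the function name `spread_column` are hypothetical stand-ins for `churnTrain` and `f`. The `!!` unquoting is an addition not mentioned in the comments; it is assumed here so the string is still used even if the data happens to contain a column called `name`.

```r
library(tidyr)

# Hypothetical toy data standing in for churnTrain
df <- data.frame(
  state = factor(c("KS", "OH", "NJ", "OH")),
  calls = c(110, 123, 114, 71)
)

spread_column <- function(df, name) {
  df$dummy <- seq_len(nrow(df))   # unique row id used as the value column
  # `key = ...` passes the argument; `key <- ...` would only create a local object
  spread(df, key = !!name, value = "dummy", fill = 0)
}

wide <- spread_column(df, "state")
str(wide)
```

With this shape, the column to spread is an ordinary string argument, so the same function can be called for any factor column.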

2 Answers

0

Do you need to use tidyverse?

If not, you can try the older reshape2 package:


library(reshape2)
library(C50) # this is one source for the churn data
data(churn)

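f <- function(df1, name)  {
  df1$dummy <- seq_len(nrow(df1))  # create dummy variable with unique values
  # build the formula "dummy ~ <name>" from the string and name the value column explicitly
  dcast(df1, as.formula(paste0("dummy ~ ", name)), value.var = "dummy")
}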

ct1 <- f(churnTrain, name = "state")

If you absolutely need to work in tidyverse, you can try following the tutorial at http://dplyr.tidyverse.org/articles/programming.html. Unfortunately, their examples don't work on my machine.
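
For reference, a hedged sketch of one tidyverse way to take the column name as a string. It is not taken from the linked tutorial: it assumes a tidyr version that provides `pivot_wider()` with a scalar `values_fill`, and uses `tidyselect::all_of()` to select the column named by the string. The toy data and the function name `widen_by` are hypothetical.

```r
library(tidyr)

# Hypothetical toy data standing in for churnTrain
df <- data.frame(
  state = c("KS", "OH", "NJ", "OH"),
  calls = c(110, 123, 114, 71)
)

widen_by <- function(df, name) {
  df$dummy <- seq_len(nrow(df))   # unique row id used as the value column
  pivot_wider(
    df,
    names_from  = tidyselect::all_of(name),  # column chosen by its string name
    values_from = "dummy",
    values_fill = 0                           # fill cells with no match
  )
}

wide <- widen_by(df, "state")
str(wide)
```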

Igor F.
0
library(tidyr)
library(dplyr)    
library(C50) # this is one source for the churn data
data(churn)


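f <- function(df, name)  {
  df$dummy <- seq_len(nrow(df))       # create dummy variable with unique values

  # spread_() takes the key column name as a string (note: deprecated in recent tidyr releases)
  spread_(df, key_col = name, value_col = "dummy", fill = 0)
}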

churnTrain = f(churnTrain, name = "state")
str(churnTrain)
Dan