I've just noticed some very odd behavior in the dummies package for R when knitting an .Rmd file. Here's a reproducible example.

---
title: "Dummies Package Behavior"
author: "Kim"
date: '`r Sys.Date()`'
output:
  pdf_document:
    toc: yes
    toc_depth: '3'
---

Load the libraries

```{r}
library(tidyverse)
library(dummies)
```

Main data wrangling

```{r}
df <- data_frame(year = c(2016, 2017, 2018))
temp <- dummy(df$year)
temp <- as_data_frame(temp)
df <- bind_cols(df, temp)
```

View output

```{r}
df
```

What I expect to see when I view df are 0-1 columns named year2016, year2017, and year2018, which is the normal behavior of the dummies package.

When this R Markdown document is knitted in RStudio, the columns are instead named C:/Users/Kim/Desktop/dummies.Rmd2016, C:/Users/Kim/Desktop/dummies.Rmd2017, and C:/Users/Kim/Desktop/dummies.Rmd2018. That is, the full path of the document is used as the column-name prefix.

I don't understand why this happens. I want the column names to be year2016, year2017, and year2018.

halfer
Kim

2 Answers

The problem is not related to dplyr because we can reproduce it with data.frame(). Apparently there is a problem with assigning column labels in the dummy() function when executed as part of an R Markdown document. As noted in Luke's answer, one workaround is to use dummy.data.frame(). Another would be to use the colnames() function to rename the columns after binding the year and dummy variables with cbind(), which also enables a dplyr-based solution.

This should probably be submitted as a bug report for the dummies package.

---
title: "Behavior of dummies package"
author: "anAuthor"
date: "12/26/2017"
output:
  html_document: default
  pdf_document: default
  word_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# first, reproduce error with data.frame()

```{r}
library(dummies)
df <- data.frame(year = c(2016, 2017, 2018))
df
dummyCols <- dummy(df$year)
dummyCols <- as.data.frame(dummyCols)
dummyCols
```

# data.frame() approach to fix the error

```{r}
df <- data.frame(year = c(2016, 2017, 2018))
df
dummyCols <- dummy.data.frame(data = df, dummy.classes = "ALL")
dummyCols
df <- cbind(df, dummyCols)
df
```

The output of the first chunk reproduces the error: the dummy columns are named after the document path instead of year.

The second chunk, using dummy.data.frame(), avoids the error and yields the expected year2016, year2017, and year2018 columns.

The dplyr correction works as follows.

# dplyr approach 

```{r}
library(tidyverse)
df <- data_frame(year = c(2016, 2017, 2018))
temp <- dummy(df$year)
temp <- as_data_frame(temp)
df <- bind_cols(df, temp)
# paste0() is vectorized, so no lapply() is needed to build the names
colnames(df) <- c("year", paste0("year", 2016:2018))
df
```
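
The hard-coded 2016:2018 in the rename above can be avoided by deriving the names from the data. The following is a minimal sketch in base R, not part of the original answer. It assumes the dummy columns come back one per unique value in sorted order (as factor levels would be), and `bad` is a hypothetical stand-in for the mis-named output so that the example runs without the dummies package.

```r
# Original data
df <- data.frame(year = c(2016, 2017, 2018))

# Hypothetical stand-in for the mis-named dummy() output seen when knitting
bad <- diag(3L)
colnames(bad) <- paste0("C:/Users/Kim/Desktop/dummies.Rmd", 2016:2018)

# Derive the names from the data instead of hard-coding the years
# (assumes one column per unique value, in sorted order)
colnames(bad) <- paste0("year", sort(unique(df$year)))

df <- cbind(df, bad)
names(df)
```

Because the names are built from `sort(unique(df$year))`, the same line keeps working if more years are added to the data, as long as the column-order assumption holds.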

Len Greski
    Thank you, Len. I will indeed be filing a bug report for the `dummies` package. – Kim, Dec 27 '17 at 02:10

I'm not sure why that interaction is happening, but this slight modification seems to get around it:

```{r}
df <- data.frame(year = c(2016, 2017, 2018))
df <- data.frame(df, dummy.data.frame(data = df, dummy.classes = "ALL"))
```

Note that this uses data.frame() from base R rather than data_frame() (which dplyr re-exports from tibble); that seems to make a difference.
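
If depending on the dummies package is not essential, the same 0-1 columns can be built with base R alone. The sketch below swaps in `model.matrix()` instead of `dummy()`, so it is an alternative technique rather than a fix to the package. The names `year_f` and `dummy_mat` are made up for illustration.

```r
df <- data.frame(year = c(2016, 2017, 2018))

# model.matrix() needs a factor to create one indicator column per level
year_f <- data.frame(year = factor(df$year))

# "- 1" drops the intercept so that every level gets its own 0/1 column
dummy_mat <- model.matrix(~ year - 1, data = year_f)

# Column names are the variable name followed by the level, e.g. year2016
df <- cbind(df, as.data.frame(dummy_mat))
df
```

Here the column prefix comes from the variable name in the formula, so it does not depend on how or where the code is run.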

Luke C