I've been trying to use the dcast() function in reshape2 to widen a large data frame in R. However, I am not sure what to use for the aggregation function, fun.aggregate, that dcast requires. I want to keep the discrete values of the value.var, whereas dcast insists on using length as the default, making every value dichotomous (0 or 1). For illustration, my data look like this:

x <- c("a", "b", "c")
y <- c("d", "e", "f")
num <- c(10, 20, 21)
data <- data.frame(cbind(x,y,num))

x y num
a d  10
b e  20
c f  21

After running `m <- dcast(data, x ~ y, value.var = "num")`, dcast returns the following data frame:

  d  e  f
a 1  0  0
b 0  1  0
c 0  0  1

However, I want it to look like this:

  d  e  f
a 10 0  0
b 0  20 0
c 0  0  21

What am I doing wrong?

  • Sorry, that was a mistake. I meant to have the value at `a` `d` equal to 10, not 21. Thanks! – quantoid6969 Feb 24 '19 at 17:44
  • Thanks, I forgot to put the elements in quotation marks. I'll change that now. – quantoid6969 Feb 24 '19 at 17:47
  • `dcast(data, x ~ y, value.var = "num", fill = 0)` gives your desired output for me. Not sure where that first result came from; I get your desired output with `NA`s instead of 0s if I run the code you put in the question. – IceCreamToucan Feb 24 '19 at 17:48
  • @IceCreamToucan you're right, it seems to work on smaller DFs. The data I'm trying to use it on is much larger (n=6000), so that must be contributing to the issue. I guess I'll try cbinding everything together if I can't come up with a better solution. – quantoid6969 Feb 24 '19 at 17:54
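
One possible explanation for the difference on the larger data, offered as an assumption rather than something confirmed in this thread: dcast only falls back to length when at least one x/y combination occurs in more than one row, because it then has to collapse several values into a single cell. The sketch below uses a hypothetical data frame `dup` (not the asker's data) in which the pair a/d appears twice. Using `sum` as the aggregation function is an illustrative choice; `mean`, `max`, or another function may suit the real data better.

```r
library(reshape2)

# Hypothetical data: the (a, d) combination appears twice
dup <- data.frame(
  x   = c("a", "a", "b", "c"),
  y   = c("d", "d", "e", "f"),
  num = c(10, 5, 20, 21),
  stringsAsFactors = FALSE
)

# No fun.aggregate: dcast counts the rows in each cell and
# prints a message that it is defaulting to length
counts <- dcast(dup, x ~ y, value.var = "num")

# Explicit fun.aggregate: the values themselves are combined,
# and empty cells are filled with 0
sums <- dcast(dup, x ~ y, value.var = "num", fun.aggregate = sum, fill = 0)
```

If the real data contain no repeated x/y combinations, this does not apply, and checking for duplicates first (for example with `anyDuplicated(dup[c("x", "y")])`) would show whether it does.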

1 Answer

You can also switch to tidyr.

library(tidyverse)

x <- c("a", "b", "c")
y <- c("d", "e", "f")
num <- c(10, 20, 21)

df <- tibble(x, y, num)

df %>% 
  spread(y,  num, fill = 0)

The output is:

# A tibble: 3 x 4
  x         d     e     f
  <chr> <dbl> <dbl> <dbl>
1 a        10     0     0
2 b         0    20     0
3 c         0     0    21
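
In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. As a minimal sketch of the same reshaping with the newer function, assuming a tidyr version of at least 1.0.0 is installed (the name `wide` is only for illustration):

```r
library(tidyr)

# Same example data as above
df <- tibble::tibble(
  x   = c("a", "b", "c"),
  y   = c("d", "e", "f"),
  num = c(10, 20, 21)
)

# Column names come from y, cell values from num;
# combinations that do not occur are filled with 0
wide <- pivot_wider(
  df,
  names_from  = y,
  values_from = num,
  values_fill = list(num = 0)
)
```

This should give the same three-row table as the `spread()` call, with columns x, d, e and f.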
Stephan