0

I am a beginner and not too familiar with advanced features of R. I am unable to understand why reduce() doesn't work for grouped_df. I am building upon my discussion at Rowwise summation for Tibble datatype where I posted reduce() as one of the solutions when the class of datatype is:

"tbl_df"     "tbl"        "data.frame"

Here's the sample data:

  df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                   year = rep(c(2014, 2013, 2012), each = 3),
                   rev1 = rep(c(10, 20, 30), 3),
                   rev2 = rep(c(10, 20, 30), 3))

where, class (df) is "tbl_df" "tbl" "data.frame"

I now convert df to an object of class grouped_df with:

df1 <- df %>% 
        group_by(client, year,rev1) %>%
        summarise(rev3 = sum(rev1,rev2)) %>%
        select(client, year, rev3, rev1)

where, class (df1) is "grouped_df" "tbl_df" "tbl" "data.frame", which is as expected.

Now, when I use reduce() to do row-wise summation on df1, it throws an error.

df1%>% dplyr::mutate(sum=Reduce("+",.[3:4]))
Error: incompatible size (9), expecting 1 (the group size) or 1

However, when I convert df1 to data frame, it works well.

df1%>% dplyr::as_data_frame() %>%  dplyr::mutate(sum=Reduce("+",.[3:4]))

The head() of above output is:

# A tibble: 6 × 5
    client  year  rev3  rev1   sum
    <fctr> <dbl> <dbl> <dbl> <dbl>
1 Client A  2012    20    10    30
2 Client A  2013    20    10    30
3 Client A  2014    20    10    30
4 Client B  2012    40    20    60
5 Client B  2013    40    20    60
6 Client B  2014    40    20    60
...

Can someone please explain why reduce() function doesn't work for grouped data, but works for non-grouped data? Maybe, I am missing something here.

watchtower

2 Answers

1

You're not using the replace() function in any of your code blocks above. You're using the Reduce() function.

As an aside, df() is the density function of the F distribution in the stats package - it's bad practice to give objects the same name as an existing function.

conrad-mac
  • This is not an answer. Please avoid posting comments as answers. Once you get enough reputation you will be able to comment anywhere. – Sotos Jan 07 '17 at 08:11
0

Reduce() and replace() work on vectors.

The df1 grouped dataframe is much more than a collection of vectors. You can see this by flipping open both objects in the environment pane. [Image: df and df1 under the hood]

If we add an ungroup() we can get a collection of vectors back.

df2 <- df %>% 
    group_by(client, year,rev1) %>%
    summarise(rev3 = sum(rev1,rev2)) %>%
    select(client, year, rev3, rev1) %>% 
    ungroup %>% 
    mutate(sum=Reduce("+",.[3:4]))
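
One possible reading of the size complaint in the error, sketched in base R only: inside the pipe, `.` stands for the whole table that was piped in, so `Reduce("+", .[3:4])` returns one value per row of the full table (9 here), while a grouped mutate() expects a result that matches the size of each group (1 here). The snippet only shows what Reduce() hands back; the names `cols` and `total` are made up for illustration and are not part of the original code.

```r
# Same sample data as in the question
df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev1 = rep(c(10, 20, 30), 3),
                 rev2 = rep(c(10, 20, 30), 3))

# The two numeric columns, i.e. a list of two vectors
cols <- df[3:4]

# Reduce("+", ...) folds the list by adding the columns element-wise
total <- Reduce("+", cols)

# The result has one element per row of the whole table,
# not one element per group
length(total)
nrow(df)
```

After ungroup() the whole table is treated as a single unit, so a result with one value per row fits.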

In any case, could this dplyr code work instead?

mutate(df, rev3 = rev1 + rev2, sum = 2*rev1 + rev2)
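
A small sketch of the same idea applied to grouped data, assuming dplyr is installed. Bare column names inside mutate() are evaluated group by group, so the lengths line up without going through `.`. The name `res` is only illustrative, and the trailing ungroup() is a convenience for later steps rather than something the sum needs.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev1 = rep(c(10, 20, 30), 3),
                 rev2 = rep(c(10, 20, 30), 3))

res <- df %>%
  group_by(client, year) %>%
  # rev1 + rev2 equals sum(rev1, rev2) here because each group has one row
  mutate(rev3 = rev1 + rev2,
         sum = rev3 + rev1) %>%
  ungroup()

head(res)
```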
leerssej
  • Thanks for your help. I believe using `as_data_frame()` and `ungroup()` will have the same effect. Do you mind explaining what would happen if we `ungroup()` the data? Why is it that `reduce()` doesn't work on grouped data? I'm still unclear about the "why". – watchtower Jan 07 '17 at 09:23
  • Watch your environment as you do it to your dataframe. Nothing much happens except that it flips back to a simple dataframe (a collection of vectors again), in that you drop all the attributes that `grouped_df()` builds. See here: https://github.com/hadley/dplyr/blob/master/R/grouped-df.r Reduce wants vectors, and when you feed it an `attr()` it gets a mouthful of fur and coughs it back at you, figuratively. :-D [I am curious to hear what others have to offer, too.] – leerssej Jan 07 '17 at 10:16
  • btw: `class(attr)` = "function" but now I suspect it has more to do with the same type of issue as `fill` had. https://github.com/tidyverse/tidyr/commit/849aab524eb4e2e4ac8a32ddc4930dc78917c824 per http://stackoverflow.com/questions/34517370/group-by-into-fill-not-working-as-expected – leerssej Jan 07 '17 at 10:25
  • To be honest, I have never seen anyone yet be able to explain why `group_by` causes the trip ups it does. See http://stackoverflow.com/a/21656344/5088194 (and note the ten upvotes on the first comment :-D) I just know that when you start getting warnings after a `group_by` complaining about inappropriate lengths it means you need to either `ungroup` your chain or in earlier editions I recall throwing in a `rowwise()` farther up the chain. I've never had time to disassemble the apparatus before, but it does appear to me that functions don't like to be fed functions. – leerssej Jan 07 '17 at 10:39
  • @watchtower - It looks similar to this behavior: [lag() & lead() Issue](http://stackoverflow.com/questions/28235074/dplyr-lead-and-lag-wrong-when-used-with-group-by) and Romain Francois sorted it out here: [bug fix #925](https://github.com/hadley/dplyr/commit/98e2efac07d87722c0acf1368f223423dc7edcd5). Maybe you could raise the issue to Hadley and Romain as a bug? I think that will get you the most complete answer. – leerssej Jan 08 '17 at 02:54