2

I have a Tibble, and I have noticed that a combination of dplyr::rowwise() and sum() doesn't work. I know there are many threads on this topic, and I have got 2 to 3 solutions, but I am not quite why the combination of rowwise() and sum() doesn't work.

So, my question is : why doesn't a combination of rowwise() and sum() work AND what can we do to make it work? I am a beginner so I believe that I am doing something wrong in the below code.

Data:

dput(data)
structure(list(Fiscal.Year = c(2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L), col1 = c(0, 26613797.764311, 
0, 12717073.587292, 0, 0, 0, 0, 0, 0), col2 = c(0, 0, 0, 0, 8969417.89721166, 
0, 11483606.8417117, 0, 0, 0), col3 = c(0, 0, 33251606.347943, 
0, 25082683.4492186, 0, 17337191.3014127, 0, 0, 0), col4 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), col5 = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 9796823.229998), col6 = c(35822181.695755, 17475066.870565, 
0, 0, 0, 0, 4040695.327278, 0, 13117249.623068, 0), col7 = c(0, 
0, 0, 0, 0, 18347258.910001, 0, 0, 7002205.087399, 0), No.Trans = c(2987L, 
1292L, 1002L, 796L, 691L, 677L, 400L, 388L, 381L, 366L)), .Names = c("Fiscal.Year", 
"col1", "col2", "col3", "col4", "col5", "col6", "col7", "No.Trans"
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

This code doesn't work:

data %>%  #No
        dplyr::rowwise() %>%
        dplyr::mutate(sum = sum(.[2:8]))

Just for reference, I have tried the following set of code, and they work. I am specifically looking for a solution that uses rowwise() and sum().

Option 1: Discussed at: Summarise over all columns

  data %>%
    dplyr::rowwise() %>%
    do(data.frame(., res = sum(unlist(.)[2:8])))

Option 2:

  rowSums(data[,2:8])

Option 3: Discussed at:How to do rowwise summation over selected columns using column index with dplyr?

  data %>% mutate(sum=Reduce("+",.[2:8]))

Option 4:

data %>%
        select(2:8)%>%
        dplyr::mutate(sum=rowSums(.))
Community
  • 1
  • 1
watchtower
  • 4,140
  • 14
  • 50
  • 92
  • Why do you need to go through this route, it is not efficient compared to some of the methods you mentioned in the post – akrun Jan 07 '17 at 05:59
  • @akrun - Thanks for your question. "By efficient", are you referring to the one from base R? As a beginner, I believe that I lack knowledge about `dplyr`. Hence, I want to learn how to fix errors. This would just help me. – watchtower Jan 07 '17 at 06:02
  • The two methods `rowSums` and `Reduce` are very efficient – akrun Jan 07 '17 at 06:04
  • Regarding why it doesn't work, I am not sure. The values you are getting is the same as `data[-1] %>% rowwise() %>% sum` ie. sum of entire values, it could be a bug or design feature – akrun Jan 07 '17 at 06:07

1 Answers1

2

Those columns look suspiciously like observations....
If so, tidying that dataframe up would make the data wrangling significantly easier.

Does this get you the answers you are seeking?

data %>%
    gather(key = col, val = revenue, `col1`:`col7`) %>%
    group_by(Fiscal.Year, No.Trans) %>%
    summarise(res = sum(revenue))

Source: local data frame [10 x 3]
Groups: Fiscal.Year [?]

   Fiscal.Year No.Trans      res
         <int>    <int>    <dbl>
1         2016      366  9796823
2         2016      381 20119455
3         2016      388        0
4         2016      400 32861493
5         2016      677 18347259
6         2016      691 34052101
7         2016      796 12717074
8         2016     1002 33251606
9         2016     1292 44088865
10        2016     2987 35822182

For a really smooth introduction to thinking tidily please try here. The functions he discusses in the presentation have been updated, but Hadley does a great job teaching the subject: through pedagogical chaining, as it were.

The updated functions can be found in his ggplot2 book here.

leerssej
  • 14,260
  • 6
  • 48
  • 57