2

I'm currently on R trying to create for a DF multiple columns with the sum of previous one. Imagine I got a DF like this:

df=    
   sep-2016  oct-2016    nov-2016  dec-2016   jan-2017
1  70        153            NA        28        19
2  57         68            73       118        16
3  29         NA            19        32        36
4 177         36             3        54        53

and I want to add at the end the sum of the rows previous of the month that I'm reporting so for October you end up with the sum of sep and oct, and for November you end up with the sum of sep, oct and november and end up with something like this:

 df=    
     sep-2016  oct-2016    nov-2016  dec-2016   jan-2017 status-Oct2016 status-Nov 2016
    1  70        153            NA        28        19      223       223
    2  57         68            73       118        16      105       198
    3  29         NA            19        32        36       29        48
    4 177         36             3        54        53      213        93

I want to know a efficient way insted of writing a lots of lines of rowSums() and even if I can get the label on the iteration for each month would be amazing!

Thanks!

tyluRp
  • 4,678
  • 2
  • 17
  • 36

6 Answers6

2

We can use lapply to loop through the columns to apply the rowSums.

dat2 <- as.data.frame(lapply(2:ncol(dat), function(i){
  rowSums(dat[, 1:i], na.rm = TRUE)
}))

names(dat2) <- paste0("status-", names(dat[, -1]))

dat3 <- cbind(dat, dat2)

dat3
#   sep-2016 oct-2016 nov-2016 dec-2016 jan-2017 status-oct-2016 status-nov-2016 status-dec-2016 status-jan-2017
# 1       70      153       NA       28       19             223             223             251             270
# 2       57       68       73      118       16             125             198             316             332
# 3       29       NA       19       32       36              29              48              80             116
# 4      177       36        3       54       53             213             216             270             323

DATA

dat <- read.table(text = "   'sep-2016'  'oct-2016'    'nov-2016'  'dec-2016'   'jan-2017'
1  70        153            NA        28        19
                  2  57         68            73       118        16
                  3  29         NA            19        32        36
                  4 177         36             3        54        53",
                  header = TRUE, stringsAsFactors = FALSE)

names(dat) <- c("sep-2016", "oct-2016", "nov-2016", "dec-2016", "jan-2017")
www
  • 38,575
  • 12
  • 48
  • 84
  • You were too quick with good solution. I was trying to make it a bit generic. – MKR Feb 16 '18 at 22:13
  • 1
    @MKR Thanks. Your solution is good, too. Thanks for sharing it. – www Feb 16 '18 at 22:27
  • 1
    @www Thanks! On my actual data set I got the error: Error in rowSums(uniqueClients[, 4:i], na.rm = TRUE) : 'x' must be an array of at least two dimensions, i just added a drop = F on the script and it worked! – Daniel Beltran Feb 19 '18 at 16:21
1

Honestly I have no idea why you would want your data in this format, but here is a tidyverse method of accomplishing it. It involves transforming the data to a tidy format before spreading it back out into your wide format. The key thing to note is that in a tidy format, where month is a variable in a single column instead of spread across multiple columns, you can simply use group_by(rowid) and cumsum to calculate all the values you want. The last few lines are constructing the status- column names and spreading the data back out into a wide format.

library(tidyverse)
df <- read_table2(
  "sep-2016  oct-2016    nov-2016  dec-2016   jan-2017
  70        153            NA        28        19
  57         68            73       118        16
  29         NA            19        32        36
 177         36             3        54        53"
)

df %>%
  rowid_to_column() %>%
  gather("month", "value", -rowid) %>%
  arrange(rowid) %>%
  group_by(rowid) %>%
  mutate(
    value = replace_na(value, 0),
    status = cumsum(value)
    ) %>%
  gather("vartype", "number", value, status) %>%
  mutate(colname = ifelse(vartype == "value", month, str_c("status-", month))) %>%
  select(rowid, number, colname) %>%
  spread(colname, number)
#> # A tibble: 4 x 11
#> # Groups:   rowid [4]
#>   rowid `dec-2016` `jan-2017` `nov-2016` `oct-2016` `sep-2016`
#>   <int>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#> 1     1       28.0       19.0       0         153         70.0
#> 2     2      118         16.0      73.0        68.0       57.0
#> 3     3       32.0       36.0      19.0         0         29.0
#> 4     4       54.0       53.0       3.00       36.0      177  
#> # ... with 5 more variables: `status-dec-2016` <dbl>,
#> #   `status-jan-2017` <dbl>, `status-nov-2016` <dbl>,
#> #   `status-oct-2016` <dbl>, `status-sep-2016` <dbl>

Created on 2018-02-16 by the reprex package (v0.2.0).

Calum You
  • 14,687
  • 4
  • 23
  • 42
1

A clean way to do it is by convert your data in a long format.

library(tibble)
library(tidyr)
library(dplyr)

your_data <- tribble(~"sep_2016",   ~"oct_2016",    ~"nov_2016",  ~"dec_2016",   ~"jan_2017",
  70,        153,            NA,        28,        19,
  57,         68,            73,       118,        16,
  29,         NA,            19,        32,        36,
 177,         36,             3,        54,        53)

You can change the format of your data.frame with gather from the tidyr package.

your_data_long <- your_data %>%
  rowid_to_column() %>% 
  gather(key = month_year, value = the_value, -rowid) 

head(your_data_long)
#> # A tibble: 6 x 3
#>   rowid month_year the_value
#>   <int>      <chr>     <dbl>
#> 1     1   sep_2016        70
#> 2     2   sep_2016        57
#> 3     3   sep_2016        29
#> 4     4   sep_2016       177
#> 5     1   oct_2016       153
#> 6     2   oct_2016        68

Once your data.frame is in a long format. You can compute cumulative sum with cumsumand dplyrfunctions mutate and group_by.

result <- your_data_long %>%
  group_by(rowid) %>% 
  mutate(cumulative_value = cumsum(the_value)) 

result
#> # A tibble: 20 x 4
#> # Groups:   rowid [4]
#>    rowid month_year the_value cumulative_value
#>    <int>      <chr>     <dbl>            <dbl>
#>  1     1   sep_2016        70               70
#>  2     2   sep_2016        57               57
#>  3     3   sep_2016        29               29
#>  4     4   sep_2016       177              177
#>  5     1   oct_2016       153              223
#>  6     2   oct_2016        68              125
#>  7     3   oct_2016        NA               NA
#>  8     4   oct_2016        36              213
#>  9     1   nov_2016        NA               NA
#> 10     2   nov_2016        73              198
#> 11     3   nov_2016        19               NA
#> 12     4   nov_2016         3              216
#> 13     1   dec_2016        28               NA
#> 14     2   dec_2016       118              316
#> 15     3   dec_2016        32               NA
#> 16     4   dec_2016        54              270
#> 17     1   jan_2017        19               NA
#> 18     2   jan_2017        16              332
#> 19     3   jan_2017        36               NA
#> 20     4   jan_2017        53              323

If you want to retrieve the starting form, you can do it with spread.

Flo.P
  • 371
  • 2
  • 7
1

My preferred solution would be:

# library(matrixStats)

DF <- as.matrix(df)
DF[is.na(DF)] <- 0

RES <- matrixStats::rowCumsums(DF)
colnames(RES) <- paste0("status-", colnames(DF))

cbind.data.frame(df, RES)

This is closest to what you are looking for with the rowSums.

RolandASc
  • 3,863
  • 1
  • 11
  • 30
1

One option could be using spread and gather function from tidyverse.

Note: The status column has been added even for the 1st month. And the status columns are not in order but values are correct.

The approach is:

# Data
df <- read.table(text = "sep-2016  oct-2016    nov-2016  dec-2016   jan-2017
70        153            NA        28        19
57         68            73       118        16
29         NA            19        32        36
177         36             3        54        53", header = T, stringsAsFactors = F)


library(tidyverse)

# Just add an row number as sl
df <- df %>% mutate(sl = row_number())

#Calculate the cumulative sum after gathering and arranging by date
mod_df <- df %>% 
  gather(key, value, -sl) %>%
  mutate(key = as.Date(paste("01",key, sep="."), format="%d.%b.%Y")) %>%
  arrange(sl, key) %>%
  group_by(sl) %>%
  mutate(status = cumsum(ifelse(is.na(value),0L,value) )) %>%
  select(-value) %>%
  mutate(key = paste("status",as.character(key, format="%b.%Y"))) %>%
  spread(key, status) 

# Finally join cumulative calculated sum columns with original df and then 
# remove sl column
inner_join(df, mod_df, by = "sl") %>% select(-sl)

#  sep.2016 oct.2016 nov.2016 dec.2016 jan.2017 status Dec.2016 status Jan.2017 status Nov.2016 status Oct.2016 status Sep.2016
#1       70      153       NA       28       19             251             270             223             223              70
#2       57       68       73      118       16             316             332             198             125              57
#3       29       NA       19       32       36              80             116              48              29              29
#4      177       36        3       54       53             270             323             216             213             177
MKR
  • 19,739
  • 4
  • 23
  • 33
0

Another base solution where we build a matrix accumulating the row sums :

status <- setNames(
  as.data.frame(t(apply(dat,1,function(x) Reduce(sum,'[<-'(x,is.na(x),0),accumulate = TRUE)))),
  paste0("status-",names(dat)))

status
#   status-sep-2016 status-oct-2016 status-nov-2016 status-dec-2016 status-jan-2017
# 1              70             223             223             251             270
# 2              57             125             198             316             332
# 3              29              29              48              80             116
# 4             177             213             216             270             323

Then bind it to your original data if needed :

cbind(dat,status[-1])
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167