-1

I am trying to do a rowSum for the actuals columns. However, I would like to include the values up to the UpTo date for certain observations. Here is the data frame:

dat <- structure(list(Company = c("ABC", "DEF", "XYZ"), UpTo = c(NA, 
"Q2", "Q3"), Actual.Q1 = c(100L, 80L, 100L), Actual.Q2 = c(50L, 
75L, 50L), Forecast.Q3 = c(80L, 50L, 80L), Forecast.Q4 = c(90L, 
80L, 100L)), .Names = c("Company", "UpTo", "Actual.Q1", "Actual.Q2", 
"Forecast.Q3", "Forecast.Q4"), class = "data.frame", row.names = c("1", 
"2", "3"))

  Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4
1     ABC   NA       100        50          80          90
2     DEF   Q2        80        75          50          80
3     XYZ   Q3       100        50          80         100
  • For company ABC, since there is no UpTo date, it will just be Actual.Q1 + Actual.Q2, which is 150.
  • For company DEF, since the UpTo date is Q2, it will be Actual.Q1 + Actual.Q2, which is 155.
  • For company XYZ, since the UpTo date is Q3, it will be Actual.Q1 + Actual.Q2 + Forecast.Q3, which is 230.

The resulting data frame would look like this:

  Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4 SumRecent
1     ABC   NA       100        50          80          90       150
2     DEF   Q2        80        75          50          80       155
3     XYZ   Q3       100        50          80         100       230

I have tried to use the rowSums function. However, it does not take into effect the variable UpTo. Any help is appreciated. Thanks!

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
J Kang
  • 1
  • 5
  • If it is NA, it will only be Q1 + Q2, since we only want the actuals if there are no up to date specified. – J Kang Nov 23 '16 at 18:08

4 Answers4

1

We can use binary weighted row sums.

UpTo <- as.character(dat$UpTo)  ## in case you have factor column
UpTo[is.na(UpTo)] <- "Q2"  ## replace `NA` to "Q2"
w <- outer(as.integer(substr(UpTo, 2, 2)), 1:4, ">=")
#     [,1] [,2]  [,3]  [,4]
#[1,] TRUE TRUE FALSE FALSE
#[2,] TRUE TRUE FALSE FALSE
#[3,] TRUE TRUE  TRUE FALSE

We have a logical matrix. But it does not affect arithmetic computation as TRUE is 1 and FALSE is 0. Then we do weighted row sums:

X <- data.matrix(dat[3:6])
dat$SumRecent <- rowSums(X * w)

#  Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4 SumRecent
#1     ABC <NA>       100        50          80          90       150
#2     DEF   Q2        80        75          50          80       155
#3     XYZ   Q3       100        50          80         100       230

The advantage of this approach is its speed / efficiency, as it is fully vectorized. This method is super fast. You can refer to the benchmark result in Fast way to create a binary matrix with known number of 1 each row in R.

Community
  • 1
  • 1
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
1

Here is a possibility:

df$SumRecent <- sapply(1:nrow(df), function(x) {sum(df[x,3:ifelse(is.na(grep(df[x,2], colnames(df))[1]), 4, grep(df[x,2], colnames(df))[1])])})


#   Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4 SumRecent
# 1     ABC <NA>       100        50          80          90       150
# 2     DEF   Q2        80        75          50          80       155
# 3     XYZ   Q3       100        50          80         100       230

We are looking with the use of grep for a match of the value in the column UpTo (df[x,2]) in the column names of df (colnames(df)). If we find it we get the sum, if we don't find it we just sum the values in columns 3 and 4.

etienne
  • 3,648
  • 4
  • 23
  • 37
  • From experience I've really really really hated myself for using numeric references in a data.frame. Use names instead, always - trust me! Brevity in this case is not worth the loss of sanity when things break because a column position changed. – Brandon Bertelsen Nov 23 '16 at 20:27
  • @BrandonBertelsen: I agree that it can give some problems, but in this case, without using numeric references, we'll have to look at every column name ending with QXXX, make sure that they are in increasing order, and so on, which seem to complicate too much the problem. (or do you have a quicker solution ?) I assumed the format of the data.frame was fixed to answer the question. – etienne Nov 23 '16 at 20:41
  • It's not a criticism, just a word of caution with respect to programming like this in R, in general. Stackoverflow answers have big reach. So having a note of caution is useful for new programmers :) – Brandon Bertelsen Nov 23 '16 at 20:47
  • 1
    @BrandonBertelsen oh I didn't take it as a criticism sorry if my comment made you think that, I appreciate the word of caution – etienne Nov 23 '16 at 20:49
0

This should also work:

df$UpTo <- as.character(df$UpTo)
df$SumRecent <- apply(df, 1, function(x) ifelse(is.na(x[2]), sum(as.integer(x[3:4])), 
                                           sum(as.integer(x[3:(grep(x[2], names(df)))]))))
df

#     Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4 SumRecent
#1     ABC <NA>       100        50          80          90       150
#2     DEF   Q2        80        75          50          80       155
#3     XYZ   Q3       100        50          80         100       230
Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63
0

Another approach using data table:

require(data.table)
dat <- fread('Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4
             ABC   NA       100        50          80          90
             DEF   Q2        80        75          50          80
             XYZ   Q3       100        50          80         100')

dat[, SumRecent:= ifelse(is.na(UpTo), Actual.Q1 + Actual.Q2,  
                                      sum(.SD[, grepl(paste0("Q[1-", substring(UpTo, 2), "]$"), names(.SD)), with = F]) ), by = Company]
User2321
  • 2,952
  • 23
  • 46