4

I have a larger data frame that has multiple columns and thousands of rows. I want to replace the value of every lead row by subtracting the previous row value from the lead row for every five rows of the data frame. For example, the first value should retain its value, the second row should be: second row - first row. Similarly, the sixth row should retain its value, however, the seventh row would be seventh row - sixth row. Here is an example data frame

DF = data.frame(A= c(1:11), B = c(11:21))

The outputput should be like below

> Output
    A  B
1   1 11
2   1  1
3   1  1
4   1  1
5   1  1
6   6 16
7   1  1
8   1  1
9   1  1
10  1  1
11 11 21
CForClimate
  • 335
  • 5
  • 19

3 Answers3

4

One option would be to create a grouping variable and then do the transformation with diff which does the difference of adjacent elements of the columns selected in mutate_all (if only a subset of columns are needed either use mutate_if or mutate_at)

library(dplyr) #v_0.8.3
DF %>% 
   group_by(grp = as.integer(gl(n(), 5, n()))) %>% 
   mutate_all(~c(first(.), diff(.))) %>%
   ungroup %>%
   select(-grp)
# A tibble: 11 x 2
#       A     B
#   <int> <int>
# 1     1    11
# 2     1     1
# 3     1     1
# 4     1     1
# 5     1     1
# 6     6    16
# 7     1     1
# 8     1     1
# 9     1     1
#10     1     1
#11    11    21

The above also gives a warning when we use mutate_all after group_by (previously it used to work - in the new versions, the correct syntax would be to use mutate_at

DF %>% 
   group_by(grp = as.integer(gl(n(), 5, n()))) %>% 
   mutate_at(vars(-group_cols()), ~c(first(.), diff(.))) %>%
   ungroup %>%
   select(-grp)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Getting this error message while executing the code ``mutate_all()` ignored the following grouping variables: Column `grp` Use `mutate_at(df, vars(-group_cols()), myoperation)` to silence the message. Error in select(., -grp) : unused argument (-grp)` – CForClimate Oct 11 '19 at 17:44
  • @Hydrologist Is it on the same data you showed as I can't reproduce it with `dplyr_0.8.3` – akrun Oct 11 '19 at 17:47
  • I have different data but tried on the same data on my end. Getting same error – CForClimate Oct 11 '19 at 17:48
  • @Hydrologist So, you are not getting the error with the data you showed, is that right? Can you check the `str` of new data and compare with data showed. I have a feeling that there are some group attributes causing it. Can you `ungroup` and then do the `group_by` again – akrun Oct 11 '19 at 17:55
  • I am getting error with the same data using your code. I am adding a screen shot to your provided answer as I do not know if I can include it in my comment. – CForClimate Oct 11 '19 at 17:57
  • @Hydrologist Do you have the same `dplyr` version or an older version – akrun Oct 11 '19 at 17:57
  • @Hydrologist Can you update by editing it in your post – akrun Oct 11 '19 at 18:00
3
f = function(d, n = 5) ave(d, ceiling(seq_along(d)/n), FUN = function(x) c(x[1], diff(x)))
data.frame(lapply(DF, f))
#    A  B
#1   1 11
#2   1  1
#3   1  1
#4   1  1
#5   1  1
#6   6 16
#7   1  1
#8   1  1
#9   1  1
#10  1  1
#11 11 21

Another option would be to create another data.frame with shifted rows and subtract directly

ind = ave(1:nrow(DF), ceiling(1:nrow(DF)/5), FUN = function(x) c(x[1], x[-length(x)]))
DF2 = DF[ind,] * replace(rep(1, nrow(DF)), diff(ind) == 0, 0)
DF - DF2
d.b
  • 32,245
  • 6
  • 36
  • 77
2

You can %/% the row number minus 1 by 5 to get the groups, then use diff to get the difference from the previous x (or 0 if there is no previous x) from x for all columns x for each group.

library(data.table)
setDT(DF)

DF[, lapply(.SD, function(x) diff(c(0, x)))
   , (1:nrow(DF) - 1) %/% 5][, -1]

#      A  B
#  1:  1 11
#  2:  1  1
#  3:  1  1
#  4:  1  1
#  5:  1  1
#  6:  6 16
#  7:  1  1
#  8:  1  1
#  9:  1  1
# 10:  1  1
# 11: 11 21

Or, as mentioned by @akrun, you could avoid lapply by replacing

lapply(.SD, function(x) diff(c(0, x)))

with

.SD - shift(.SD, fill = 0)

Another less serious option:

x <- DF[, !(.I - 1) %% 5]
DF*(1 + x) - DF[DF[, .I - !x]]

#      A  B
#  1:  1 11
#  2:  1  1
#  3:  1  1
#  4:  1  1
#  5:  1  1
#  6:  6 16
#  7:  1  1
#  8:  1  1
#  9:  1  1
# 10:  1  1
# 11: 11 21
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38