R: replicate a row and update by next date per row

Question

I want to replicate each row of the input `length` times and advance the date by one day for each copy, as the input and intended output below show. How can I do this?

Input

> aa<- data.frame(a=c(1,11,111),b=c(2,22,222),length=c(3,5,1),date=c(as.Date("28.12.2016",format="%d.%m.%Y"), as.Date("30.12.2016",format="%d.%m.%Y"), as.Date("01.01.2017",format="%d.%m.%Y")))
> aa
    a   b length       date
1   1   2      3 2016-12-28
2  11  22      5 2016-12-30
3 111 222      1 2017-01-01

Intended Output

    a   b length       date
1   1   2      3 2016-12-28
2   1   2      3 2016-12-29
3   1   2      3 2016-12-30
4  11  22      5 2016-12-30
5  11  22      5 2016-12-31
6  11  22      5 2017-01-01
7  11  22      5 2017-01-02
8  11  22      5 2017-01-03
9 111 222      1 2017-01-01

Do not oversimplify the example. Does your data frame really have only one row? – Pierre L, Dec 28 '16 at 10:54
@PierreLafortune I updated the minimal working example with two more entries. – Regan Alpha, Dec 28 '16 at 11:08

Pierre L · Accepted Answer · 2016-12-28T11:52:10.613

2

You can use base R, dplyr, or data.table for the grouped operation. First repeat each row `length` times so the result has the right number of rows. Then, within each group, add 0, 1, 2, ... days to the date.

library(dplyr)
aa2 <- aa[rep(1:nrow(aa), aa$length),]
aa2 %>% group_by(a,b) %>% mutate(date= date + 1:n() - 1L)
# Source: local data frame [9 x 4]
# Groups: a, b [3]
# 
#       a     b length       date
#   <dbl> <dbl>  <dbl>     <date>
# 1     1     2      3 2016-12-28
# 2     1     2      3 2016-12-29
# 3     1     2      3 2016-12-30
# 4    11    22      5 2016-12-30
# 5    11    22      5 2016-12-31
# 6    11    22      5 2017-01-01
# 7    11    22      5 2017-01-02
# 8    11    22      5 2017-01-03
# 9   111   222      1 2017-01-01

#data.table
library(data.table)
aa2 <- aa[rep(1:nrow(aa), aa$length),]
setDT(aa2)[, date := date + 1:.N - 1L, by= .(a,b)]

#base
aa2 <- aa[rep(1:nrow(aa), aa$length),]
transform(aa2, date=ave(date, a, FUN=function(x) x + 1:length(x) - 1L))
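The same two steps can also be sketched in base R without any grouping, assuming the `length` column holds non-negative whole numbers. This is only an illustrative variant, not part of the original answer; it relies on `sequence()`, which restarts its count for each entry of `aa$length`:

```r
# Rebuild the example data from the question
aa <- data.frame(
  a = c(1, 11, 111),
  b = c(2, 22, 222),
  length = c(3, 5, 1),
  date = as.Date(c("2016-12-28", "2016-12-30", "2017-01-01"))
)

# Step 1: repeat row i of aa exactly aa$length[i] times
aa2 <- aa[rep(seq_len(nrow(aa)), aa$length), ]

# Step 2: sequence(c(3, 5, 1)) counts 1 2 3 1 2 3 4 5 1,
# so subtracting 1 gives the day offset of every copy
aa2$date <- aa2$date + sequence(aa$length) - 1

# Replace the 1, 1.1, 1.2, ... row names with plain 1..n
rownames(aa2) <- NULL
aa2
```

Because the offsets are computed from `aa$length` directly, this does not depend on `a` and `b` identifying each input row uniquely.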

For more concise syntax, we can skip `rep` and take advantage of the recycling rules of data.table, credit @Henrik. Note that the result contains only the `a`, `b`, and `date` columns:

setDT(aa)[ , .(date = date + 1:length - 1), by = .(a, b)]
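A rough base R analogue of the recycling idea, offered as a sketch rather than as what data.table does internally: build one small data frame per input row and let `data.frame()` recycle the length-one values against the longer date vector. The helper name `expand_row` is hypothetical, and the sketch assumes every `length` is at least 1, since `data.frame()` cannot recycle a single value to zero rows.

```r
# Rebuild the example data from the question
aa <- data.frame(
  a = c(1, 11, 111),
  b = c(2, 22, 222),
  length = c(3, 5, 1),
  date = as.Date(c("2016-12-28", "2016-12-30", "2017-01-01"))
)

# Hypothetical helper: expand input row i into r$length rows.
# a, b and length are single values and get recycled to match
# the date vector, which has one entry per requested day.
expand_row <- function(i) {
  r <- aa[i, ]
  data.frame(
    a = r$a,
    b = r$b,
    length = r$length,
    date = r$date + seq_len(r$length) - 1
  )
}

# Expand every row, then bind the pieces together in one call
out <- do.call(rbind, lapply(seq_len(nrow(aa)), expand_row))
out
```

Unlike the one-liner above, this version keeps the `length` column because it is listed explicitly in `data.frame()`.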

edited Dec 28 '16 at 11:52

answered Dec 28 '16 at 11:20

Pierre L


Can you recommend which approach to use? I like the base solution with 1.1, 1.2, 2.1, 2.2, 2.3, 2.4, 2.5 ... but what are the pros and cons with each alternative? – Regan Alpha Dec 28 '16 at 11:36
It depends on your workflow. If you already have a dplyr pipe sequence going, it makes sense to continue with the same package. If you are already working with `data.table` for the rest of your project, it makes sense to use that package. There is a slight difference in the internal structure of the packages. `dplyr` will output a `tbl_df`. data.table will output a `data.table`. These are still table arrays, but may behave slightly differently if you are not used to them. – Pierre L Dec 28 '16 at 11:38
What about if I can make the choice? Is `dplyr` the newest candidate, easiest to learn? – Regan Alpha Dec 28 '16 at 11:40
Yes it is very easy for new users. But learning `base R` helps to really "get the language." `data.table` is quick and powerful, but requires more learning up front. – Pierre L Dec 28 '16 at 11:41
base R is not optimized as much as the packages for grouped operations. Getting lost in `aggregate`, `ave`, and `tapply` can be a headache. It is good to learn those eventually, but to get off the ground, `dplyr` is the most user friendly. But once you learn `data.table` you can work with larger datasets, and concise syntax. – Pierre L Dec 28 '16 at 11:45
In your `data.table` solution, you may rely on recycling instead of `rep`: `setDT(aa)[ , .(date = date + 1:length - 1), by = .(a, b)]` – Henrik Dec 28 '16 at 11:46

Serhat Cevikel · Answer 2 · 2016-12-28T11:31:42.383

0

Not as elegant as the dplyr and data.table solutions, but it works at a lower level with plain loops:

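replicaterow1 <- function(df1 = aa) {
    newdf <- df1[0,]                        # empty data frame with the same columns
    rowcount <- 1                           # index of the next row to fill in newdf
    for (i in seq_len(nrow(df1))) {
        rowi <- df1[i,]
        reps <- as.integer(rowi$length)     # number of copies wanted for this row
        newrow <- rowi
        newdf[rowcount,] <- rowi            # first copy keeps the original date
        rowcount <- rowcount + 1
        if (reps > 1) {
            for (j in seq_len(reps - 1)) {
                newrow$date <- newrow$date + 1   # each further copy is one day later
                newdf[rowcount,] <- newrow
                rowcount <- rowcount + 1
            }
        }
    }
    return(newdf)
}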

edited Dec 28 '16 at 11:31

answered Dec 28 '16 at 10:49

Serhat Cevikel


I was asked to provide a longer example, so I added two extra entries. Notice that the data can have multiple entries and that length specifies how many times each row must be replicated, with each copy getting the next date. This function does not use the length field... – Regan Alpha Dec 28 '16 at 11:13

score 0 · Answer 3 · answered Dec 28 '16 at 10:52

This

aa<- data.frame(a=c(1),b=c(2),length=c(3),date=as.Date("28.12.2016",format="%d.%m.%Y"))


aa <- aa[rep(row.names(aa), aa$length), 1:4]
aa <- as.data.table(aa)
aa[,row:=.I]
aa[,date:=date+row-1]
aa[,row:=NULL]

Results in

   a b length       date
1: 1 2      3 2016-12-28
2: 1 2      3 2016-12-29
3: 1 2      3 2016-12-30

answered Dec 28 '16 at 10:52

quant


There is a small error: I tested it with a slightly larger example, and rows 4 through 9 get the wrong date. See the updated MWE. – Regan Alpha Dec 28 '16 at 11:19
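The wrong dates most likely come from using one running row number for the whole table, which only matches the intended offsets while the data has a single input row. A minimal base R sketch of the difference, assuming the three-row data from the question; the names `idx`, `global_offset`, and `per_row_offset` are illustrative and not taken from the answer:

```r
# Rebuild the three-row example data from the question
aa <- data.frame(
  a = c(1, 11, 111),
  b = c(2, 22, 222),
  length = c(3, 5, 1),
  date = as.Date(c("2016-12-28", "2016-12-30", "2017-01-01"))
)

# idx records which input row each output row was copied from
idx <- rep(seq_len(nrow(aa)), aa$length)
out <- aa[idx, ]

# A single counter over the whole table keeps growing: 0 1 2 3 4 5 6 7 8
global_offset <- seq_along(idx) - 1

# Restarting the counter for every input row gives 0 1 2 0 1 2 3 4 0
per_row_offset <- ave(seq_along(idx), idx, FUN = seq_along) - 1

# Use the per-row offset to advance the dates
out$date <- out$date + per_row_offset
rownames(out) <- NULL
out
```

In data.table terms this corresponds to computing the counter within each group, as the accepted answer does with `by`, rather than once over all rows.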
