I have a data frame (shown below) that I want to split by noms (a unique ID), add x rows to each group, and then recombine. x differs for each group: it is the number of rows needed to extend the positive integers up to 12 (in other words, x = 12 minus the highest positive integer for each person).
ddply seems to be the obvious option here, but I am having trouble adding the rows. I can make a new column with the following code:
library(plyr)
x <- ddply(df, .(noms), transform, new_time = numbers)
but this doesn't solve the problem of adding extra rows for each person. I thought 'mutate' might do this for me, but quite apart from my logic being awful here, it doesn't add any rows:
x <- ddply(df, .(noms), mutate, new_time = numbers + (tail(df$numbers - 12)))
Is it possible to add rows using ddply, or even split? Any help would be hugely appreciated. Thank you in advance.
Here's the df, and below it the desired output.
df
noms numbers
1 jane -6
2 jane -5
3 jane -4
4 jane -3
5 jane -2
6 jane -1
7 jane 1
8 jane 2
9 jane 3
10 jane 4
11 john -2
12 john -1
13 john 1
14 john 2
15 john 3
16 john 4
17 john 5
18 john 6
19 john 7
20 john 8
21 mary -1
22 mary 1
23 mary 2
24 mary 3
25 mary 4
26 mary 5
27 mary 6
28 mary 7
29 mary 8
30 mary 9
31 tom -4
32 tom -3
33 tom -2
34 tom -1
35 tom 1
36 tom 2
37 tom 3
38 tom 4
39 tom 5
40 tom 6
desired output
dff
noms nums new_times
1 jane -6 -6
2 jane -5 -5
3 jane -4 -4
4 jane -3 -3
5 jane -2 -2
6 jane -1 -1
7 jane 1 1
8 jane 2 2
9 jane 3 3
10 jane 4 4
11 jane NA 5
12 jane NA 6
13 jane NA 7
14 jane NA 8
15 jane NA 9
16 jane NA 10
17 jane NA 11
18 jane NA 12
19 john -2 -2
20 john -1 -1
21 john 1 1
22 john 2 2
23 john 3 3
24 john 4 4
25 john 5 5
26 john 6 6
27 john 7 7
28 john 8 8
29 john NA 9
30 john NA 10
31 john NA 11
32 john NA 12
33 mary -1 -1
34 mary 1 1
35 mary 2 2
36 mary 3 3
37 mary 4 4
38 mary 5 5
39 mary 6 6
40 mary 7 7
41 mary 8 8
42 mary 9 9
43 mary NA 10
44 mary NA 11
45 mary NA 12
46 tom -4 -4
47 tom -3 -3
48 tom -2 -2
49 tom -1 -1
50 tom 1 1
51 tom 2 2
52 tom 3 3
53 tom 4 4
54 tom 5 5
55 tom 6 6
56 tom NA 7
57 tom NA 8
58 tom NA 9
59 tom NA 10
60 tom NA 11
61 tom NA 12
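For reference, here is a minimal base R sketch of one way the desired output above might be produced with split. It is only an illustration under assumptions: add_rows and its upper argument are hypothetical names, numbers is assumed to contain no NA values, and the data frame is rebuilt inline so the block runs on its own.

```r
# Rebuild the toy data frame from the question.
df <- data.frame(
  noms = rep(c("jane", "john", "mary", "tom"), each = 10),
  numbers = c(
    -6:-1, 1:4,   # jane
    -2:-1, 1:8,   # john
    -1L, 1:9,     # mary
    -4:-1, 1:6    # tom
  )
)

# Hypothetical helper: pad one person's rows so new_times runs up to `upper`.
# Assumes piece$numbers has no NA values.
add_rows <- function(piece, upper = 12L) {
  top <- max(piece$numbers)
  extra <- max(upper - top, 0L)  # clamp at 0 so rep() never gets a negative count
  data.frame(
    noms      = piece$noms[1],                           # recycled to full length
    nums      = c(piece$numbers, rep(NA_integer_, extra)),
    new_times = c(piece$numbers, seq_len(extra) + top)
  )
}

# Split by person, pad each piece, and recombine.
pieces <- split(df, df$noms)
dff <- do.call(rbind, lapply(pieces, add_rows))
rownames(dff) <- NULL
```

Because add_rows takes one group's data frame and returns a data frame, the same function could presumably also be passed to plyr as ddply(df, .(noms), add_rows), although the sketch does not depend on plyr.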
EDIT
Thank you to @rrs for his contribution. The code works fine on the toy data, but on the real dataset the following error pops up:
Error in rep(NA, length(pootdf$new_numbers) - length(pootdf$time)) :
invalid 'times' argument
The only difference between the toy data and the real data is that the real data is MUCH bigger, at about 400,000 rows. Both name variables are set up as factors, and the numbers variable is set up as integer. I have subsetted the large df to a smaller, more manageable one of about 100 rows, and the error still appears. Does anyone know what could be happening, and how I might go about fixing it? Below is the traceback.
traceback()
7: .fun(piece, ...)
6: function (i)
{
piece <- pieces[[i]]
if (.inform) {
res <- try(.fun(piece, ...))
if (inherits(res, "try-error")) {
piece <- paste(capture.output(print(piece)), collapse = "\n")
stop("with piece ", i, ": \n", piece, call. = FALSE)
}
}
else {
res <- .fun(piece, ...)
}
progress$step()
res
}(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(pootdf, .(hai_dispense_number), AddRows)
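One possible reading of the error, offered as a guess rather than a diagnosis: rep() raises "invalid 'times' argument" whenever its count is negative or NA. The count in the failing call is length(pootdf$new_numbers) - length(pootdf$time), so it might go negative if the two columns are not the lengths expected, for instance if pootdf there refers to the whole data frame rather than the per-group piece, or if some group already has values above 12. The sketch below reproduces the failure and flags suspicious groups; the data frame toy and its columns id and time are made up for illustration and are not the real data.

```r
# rep() rejects a negative or missing count; try() captures the error
# instead of stopping the script.
res_neg <- try(rep(NA, -2), silent = TRUE)
res_na  <- try(rep(NA, NA_integer_), silent = TRUE)

# Hypothetical toy data: group "a" already exceeds 12, group "b" has an NA.
toy <- data.frame(
  id   = c("a", "a", "b", "b", "c", "c"),
  time = c(1L, 13L, 2L, NA, 1L, 5L)
)

# Highest value per group; NA propagates for groups that contain an NA.
tops <- tapply(toy$time, toy$id, max)

# Groups for which "12 - highest value" would be negative or NA.
bad <- names(tops)[is.na(tops) | tops > 12]
print(bad)
```

Running the same tapply check on the 100-row subset, with the real grouping and numbers columns swapped in, might show whether any group would produce a negative or NA count.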