I have a data frame (df, shown below) that I want to split by noms (a unique id), add x rows to each group, and then recombine. x differs between groups: it is the number of rows needed to extend each person's positive integers up to 12 (in other words, x = 12 minus the highest positive integer for that person).

ddply seems to be the obvious option here, but I am having trouble adding the rows. I can make a new column with the following code:

x<-ddply(df,.(noms),transform, new_time=numbers)

but this doesn't solve the problem of adding extra rows for each person. I thought 'mutate' might do this for me but, apart from my logic being awful here, it doesn't add the rows:

x<-ddply(df,.(noms),mutate, new_time=numbers+(tail(df$numbers-12)))

Is it possible to add rows using ddply, or even split? Any help would be hugely appreciated. Thank you in advance.

Here's the df, and below it is the desired output.

df
   noms numbers
1  jane      -6
2  jane      -5
3  jane      -4
4  jane      -3
5  jane      -2
6  jane      -1
7  jane       1
8  jane       2
9  jane       3
10 jane       4
11 john      -2
12 john      -1
13 john       1
14 john       2
15 john       3
16 john       4
17 john       5
18 john       6
19 john       7
20 john       8
21 mary      -1
22 mary       1
23 mary       2
24 mary       3
25 mary       4
26 mary       5
27 mary       6
28 mary       7
29 mary       8
30 mary       9
31  tom      -4
32  tom      -3
33  tom      -2
34  tom      -1
35  tom       1
36  tom       2
37  tom       3
38  tom       4
39  tom       5
40  tom       6

desired output

dff
   noms nums new_times
1  jane   -6        -6
2  jane   -5        -5
3  jane   -4        -4
4  jane   -3        -3
5  jane   -2        -2
6  jane   -1        -1
7  jane    1         1
8  jane    2         2
9  jane    3         3
10 jane    4         4
11 jane   NA         5
12 jane   NA         6
13 jane   NA         7
14 jane   NA         8
15 jane   NA         9
16 jane   NA        10
17 jane   NA        11
18 jane   NA        12
19 john   -2        -2
20 john   -1        -1
21 john    1         1
22 john    2         2
23 john    3         3
24 john    4         4
25 john    5         5
26 john    6         6
27 john    7         7
28 john    8         8
29 john   NA         9
30 john   NA        10
31 john   NA        11
32 john   NA        12
33 mary   -1        -1
34 mary    1         1
35 mary    2         2
36 mary    3         3
37 mary    4         4
38 mary    5         5
39 mary    6         6
40 mary    7         7
41 mary    8         8
42 mary    9         9
43 mary   NA        10
44 mary   NA        11
45 mary   NA        12
46  tom   -4        -4
47  tom   -3        -3
48  tom   -2        -2
49  tom   -1        -1
50  tom    1         1
51  tom    2         2
52  tom    3         3
53  tom    4         4
54  tom    5         5
55  tom    6         6
56  tom   NA         7
57  tom   NA         8
58  tom   NA         9
59  tom   NA        10
60  tom   NA        11
61  tom   NA        12

EDIT

Thank you to @rrs for his contribution. The code works fine on the toy data, but on the real dataset the following error pops up:

Error in rep(NA, length(pootdf$new_numbers) - length(pootdf$time)) : 
  invalid 'times' argument

The only difference between the toy data and the real data is that the real data is MUCH bigger, at about 400,000 rows. Both name variables are set up as factors, and the numbers variable is set up as integer. I have subsetted the large df to a smaller, more manageable one of about 100 rows, and the error still appears. Does anyone know what could be happening, and how I might go about fixing it? Below is the traceback.

traceback()
7: .fun(piece, ...)
6: function (i) 
   {
       piece <- pieces[[i]]
       if (.inform) {
           res <- try(.fun(piece, ...))
           if (inherits(res, "try-error")) {
               piece <- paste(capture.output(print(piece)), collapse = "\n")
               stop("with piece ", i, ": \n", piece, call. = FALSE)
           }
       }
       else {
           res <- .fun(piece, ...)
       }
       progress$step()
       res
   }(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(pootdf, .(hai_dispense_number), AddRows)
user2363642

1 Answer

I think this will do what you want:

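AddRows <- function(df) {
  # Target values for this group: from its minimum up to 12, skipping 0
  new_numbers <- seq(from = min(df$numbers), to = 12)
  new_numbers <- new_numbers[new_numbers != 0]
  # Repeat the group's id once per output row
  noms <- rep(unique(df$noms), length(new_numbers))
  # Pad the original values with NA up to the new length
  numbers <- c(df$numbers, rep(NA, length(new_numbers) - length(df$numbers)))

  return(data.frame(noms, numbers, new_numbers))
}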

ddply(df, .(noms), AddRows)
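
For readers who do not have plyr installed, the same split, apply, recombine idea can be sketched in base R. This is only an illustration under stated assumptions: the df below is rebuilt by hand from the toy data in the question, AddRows is a lightly adapted copy of the function above, and lapply(split(...)) with do.call(rbind, ...) stands in for ddply.

```r
# Toy data rebuilt from the question (40 rows, 10 per person)
df <- data.frame(
  noms = rep(c("jane", "john", "mary", "tom"), each = 10),
  numbers = c(-6:-1, 1:4, -2:-1, 1:8, -1, 1:9, -4:-1, 1:6)
)

# Adapted copy of AddRows: one group in, one padded group out
AddRows <- function(d) {
  new_numbers <- seq(from = min(d$numbers), to = 12)
  new_numbers <- new_numbers[new_numbers != 0]
  data.frame(
    noms = d$noms[1],  # a single value, recycled to every output row
    numbers = c(d$numbers, rep(NA, length(new_numbers) - length(d$numbers))),
    new_numbers = new_numbers
  )
}

# Base R stand-in for ddply(df, .(noms), AddRows)
pieces <- lapply(split(df, df$noms), AddRows)
dff <- do.call(rbind, pieces)
rownames(dff) <- NULL  # drop the "jane.1", "jane.2", ... row names

nrow(dff)
```

On the toy data this should give one row per value from each person's minimum up to 12, with NA in numbers for the added rows.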
rrs
  • Thank you so much for your help with writing a function. Your code works perfectly on my toy data, but when I apply it to my real data I get the following error: Error in rep(NA, length(new_numbers) - length(pootdf$hai_dispense_number)) : invalid 'times' argument. My df is 375168 rows long, but it should be OK to increase the number of rows, right? I know that the df I desire will have fewer than 601092 rows. – user2363642 Jan 13 '14 at 18:45
  • Discovered the error: it had nothing to do with your code and more to do with my data! – user2363642 Jan 29 '14 at 10:59
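
The thread does not record what the data problem turned out to be. As a general note, rep() raises "invalid 'times' argument" when its count is negative, which in AddRows happens whenever a group has more rows than there are values between its minimum and 12 (excluding 0), for example because of duplicated values or values above 12. The sketch below is one possible way to locate such groups; PaddingNeeded is a hypothetical helper and the bad data frame is made up for illustration.

```r
# Hypothetical helper: the count AddRows would pass to rep() for one group
PaddingNeeded <- function(numbers) {
  new_numbers <- seq(from = min(numbers), to = 12)
  new_numbers <- new_numbers[new_numbers != 0]
  length(new_numbers) - length(numbers)
}

# Made-up data: "ann" is clean; "bob" has a duplicated 11 and a value above 12
bad <- data.frame(
  noms = c(rep("ann", 4), rep("bob", 5)),
  numbers = c(-1, 1, 2, 3, 10, 11, 11, 12, 13)
)

# One padding count per group; negative counts are the groups that would fail
padding <- tapply(bad$numbers, bad$noms, PaddingNeeded)
names(padding)[padding < 0]
```

Any group listed by the last line would need cleaning (or a different padding rule) before being passed to AddRows.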