
My R data frame of teachers is organized by SCHOOLID, with a variable number of teachers in each school. I want to generate a sequential ID number for each teacher that restarts at 1 within each school.

Data looks like:

SCHOOLID  summer  
102349    1
102349    1
102349    1
102349    1
203456    1
203456    1
203456    1
345983    1
345983    1
345983    1
345983    1
345983    1

What I need to generate:

SCHOOLID  summer  teacher_id
102349    1      1
102349    1      2
102349    1      3
102349    1      4
203456    1      1
203456    1      2
203456    1      3
345983    1      1
345983    1      2
345983    1      3
345983    1      4
345983    1      5
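To make the example reproducible, the input sample above can be entered in R as a data frame. This is a sketch: the name `mydf` is borrowed from the answers below, and `SCHOOLID` is assumed to be stored as a plain numeric column.

```r
# Rebuild the sample data shown above: 4, 3 and 5 teachers in three schools
mydf <- data.frame(
  SCHOOLID = rep(c(102349, 203456, 345983), times = c(4, 3, 5)),
  summer   = 1  # recycled to all 12 rows
)
print(mydf)
```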
Asked by Suhas (edited by A5C1D2H2I1M1N2O1R2T1)

2 Answers


Try this (assuming the data frame is named mydf; change the name accordingly):

# Number the rows 1, 2, 3, ... within each SCHOOLID
mydf$teacher_id <- ave(mydf$SCHOOLID, mydf$SCHOOLID, FUN = seq_along)
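For readers wondering how this works: `ave(x, g, FUN = seq_along)` splits `x` by the groups in `g`, applies `seq_along` to each piece (giving 1, 2, ..., n within that group), and writes the results back into the original row positions. The sketch below spells out that split/apply/recombine step by hand on the sample data; it is meant to illustrate the idea, not to replace `ave`.

```r
mydf <- data.frame(
  SCHOOLID = rep(c(102349, 203456, 345983), times = c(4, 3, 5)),
  summer   = 1
)

# One-liner from the answer
mydf$teacher_id <- ave(mydf$SCHOOLID, mydf$SCHOOLID, FUN = seq_along)

# The same numbering spelled out: split by school, number each piece,
# then write the numbers back into their original row positions
manual_id <- mydf$SCHOOLID
split(manual_id, mydf$SCHOOLID) <- lapply(split(manual_id, mydf$SCHOOLID), seq_along)

stopifnot(all(manual_id == mydf$teacher_id))
print(mydf$teacher_id)
```

Because the numbers are written back by position, the rows do not need to be sorted by school; within each school the IDs simply follow the existing row order.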
Answered by Greg Snow

Update

This seems to be a fairly frequent type of problem, so I wrote a function called getanID and included it in my "splitstackshape" package. Usage is as follows:

library(splitstackshape)
getanID(mydf, "SCHOOLID")              ## If just one "ID"
getanID(mydf, c("SCHOOLID", "summer")) ## "ID" can be a vector too
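If installing a package is not an option, base R's `ave` also accepts several grouping vectors through its `...` argument, which gives a rough counterpart to grouping by both `SCHOOLID` and `summer`. This is a sketch of the same numbering idea in base R, not how `getanID` is implemented; the column name `teacher_id` is an arbitrary choice.

```r
mydf <- data.frame(
  SCHOOLID = rep(c(102349, 203456, 345983), times = c(4, 3, 5)),
  summer   = 1
)

# Number rows within each combination of SCHOOLID and summer;
# every argument after x (other than FUN) is another grouping vector
mydf$teacher_id <- ave(seq_len(nrow(mydf)), mydf$SCHOOLID, mydf$summer,
                       FUN = seq_along)
print(mydf)
```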

Original answer

This is super easy with ave (and is definitely a duplicate question here on Stack Overflow).

The typical approach (if your data.frame were called "mydf") would be:

ave(rep(1, nrow(mydf)), mydf, FUN = seq_along)
#  [1] 1 2 3 4 1 2 3 1 2 3 4 5

Replace the second "mydf" (the grouping argument) with the actual columns that should be treated as your grouping columns. Here, I've just assumed that both columns should serve as the IDs.


In the above, I've grouped by the entire data.frame. However, if you only wanted to group by the first column, you would change the command to:

ave(rep(1, nrow(mydf)), mydf[1], FUN = seq_along)

(And, at this point, my answer and Greg's are pretty much the same, except that he numbers the SCHOOLID column itself instead of a dummy column of 1s.)
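As a check of that last point, the sketch below (on the sample data, assuming the column names from the question) selects the grouping column by name rather than by position and confirms that the dummy-column version and Greg's version give the same numbers. Selecting by name is a suggested habit rather than part of either answer; it keeps working if the columns are later reordered.

```r
mydf <- data.frame(
  SCHOOLID = rep(c(102349, 203456, 345983), times = c(4, 3, 5)),
  summer   = 1
)

# Dummy column of 1s, grouped by SCHOOLID chosen by name instead of mydf[1]
by_dummy <- ave(rep(1, nrow(mydf)), mydf["SCHOOLID"], FUN = seq_along)

# Greg's version: number SCHOOLID itself within SCHOOLID
by_school <- ave(mydf$SCHOOLID, mydf$SCHOOLID, FUN = seq_along)

stopifnot(all(by_dummy == by_school))
print(by_dummy)
```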

Answered by A5C1D2H2I1M1N2O1R2T1
  • Thanks, I tried this line, replacing mydf with the name of my data frame, but I got an error message: `Error: unexpected numeric constant in "ave(rep)1` – Suhas Mar 11 '14 at 17:04
  • @Suhas, are you paying attention to your brackets? – A5C1D2H2I1M1N2O1R2T1 Mar 11 '14 at 17:06
  • My apologies, I use RStudio, which automatically inserts a closing bracket in the R console, and I overlooked that. The following command on my data frame called myBy2 generates a vector of 1s, one per row of my data frame (6043 rows): `myBy2$TEACHERID <- ave(rep(1, nrow(myBy2)), myBy2, FUN = seq_along)` – Suhas Mar 11 '14 at 17:17
  • @Suhas, that's why I mentioned "replace the 'mydf' with the actual columns that should be treated as your grouping columns" (which is what Greg has in his answer). In other words, if your grouping columns were columns 2, 3, and 4, you would do something like `ave(rep(1, nrow(myBy2)), myBy2[c(2, 3, 4)], FUN = seq_along)`. – A5C1D2H2I1M1N2O1R2T1 Mar 11 '14 at 17:22
  • Thanks, it works now! I can either use the `rep` and `nrow` functions as you suggest, or just pass the arguments directly to `ave`, and both methods generate identical results: `myBy2$TEACHERID <- ave(rep(1, nrow(myBy2)), myBy2$SCHOOLID, FUN = seq_along)` – Suhas Mar 11 '14 at 17:31
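The all-1s result reported in these comments is what you would expect whenever the data frame contains a column that is unique per row: grouping by the entire data frame then puts every row in its own group, so `seq_along` returns 1 each time. The sketch below reproduces that effect on the sample data using a hypothetical `teacher_name` column (not part of the original data); it illustrates a likely cause rather than diagnosing the actual myBy2 data.

```r
mydf <- data.frame(
  SCHOOLID     = rep(c(102349, 203456, 345983), times = c(4, 3, 5)),
  summer       = 1,
  teacher_name = paste0("teacher_", 1:12)  # hypothetical column, unique per row
)

# Grouping by the whole data frame: every row is its own group, so all 1s
all_ones <- ave(rep(1, nrow(mydf)), mydf, FUN = seq_along)

# Grouping only by the intended column restores the per-school numbering
mydf$TEACHERID <- ave(rep(1, nrow(mydf)), mydf["SCHOOLID"], FUN = seq_along)

print(all_ones)
print(mydf$TEACHERID)
```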