
I'm sure this must be a duplicate, but I just can't get it to work. I want to add an ID column to a data frame that resets to 1 for each unique value in another column. This is easiest to describe by example:

gr1 <- c("A","A","A","B","B","B")
gr2 <- c(1,1,2,3,4,4)

df <- data.frame(gr1, gr2)

Desired output:

id <- c(1,1,2,1,2,2)
df <- cbind(df, id)

The id marks the unique values of gr2 within each subset of gr1: when gr1 changes from A to B, the id resets to 1. I have read this (Assign an ID based on two columns R), but that is not what I want. I don't think I want a rank function either, because I want all ties to have the same id within gr1. For example, this is not what I'm after:

library(dplyr)
df2 <- df %>% group_by(gr1) %>% mutate(id = rank(gr2, ties.method = "max"))
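To see why this attempt falls short, here is a base-R sketch (no dplyr needed) that applies the same rank() call within each gr1 group using ave(). Ties do get the same value, but ties.method = "max" makes the ids skip numbers, so group A comes out as 2, 2, 3 rather than 1, 1, 2.

```r
gr1 <- c("A","A","A","B","B","B")
gr2 <- c(1,1,2,3,4,4)

# Same rank() call as above, applied within each gr1 group via base R's ave()
r_max <- ave(gr2, gr1, FUN = function(x) rank(x, ties.method = "max"))
print(r_max)
# [1] 2 2 3 1 3 3
```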

Banging my head against the wall. Any pointers would be a great help.

Pete900

3 Answers


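We could use rleid() (see ?rleid) from the data.table package. rleid() starts a new id each time the value changes, so within each gr1 group equal gr2 values share an id as long as they are adjacent, as they are in this example.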

library(data.table)
setDT(df)[, id := rleid(gr2), by = gr1]
> df
   gr1 gr2 id
1:   A   1  1
2:   A   1  1
3:   A   2  2
4:   B   3  1
5:   B   4  2
6:   B   4  2
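One caveat, sketched below on hypothetical unsorted data: rleid() numbers runs, not distinct values, so if a gr2 value reappears in its group after a different value, it gets a new id. data.table's frank() with ties.method = "dense" gives every occurrence of a value the same id regardless of order. The column names id_rle and id_dense are made up for the comparison.

```r
library(data.table)

# Hypothetical data: in group A the value 1 comes back after a 2
dt <- data.table(gr1 = c("A","A","A","B","B","B"),
                 gr2 = c(1, 2, 1, 3, 4, 4))

# rleid(): a new id every time gr2 changes, so the second 1 in A gets id 3
dt[, id_rle := rleid(gr2), by = gr1]

# frank(..., ties.method = "dense"): equal values share an id, ids have no gaps
dt[, id_dense := frank(gr2, ties.method = "dense"), by = gr1]

print(dt)
```

If reordering the rows is acceptable, sorting first with setorder(dt, gr1, gr2) and then calling rleid() should also work.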
mtoto

Try this, which uses ave to perform the grouping and factor to reassign sequential levels starting from 1. Note that ave automatically converts the factor back to numeric because gr2 is numeric, which keeps the result type consistent. No packages are used.

df2 <- transform(df, gr2 = ave(gr2, gr1, FUN = factor))

giving:

> df2
  gr1 gr2
1   A   1
2   A   1
3   A   2
4   B   1
5   B   2
6   B   2

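It returns a data frame with factor and numeric columns (in R 4.0.0 and later, data.frame() no longer converts strings to factors by default, so gr1 shows as chr instead):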

> str(df2)
'data.frame':   6 obs. of  2 variables:
 $ gr1: Factor w/ 2 levels "A","B": 1 1 1 2 2 2
 $ gr2: num  1 1 2 1 2 2
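The trick relies on factor() sorting the distinct values and numbering them 1, 2, ..., which amounts to a dense rank. The sketch below makes that step explicit with a small helper (dense_id is a made-up name) and writes the result to a new id column instead of overwriting gr2, matching the output asked for in the question.

```r
df <- data.frame(gr1 = c("A","A","A","B","B","B"),
                 gr2 = c(1, 1, 2, 3, 4, 4))

# Hypothetical helper: factor() sorts the distinct values, so its integer
# codes number them 1, 2, ... with ties sharing a code
dense_id <- function(x) as.integer(factor(x))

# ave() applies dense_id to gr2 separately within each gr1 group
df$id <- ave(df$gr2, df$gr1, FUN = dense_id)
print(df)
```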
G. Grothendieck
  • Just trying this on my real data set. It seems to return the factor itself rather than an id. I will try and post some real data. – Pete900 May 03 '16 at 16:12
  • Not on the data provided in the question. Maybe your data frame has a factor as gr2 rather than numeric. If so, convert it to numeric first so that it corresponds to what was posted. – G. Grothendieck May 03 '16 at 16:42
  • Yes you are correct. Thank you also. – Pete900 May 03 '16 at 17:48
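The fix from the comments can be sketched as follows, assuming gr2's factor labels are numbers. When gr2 is a factor, ave() assigns each group's result back through factor subassignment, which maps the labels onto the original levels, so gr2 comes back essentially unchanged; that would explain the result reported in the first comment. Converting the labels with as.numeric(as.character(...)) before calling ave() avoids this.

```r
# Hypothetical: gr2 stored as a factor instead of numeric
df <- data.frame(gr1 = c("A","A","A","B","B","B"),
                 gr2 = factor(c(1, 1, 2, 3, 4, 4)))

# Convert the factor's labels (not its internal codes) back to numbers first
df$gr2 <- as.numeric(as.character(df$gr2))

# Now the answer's one-liner behaves as it does on the question's data
df2 <- transform(df, gr2 = ave(gr2, gr1, FUN = factor))
print(df2)
```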

Here is a dplyr solution:

library(dplyr)

df %>%
  group_by(gr1) %>%
  mutate(id = as.numeric(factor(gr2)))
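dplyr also has dense_rank(), which names the operation directly: tied values share a rank and the ranks have no gaps, unlike rank(ties.method = "max") from the question. A sketch, rebuilding the question's df so the block runs on its own:

```r
library(dplyr)

df <- data.frame(gr1 = c("A","A","A","B","B","B"),
                 gr2 = c(1, 1, 2, 3, 4, 4))

df2 <- df %>%
  group_by(gr1) %>%
  mutate(id = dense_rank(gr2)) %>%  # 1, 1, 2 in A and 1, 2, 2 in B
  ungroup()
print(df2)
```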
C_Z_