I'm looking for a way to add the column "desired_result", preferably using dplyr and/or ave(). In the data frame below, the group is "section", and "desired_result" should number the distinct values of "exhibit" sequentially within each group:

structure(list(section = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), exhibit = structure(c(1L, 
2L, 3L, 3L, 1L, 2L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"), 
desired_result = c(1L, 2L, 3L, 3L, 1L, 2L, 2L, 3L)), .Names = c("section", 
"exhibit", "desired_result"), class = "data.frame", row.names = c(NA, 
-8L))
Arun
John
  • `df$desired <- c("a"=1, "b"=2, "c"=3)[df$exhibit]` – Khashaa Jan 12 '15 at 05:40
  • I don't think your "rules" are clear enough. Are your data sorted, as they are in your example dataset here? – A5C1D2H2I1M1N2O1R2T1 Jan 12 '15 at 06:07
  • @Khashaa sorry, for the purposes of an example data frame, I shortened full names to single letters. There are a lot of names so I'll need to handle many more instances than just 3. – John Jan 12 '15 at 06:32
  • @Ananda Mahto, these names are not in alphabetical order, and are randomly occurring, so you're right, my data frame is misleading. The important part is that when the count reaches a duplicate in the exhibit column, it should repeat. – John Jan 12 '15 at 06:35
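
The clarified rule in the comments above (exhibits in no particular order, with a repeated exhibit reusing its earlier number) can be sketched in base R with ave(), as the question asks. This is only a minimal sketch under assumptions: `first_seen_id` and the column name `by_first_seen` are hypothetical names introduced for illustration, and `exhibit` is stored as plain character rather than as a factor.

```r
# Example data from the question (exhibit stored as plain character here).
df <- data.frame(
  section = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  exhibit = c("a", "b", "c", "c", "a", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number values by order of first appearance,
# so a repeated value gets the number it was given the first time.
first_seen_id <- function(x) match(x, unique(x))

# ave() fills a numeric/integer vector group by group, so pass row
# indices and look the exhibits up inside the function.
df$by_first_seen <- ave(
  seq_len(nrow(df)), df$section,
  FUN = function(i) first_seen_id(df$exhibit[i])
)

df$by_first_seen
```

On the example data this reproduces the desired_result column (1 2 3 3 1 2 2 3). Because the numbering follows first appearance, it does not depend on the exhibits being sorted.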

3 Answers

`dense_rank()` does this, as long as the numbering should follow the sorted order of `exhibit` within each section:

library(dplyr)
df %>% 
  group_by(section) %>% 
  mutate(desire=dense_rank(exhibit))
#  section exhibit desired_result desire
#1       1       a              1      1
#2       1       b              2      2
#3       1       c              3      3
#4       1       c              3      3
#5       2       a              1      1
#6       2       b              2      2
#7       2       b              2      2
#8       2       c              3      3
Khashaa
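
Since the question's comments say the real exhibit names are not sorted, it may help to see how `dense_rank()` compares with numbering by first appearance. The sketch below uses made-up data (`df3` and the column names `by_value` and `by_first_seen` are assumptions for illustration, not part of the original answer).

```r
library(dplyr)

# Hypothetical unsorted data: "c" appears before "a" and "b".
df3 <- data.frame(
  section = c(1L, 1L, 1L, 1L),
  exhibit = c("c", "a", "c", "b"),
  stringsAsFactors = FALSE
)

res <- df3 %>%
  group_by(section) %>%
  mutate(
    by_value      = dense_rank(exhibit),             # rank of the sorted values
    by_first_seen = match(exhibit, unique(exhibit))  # order of first appearance
  ) %>%
  ungroup()

res
```

Here `by_value` is 3 1 3 2 while `by_first_seen` is 1 2 1 3. The two agree only when each section's exhibits first appear in sorted order, as they do in the question's example.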

I've recently pushed a function rleid() to data.table (currently available on the development version, 1.9.5), which does exactly this. If you're interested, you can install it by following this.

require(data.table) # 1.9.5, for `rleid()`
require(dplyr)
DF %>% 
  group_by(section) %>% 
  mutate(desired_results=rleid(exhibit))

#   section exhibit desired_result desired_results
# 1       1       a              1               1
# 2       1       b              2               2
# 3       1       c              3               3
# 4       1       c              3               3
# 5       2       a              1               1
# 6       2       b              2               2
# 7       2       b              2               2
# 8       2       c              3               3
Arun
  • Does this produce the same result as Khashaa's answer using `dense_rank`? I don't have data.table 1.9.5 installed yet. – talat Jan 12 '15 at 07:28
  • For ordered data it returns the same as `dense_rank`. In general, seems `rleid(x)` is like `rep(1:length(rle(x)$values), rle(x)$lengths)`. – Khashaa Jan 12 '15 at 07:34
  • `rleid()` works with more than 1 column as well. Ex: `with(DF, rleid(section, exhibit))` – Arun Jan 12 '15 at 07:42
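
For readers without the development version of data.table, the base R expression from the comment above can be wrapped in a small function. This is a rough stand-in rather than the data.table implementation: `run_id` is a hypothetical name, it handles a single vector only, and it assumes there are no missing values.

```r
# Hypothetical base R stand-in for a run-length id on one vector.
# Each run of identical adjacent values gets the next integer.
run_id <- function(x) {
  r <- rle(as.character(x))  # as.character() so factors are accepted too
  rep(seq_along(r$lengths), r$lengths)
}

x <- c("a", "b", "b", "a")

run_id(x)            # 1 2 2 3: the second "a" starts a new run
match(x, unique(x))  # 1 2 2 1: the second "a" reuses its first number
```

A run-length id only reuses a number for adjacent duplicates. If an exhibit can reappear later in the same section after other exhibits, the two numberings differ, so the choice depends on which behaviour is wanted.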
If exact enumeration is necessary and you need the desired result to be consistent (so that the same exhibit in a different section always gets the same number), you can try:

library(dplyr)
df <- data.frame(section = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                 exhibit = c('a', 'b', 'c', 'c', 'a', 'b', 'b', 'c'))
if (is.null(saveLevels <- levels(df$exhibit)))
    saveLevels <- sort(unique(df$exhibit)) ## or levels(factor(df$exhibit))
df %>%
    group_by(section) %>%
    mutate(answer = as.integer(factor(exhibit, levels = saveLevels)))
## Source: local data frame [8 x 3]
## Groups: section
##   section exhibit answer
## 1       1       a      1
## 2       1       b      2
## 3       1       c      3
## 4       1       c      3
## 5       2       a      1
## 6       2       b      2
## 7       2       b      2
## 8       2       c      3

If a new exhibit appears in a later section, it gets its own new number. (Notice that the last exhibit below is different.)

df2 <- data.frame(section = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                  exhibit = c('a', 'b', 'c', 'c', 'a', 'b', 'b', 'd'))
if (is.null(saveLevels2 <- levels(df2$exhibit)))
    saveLevels2 <- sort(unique(df2$exhibit))
df2 %>%
    group_by(section) %>%
    mutate(answer = as.integer(factor(exhibit, levels = saveLevels2)))
## Source: local data frame [8 x 3]
## Groups: section
##   section exhibit answer
## 1       1       a      1
## 2       1       b      2
## 3       1       c      3
## 4       1       c      3
## 5       2       a      1
## 6       2       b      2
## 7       2       b      2
## 8       2       d      4
r2evans
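
The numbers in the answer above come from the sorted levels. If the consistent numbering should instead follow the order in which exhibits first appear in the whole data frame, one possible variation is to build the levels with `unique()` alone. The data frame `df4` and the name `first_seen_levels` are made up for this sketch; no grouping is needed because the numbering is global.

```r
# Hypothetical data whose exhibits are not in alphabetical order.
df4 <- data.frame(
  section = c(1L, 1L, 2L, 2L),
  exhibit = c("zeta", "alpha", "zeta", "beta"),
  stringsAsFactors = FALSE
)

# Levels in order of first appearance across all sections (not sorted).
first_seen_levels <- unique(df4$exhibit)

# The same exhibit gets the same number in every section.
df4$answer <- as.integer(factor(df4$exhibit, levels = first_seen_levels))

df4$answer
```

This gives 1 2 1 3, whereas sorted levels would give 3 1 3 2 for the same data.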