Create binary variable based on number of unique / distinct values by group

Question

I have data as follows:

userID  <- c(1,1,1,2,2,2,3,3,3)
product <- c("a","a","a","b","b","c","a","b","c")
df <- data.frame(userID, product)

For each 'userID', I want to create a binary indicator variable that is 1 if the user has more than one unique product and 0 if all of the user's products are the same.

So the filled column would look like this:

df$result <- c(0,0,0,1,1,1,1,1,1)
#    userID product result
# 1      1       a      0
# 2      1       a      0
# 3      1       a      0
# 4      2       b      1
# 5      2       b      1
# 6      2       c      1
# 7      3       a      1
# 8      3       b      1
# 9      3       c      1

E.g. user 1 has only one distinct product ('a') -> result = 0. User 2 has more than one unique product ('b' and 'c') -> result = 1.
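The rule can be written directly in base R, which is a handy way to check the expected output. This is a minimal sketch and not one of the answers below: `tapply()` counts the distinct products per user, and each row then looks up its own user's count by name, so the IDs do not need to be 1..n.

```r
userID  <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
product <- c("a", "a", "a", "b", "b", "c", "a", "b", "c")
df <- data.frame(userID, product)

# Number of distinct products per user, named by userID
n_per_user <- tapply(df$product, df$userID, function(x) length(unique(x)))

# Look up each row's user by name and flag users with more than one product
df$result <- as.integer(n_per_user[as.character(df$userID)] > 1)
df$result
# [1] 0 0 0 1 1 1 1 1 1
```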

score 5 · Answer 1 · edited Feb 17 '21 at 12:47

Here's one way to achieve this with data.table:

library(data.table)
setDT(df)[, result := as.integer(uniqueN(product) > 1), by = userID]
# or
# setDT(df)[, result := as.integer(length(unique(product)) > 1), by = userID]
df
#    userID product result
# 1:      1       a      0
# 2:      1       a      0
# 3:      1       a      0
# 4:      2       b      1
# 5:      2       b      1
# 6:      2       c      1
# 7:      3       a      1
# 8:      3       b      1
# 9:      3       c      1

Or, with dplyr:

library(dplyr)
df %>%
  group_by(userID) %>%
  mutate(result = as.integer(n_distinct(product) > 1))

score 3 · Accepted Answer · edited Oct 15 '14 at 14:44


You could use `ave` from base R:

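# ave() returns the same type as its first argument, so the counts come back
# as character strings; as.integer() turns them into numbers before comparing
df$result <- with(df, (as.integer(ave(as.character(product), userID,
                 FUN = function(x) length(unique(x)))) > 1) + 0)
df$result
# [1] 0 0 0 1 1 1 1 1 1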

Or, as suggested by @David Arenburg, you could use `transform` to create the new variable `result` within `df` (`transform` returns a modified copy, so assign it back if you want to keep it):

transform(df, result = (as.integer(ave(as.character(product), userID,
          FUN = function(x) length(unique(x)))) > 1) + 0)

Or, with a contingency table of users by products:

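# Cross-tabulate users by products; `!!` turns counts into TRUE/FALSE,
# so rowSums() gives the number of distinct products per user
tbl <- rowSums(!!table(df[c("userID", "product")])) > 1
(df$userID %in% names(tbl)[tbl]) + 0
# [1] 0 0 0 1 1 1 1 1 1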
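To see what the table-based one-liner does, it can be split into steps. This is a sketch that assumes `df` holds only the question's `userID` and `product` columns; the intermediate names `counts`, `n_distinct` and `multi` are just illustrative.

```r
userID  <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
product <- c("a", "a", "a", "b", "b", "c", "a", "b", "c")
df <- data.frame(userID, product)

# Users (rows) by products (columns); each cell is a row count
counts <- table(df$userID, df$product)

# `!!` turns each count into TRUE/FALSE ("has this product at all"),
# so rowSums() counts distinct products per user (here 1, 2 and 3)
n_distinct <- rowSums(!!counts)

# IDs of users with more than one distinct product, then flag their rows
multi <- names(n_distinct)[n_distinct > 1]
as.integer(df$userID %in% multi)
# [1] 0 0 0 1 1 1 1 1 1
```

`%in%` compares the numeric `userID` with the character names of `n_distinct`; `match()` converts both to character, so `1` matches `"1"`.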

answered Oct 15 '14 at 10:23 by akrun · edited Oct 15 '14 at 14:44 by Daryl

Ah, you solved the mystery of why I couldn't make `ave` work: `as.character`... So annoying – David Arenburg Oct 15 '14 at 10:28
@David Arenburg I also got the warning. But then I thought that `as.character`, or perhaps `as.numeric`, would fit – akrun Oct 15 '14 at 10:30

You could also maybe add a similar solution using `transform`, something like `transform(df, result = (ave(as.character(product), userID, FUN = function(x) length(unique(x))) > 1) + 0)` – David Arenburg Oct 15 '14 at 10:32
The one I got working was `tbl <- rowSums(!!table(df[,-3]))>1` with the second line changed to `(df$userID %in% names(tbl)[tbl])+0` – Daryl Oct 15 '14 at 11:34
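The `as.character` issue from these comments can be reproduced directly. `ave()` writes the results of `FUN` back into a copy of its first argument, so the output has that argument's type. When `product` is a factor (the `data.frame()` default before R 4.0.0), the integer counts are not valid factor levels and become `NA` with a warning. A minimal sketch; `count_unique` is just a local helper name for this example:

```r
userID  <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
product <- factor(c("a", "a", "a", "b", "b", "c", "a", "b", "c"))
count_unique <- function(x) length(unique(x))

# With a factor the counts cannot be stored as levels: every element
# becomes NA (R warns "invalid factor level, NA generated")
bad <- suppressWarnings(ave(product, userID, FUN = count_unique))
all(is.na(bad))
# [1] TRUE

# Converting to character first keeps the counts, but as strings
good <- ave(as.character(product), userID, FUN = count_unique)
good
# [1] "1" "1" "1" "2" "2" "2" "3" "3" "3"

# Turn them back into numbers before comparing with 1
as.integer(as.integer(good) > 1)
# [1] 0 0 0 1 1 1 1 1 1
```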

score 2 · Answer 3 · answered Oct 15 '14 at 10:23

You can use the data.table or dplyr packages for this kind of split-apply-combine task. This is how you could do it with data.table:

library(data.table)
setDT(df)    ## convert df to a data.table in place (by reference)
df[, result:=as.integer(length(unique(product)) > 1), by=userID]

Karolis Koncevičius · Answer 4 · 2014-10-15T10:59:12.103


Here is mine:

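# table of users x products (uses the userID and product vectors from the question)
myTable <- table(userID, product)
# one line from there; index by name so the lookup does not rely on IDs being 1..n
(result <- ifelse(rowSums(myTable != 0) == 1, 0, 1)[as.character(userID)])
# 1 1 1 2 2 2 3 3 3
# 0 0 0 1 1 1 1 1 1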
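Indexing the per-user vector with a numeric `userID` selects elements by position, which works here only because the IDs are exactly 1, 2 and 3. A sketch with made-up, non-consecutive IDs (10, 20, 30; hypothetical data, not from the question) shows why looking up by name with `as.character(userID)` is safer:

```r
# Made-up data: the question's products, but user IDs 10, 20, 30
userID  <- c(10, 10, 10, 20, 20, 20, 30, 30, 30)
product <- c("a", "a", "a", "b", "b", "c", "a", "b", "c")

myTable <- table(userID, product)
# One value per user, named "10", "20", "30"
flag <- ifelse(rowSums(myTable != 0) == 1, 0, 1)

# By position: flag[10], flag[20], ... are past the 3 elements, so NA
by_position <- flag[userID]
# By name: each row looks up its own user
by_name <- flag[as.character(userID)]

unname(by_position)
# [1] NA NA NA NA NA NA NA NA NA
unname(by_name)
# [1] 0 0 0 1 1 1 1 1 1
```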

answered Oct 15 '14 at 10:46 by Karolis Koncevičius · edited Oct 15 '14 at 10:59
