To identify unique users in a dataset, where rows that share a value in either of two fields are treated as the same user, one can use the following code:
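library(tidyverse)
library(igraph)

# The first two columns of dat are read as an edge list
graph_from_data_frame(dat) %>%
  components() %>%          # connected components, one per unique user
  pluck("membership") %>%   # named vector: vertex name -> component id
  stack() %>%               # data frame with columns values, ind
  set_names(c('GRP', 'user_id')) %>%
  right_join(dat %>% mutate(user_id = as.factor(user_id)), by = c('user_id'))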
Is there a way to extend this code so that the same procedure works for more than two fields? For example, with the following data:
dat <- data.frame(
  user_id = c(101, 102, 102, 103, 103, 106, 107, 111, 112),
  phone_number = c(4030201, 4030201, 4030202, 4030202, 4030203, 4030204, 4030205, 4030203, 4030206),
  email = c("a@gmail.com", "b@gmail.com", "c@gmail.com", "d@gmail.com", "e@gmail.com", "f@gmail.com", "g@gmail.com", "h@gmail.com", "a@gmail.com")
)
Any ideas on how the code can be modified for more than two fields? Thanks!
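To make the goal concrete, here is a rough sketch of one direction that might work. It is an assumption on my part, not a confirmed solution: reshape every field to long format and connect each row to each of its field values, so that rows sharing any value end up in the same connected component. The `row_` and `field:value` labels and the `edges`, `g`, and `memb` names are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(igraph)

dat <- data.frame(
  user_id = c(101, 102, 102, 103, 103, 106, 107, 111, 112),
  phone_number = c(4030201, 4030201, 4030202, 4030202, 4030203, 4030204, 4030205, 4030203, 4030206),
  email = c("a@gmail.com", "b@gmail.com", "c@gmail.com", "d@gmail.com", "e@gmail.com", "f@gmail.com", "g@gmail.com", "h@gmail.com", "a@gmail.com")
)

# One edge per (row, field value). Values are prefixed with the field name
# (hypothetical "field:value" labels) so that equal values appearing in
# different fields are not merged by accident.
edges <- dat %>%
  mutate(row = paste0("row_", row_number())) %>%
  mutate(across(-row, as.character)) %>%
  pivot_longer(-row, names_to = "field", values_to = "value") %>%
  transmute(from = row, to = paste(field, value, sep = ":"))

# Rows that share any field value fall into the same connected component.
g <- graph_from_data_frame(edges, directed = FALSE)
memb <- components(g)$membership

# Look up the component id of each row vertex and attach it as GRP.
dat$GRP <- unname(memb[paste0("row_", seq_len(nrow(dat)))])
```

With the example data this should leave users 106 and 107 each in their own group and put every other row in a single group, since row 9 shares an email with row 1. Note that missing values are not handled in this sketch: every `NA` in a field would become the same vertex and link otherwise unrelated rows, so they would need to be dropped from `edges` first.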