4

I want to to implement a classic self-join problem in R using dplyr but unable to do so.

Input: Dataframe with empid, empname and managerid

Output: Dataframe with managername for each empid

library(dplyr)


empid = c(1, 2, 3, 13, 11, 9, 8, 7, 3)
empname = paste0("emp", empid) 
managerid = c(0, 0, 1, 2, 2, 1, 3, 2, 1)
df = data.frame(empid, empname, managerid)
df
# empid empname managerid
#      1    emp1         0
#      2    emp2         0
#      3    emp3         1
#     13   emp13         2
#     11   emp11         2
#      9    emp9         1
#      8    emp8         3
#      7    emp7         2
#      3    emp3         1


df %>% 
  unique() %>% 
  left_join(., ., by = ("managerid" = "empid"))

# empid empname.x managerid.x empname.y managerid.y
#      1      emp1           0      emp1           0
#      2      emp2           0      emp2           0
#      3      emp3           1      emp3           1
#     13     emp13           2     emp13           2
#     11     emp11           2     emp11           2
#      9      emp9           1      emp9           1
#      8      emp8           3      emp8           3
#      7      emp7           2      emp7           2

The code clearly didn't give the desired result; which should be something like:

# empid empname.x managerid.x empname.y 
#      1      emp1           0      NA           
#      2      emp2           0      NA           
#      3      emp3           1    emp1           
#     13     emp13           2    emp2           
#     11     emp11           2    emp2           
#      9      emp9           1    emp1           
#      8      emp8           3    emp3          
#      7      emp7           2    emp2 
Jaap
  • 81,064
  • 34
  • 182
  • 193
ar7
  • 51
  • 4

1 Answers1

6

You just need to make your by condition a vector: left_join(., ., by = c("managerid" = "empid"))

Pdubbs
  • 1,967
  • 2
  • 11
  • 20