
Occasionally, I need to join a data frame to a (usually modified) version of itself within a dplyr chain. Something like this:

library(dplyr)

df <- data.frame(
     id = c(1,2,3)
   , status = c('foo','bar','meh')
   , spouseid = c(4,3,2)
)


df %>% 
  filter( status == 'foo' | status == 'bar') %>% 
  # join the filtered table to itself using the dot as the right-hand side
  left_join(., by = c('id' = 'spouseid'))

When I try that, I get Error in is.data.frame(y) : argument "y" is missing, with no default.

crazybilly
  • Do you mean you want to join the original data frame to the modified one? If so, your join line can be `left_join(df, by = c('id' = 'spouseid'))`, which will put the original data frame on the rhs. Depending on which rows you want in the final result, you could also change this to a `right_join` or `full_join`. – eipi10 Sep 14 '16 at 16:47
  • I'm most interested in joining the edited data frame to itself without having to break the chain. In the past, I've always done my edits first, stored that in a new object, then done the joining or made the same edits within the join itself on the y argument. – crazybilly Sep 14 '16 at 17:04

1 Answer


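The problem is that the dot only repositions the left-hand side: once a dot appears as an argument, magrittr no longer inserts the lhs as the first argument, so the call above passes the lhs to left_join() only once, as x, and y is left missing. To use the lhs for both the left- and right-hand sides of the join, use the dot twice: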

df %>% 
  filter( status == 'foo' | status == 'bar') %>% 
  # the first dot is x argument and the second dot is the y argument
  left_join(
      x = . 
    , y = . 
    , by = c('id' = 'spouseid')
  )

This way, you're passing the lhs to both arguments of left_join() rather than relying on magrittr's implicit lhs like you normally would.
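
As a quick check, here is a minimal sketch comparing the two-dot form with the two-step approach of storing the filtered data first. It assumes dplyr is installed; `filtered`, `piped`, and `stepwise` are names made up for this illustration.

```r
library(dplyr)

df <- data.frame(
  id       = c(1, 2, 3),
  status   = c("foo", "bar", "meh"),
  spouseid = c(4, 3, 2)
)

# Two-step version: store the filtered data, then join it to itself.
filtered <- df %>%
  filter(status == "foo" | status == "bar")
stepwise <- left_join(filtered, filtered, by = c("id" = "spouseid"))

# Single chain: both dots refer to the filtered data coming down the pipe.
piped <- df %>%
  filter(status == "foo" | status == "bar") %>%
  left_join(., ., by = c("id" = "spouseid"))

# Both routes should give the same result.
identical(piped, stepwise)
```

Note that with this particular sample data, neither remaining id (1 or 2) appears among the remaining spouseid values (4 and 3), so the columns coming from the right-hand side are all NA. The point is only that the chain runs and matches the two-step result.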

crazybilly
  • You can also drop the `x = ` and `y = ` and use `left_join(., ., by = c('id' = 'spouseid'))`. – steveb Sep 14 '16 at 16:44
  • Doesn't seem to work: `df <- data.frame(Team = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"), Date = c("2016-05-10","2016-05-10", "2016-05-10", "2016-05-10", "2016-05-12", "2016-05-12", "2016-05-12", "2016-05-15","2016-05-15", "2016-05-30", "2016-05-30"), Points = c(1,4,3,2,1,5,6,1,2,3,9) )` `df %>% left_join(., . %>% distinct(Team, Date) %>% mutate(Date_Lagged = lag(Date)))` – jzadra Jun 12 '18 at 22:48