1

I have the following three dataframes:

prim <- data.frame("t"=2007:2012,
                   "a"=1:6,
                   "b"=7:12)

secnd <- data.frame("t"=2012:2013,
                    "a"=c(5, 7))

third <- data.frame("t"=2012:2013,
                    "b"=c(11, 13))

I want to join secnd and third to prim in two steps. In the first step I join prim and secnd, where any existing elements in prim are overwritten by those in secnd, so we end up with:

     t  a  b
1 2007  1  7
2 2008  2  8
3 2009  3  9
4 2010  4 10
5 2011  5 11
6 2012  5 12
7 2013  7 NA

After this I want to join with third, where again existing elements are overwritten by those in third:

     t  a  b
1 2007  1  7
2 2008  2  8
3 2009  3  9
4 2010  4 10
5 2011  5 11
6 2012  5 11
7 2013  7 13

Is there a way to achieve this using dplyr or base R?

N08
  • 1,265
  • 13
  • 23
  • 1
    Based on your question, the output of the first step should be different from the one presented: 2012 is not overwritten although it is present in `second`? – Abdallah Atef Nov 02 '18 at 14:48
  • @AbdallahAtef Thanks, you are right - I updated the question – N08 Nov 02 '18 at 14:50

3 Answers3

4

By using dplyr you can do:

require(dplyr)

prim %>% full_join(secnd, by = 't') %>%
  full_join(third, by = 't') %>%
  mutate(a = coalesce(as.integer(a.y),a.x),
         b = coalesce(as.integer(b.y),b.x)) %>%
  select(t,a,b)

I added the as.integer function since you have different data types in your dataframes.

Wietze314
  • 5,942
  • 2
  • 21
  • 40
  • I think this drops the 2013 row – davsjob Nov 02 '18 at 15:05
  • When I try this, I get `Error in mutate_impl(.data, dots) : Evaluation error: Argument 2 must be type integer, not double.` - for some reason for me a.x is dbl too. Can be fixed by making sure both are `as.integer`. – iod Nov 02 '18 at 15:06
  • @davsjob nope it does not @iod that is odd. Maybe version differences? You can always leave the `as.integer` parts out, or force them both to integer. – Wietze314 Nov 02 '18 at 15:08
  • @Wietze314 my bad, think i misclicked in R – davsjob Nov 02 '18 at 15:15
  • @Wietze314 Thanks, this is nice. Is there a way to make the variable inside mutate (`a` or `b`) a variable that is set outside the statement? – N08 Nov 02 '18 at 15:16
  • 1
    @N08 sorry I don't understand what you mean exactly. If you want the calculated a or b names differently, you can adjust it in the code. – Wietze314 Nov 02 '18 at 15:19
  • @Wietze314 My question is, can I make the `a` in the following code a variable that I set outside the dplyr-statement? That would enable me to make this into a function to which I can simply pass the column name and dataframe: `prim %>% full_join(secnd, by = 't') %>% mutate(a = coalesce(as.integer(a.y),a.x)) %>% select(t,a,b)` – N08 Nov 02 '18 at 15:20
  • I see what you want, but it is not that easy in dplyr. Check out help function for `mutate_()` (instead of `mutate()`) – Wietze314 Nov 02 '18 at 15:59
1

Consider base R with a chain merge and ifelse calls, followed by final column cleanup:

final_df <- Reduce(function(x, y) merge(x, y, by="t", all=TRUE), list(prim, secnd, third))

final_df <- within(final_df, {
                      a.x <- ifelse(is.na(a.y), a.x, a.y)
                      b.x <- ifelse(is.na(b.y), b.x, b.y)      
                   })

final_df <- setNames(final_df[,1:3], c("t", "a", "b"))
final_df
#      t a  b
# 1 2007 1  7
# 2 2008 2  8
# 3 2009 3  9
# 4 2010 4 10
# 5 2011 5 11
# 6 2012 5 11
# 7 2013 7 13
Parfait
  • 104,375
  • 17
  • 94
  • 125
0

Not very pretty. But seems to do the job

prim %>% 
  anti_join(secnd, by = "t") %>% 
  full_join(secnd, by = c("t", "a")) %>% 
  select(-b) %>% 
  left_join(prim %>% 
      anti_join(third, by = "t") %>% 
      full_join(third, by = c("t", "b")) %>% 
      select(-a))

gives

     t a  b
1 2007 1  7
2 2008 2  8
3 2009 3  9
4 2010 4 10
5 2011 5 11
6 2012 5 11
7 2013 7 13
davsjob
  • 1,882
  • 15
  • 10