Merge and fill different length data in R

Question

I'm using R and need merge data with different lenghts

Following this dataset

> means2012
 # A tibble: 232 x 2
   exporter    eci
   <fct>     <dbl>
 1 ABW       0.235
 2 AFG      -0.850
 3 AGO      -1.40 
 4 AIA       1.34 
 5 ALB      -0.480
 6 AND       1.22 
 7 ANS       0.662
 8 ARE       0.289
 9 ARG       0.176
 10 ARM       0.490
 # ... with 222 more rows

> means2013
 # A tibble: 234 x 2
    exporter     eci
    <fct>      <dbl>
  1 ABW       0.534 
  2 AFG      -0.834 
  3 AGO      -1.26  
  4 AIA       1.47  
  5 ALB      -0.498 
  6 AND       1.13  
  7 ANS       0.616 
  8 ARE       0.267 
  9 ARG       0.127 
 10 ARM       0.0616
 # ... with 224 more rows


> str(means2012)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   232 obs. of  2 variables:
 $ exporter: Factor w/ 242 levels "ABW","AFG","AGO",..: 1 2 3 4 5 6 7 9 10 11 ...
 $ eci     : num  0.235 -0.85 -1.404 1.337 -0.48 ...
> str(means2013)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   234 obs. of  2 variables:
 $ exporter: Factor w/ 242 levels "ABW","AFG","AGO",..: 1 2 3 4 5 6 7 9 10 11 ...
 $ eci     : num  0.534 -0.834 -1.263 1.471 -0.498 ...

Note that 2 tibble has different lenghts. "Exporter" are countries.

Is there any way to merge both tibble, looking to the factors (Exporter) and fill the missing it with "na"?

It doesn't matter if is a tibble, dataframe, or other kind.

like this:

tibble 1
a 5
b 10
c 15
d 25

tibble 2
a 7
c 23
d 20

merged one:
a 5  7 
b 10 na
c 15 23
d 25 20

MKR, got same error using the other dplyr solution: **Error: `by` must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not list** — Adilson V Casula, Feb 17 '18 at 19:44
You should provide `by = exporter`. May be you should use `full_join` if you want to including missing values from both tibbles. — MKR, Feb 17 '18 at 19:50

score 1 · Accepted Answer · answered Feb 17 '18 at 19:36

1

using merge with parameter all set to TRUE:

tibble1 <- read.table(text="
x y
a 5
b 10
c 15
d 25",header=TRUE,stringsAsFactors=FALSE)

tibble2 <- read.table(text="
x z
a 7
c 23
d 20",header=TRUE,stringsAsFactors=FALSE)


merge(tibble1,tibble2,all=TRUE)

  x  y  z
1 a  5  7
2 b 10 NA
3 c 15 23
4 d 25 20

Or dplyr::full_join(tibble1,tibble2) for the same effect

answered Feb 17 '18 at 19:36

moodymudskipper

46,417
11
121
167

First option returned: **Error in fix.by(by.x, x) : 'by' must specify one or more columns as numbers, names or logical**. And the second one returned **"Error: `by` must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not list"** – Adilson V Casula Feb 17 '18 at 19:43
Using the data from my answer ? – moodymudskipper Feb 17 '18 at 20:57
No, using my data. – Adilson V Casula Feb 18 '18 at 02:22
If a solution works on your reproducible example but not on your real data, you should update your reproducible example to reflect the issue. – moodymudskipper Feb 18 '18 at 09:29
Found the error. Merge worked perfect. The simple error here was trying to merge more than 2 tibble at same time. Doing it over a loop, 2 by 2 it worked fine. Thanks Moody – Adilson V Casula Feb 18 '18 at 17:06
you're welcome, look up `Reduce` if you want to merge several tables in one operation – moodymudskipper Feb 18 '18 at 17:08
Ok. I will do =) – Adilson V Casula Feb 18 '18 at 17:10

score 0 · Answer 2 · answered Feb 17 '18 at 19:47

You could rename the colums to join them, and get NA where the other value is missing.

library(tidyverse)

means2012 %>% 
  rename(eci2012 = eci) %>% 
  full_join(means2013 %>% 
              rename(eci2013 = eci))

But a tidier approach would be to add a year column, keep the column eci as is and just bind the rows together.

means2012 %>% 
  mutate(year = 2012) %>% 
  bind_rows(means2013 %>% 
              mutate(year = 2013))

Merge and fill different length data in R

2 Answers2