Binding dataframes of different length (no cbind, no merge)

Question

I am trying to display multiple dataframes next to each other to compare certain entries. However, they have a different number of rows and I want each data frame to be in the exact same order. I tried to use cbind which did not work because of the different number of rows. I used merge to bind two dfs together and then merge them again, however they change order when I do that and it seems inefficient to merge two dfs when I have more than 5 in total.

Examp:

df <-  data.frame(v=1:5, x=sample(LETTERS[1:5],5))
df 
  v x
1 1 E
2 2 B
3 3 D
4 4 C
5 5 A

df2 <- data.frame(m=7:10, n=sample(LETTERS[6:9],4))
df2
   m n
1  7 G
2  8 I
3  9 F
4 10 H

Then I ordered df2

df2 <- df2[order(df2$m, decreasing = TRUE),]
df2
   m n
4 10 F
3  9 I
2  8 H
1  7 G

Expected output:

  v x m n
1 1 E 10 F
2 2 B 9 I
3 3 D 8 H
4 4 C 7 G
5 5 A NA NA

As I said, I have more than two dfs and the order of the dfs should be remained. Any help will be greatly appreciated!

score 5 · Answer 1 · answered Apr 22 '21 at 06:45

Base R approach :

Put the dataframes in a list, get the dataframe with maximum number of rows, append NA's to data which have less number of rows and cbind.

list_df <- list(df, df2)
n_r <- seq_len(max(sapply(list_df, nrow)))
result <- do.call(cbind, lapply(list_df, `[`, n_r, ))
result

#  v x  m    n
#1 1 C 10    F
#2 2 B  9    H
#3 3 E  8    G
#4 4 D  7    I
#5 5 A NA <NA>

AnilGoyal · Accepted Answer · 2021-04-22T06:55:25.207

Edit: In case there are multiple df. Do this

Create a list of all dfs except one say first one
use purrr::reduce to join all these together
pass first df in .init argument.

df2 <- data.frame(m=7:10, n=sample(LETTERS[6:9],4))
df <-  data.frame(v=1:5, x=sample(LETTERS[1:5],5))
df3 <- data.frame(bb = 101:110, cc = sample(letters, 10))


reduce(list(df2, df3), .init = df %>% mutate(id = row_number()) , ~full_join(.x, .y %>% mutate(id = row_number()), by = "id" )) %>%
  select(-id)

    v    x  m    n  bb cc
1   1    A 10    I 101  u
2   2    C  9    H 102  v
3   3    D  8    G 103  n
4   4    E  7    F 104  w
5   5    B NA <NA> 105  s
6  NA <NA> NA <NA> 106  y
7  NA <NA> NA <NA> 107  g
8  NA <NA> NA <NA> 108  i
9  NA <NA> NA <NA> 109  p
10 NA <NA> NA <NA> 110  h

Earlier Answer: Create a dummy column id in both dfs and use full_join

full_join(df %>% mutate(id = row_number()), df2 %>% mutate(id = row_number()), by = "id") %>%
  select(-id)

  v x  m    n
1 1 A 10    I
2 2 C  9    H
3 3 D  8    G
4 4 E  7    F
5 5 B NA <NA>

Results are different from as expected becuase of different random number seed

Or in BaseR

merge(transform(df, id = seq_len(nrow(df))), transform(df2, id = seq_len(nrow(df2))), all = T)

  id v x  m    n
1  1 1 A 10    I
2  2 2 C  9    H
3  3 3 D  8    G
4  4 4 E  7    F
5  5 5 B NA <NA>

Remove extra column simply by subsetting []

merge(transform(df, id = seq_len(nrow(df))), transform(df2, id = seq_len(nrow(df2))), all = T)[-1]

  v x  m    n
1 1 A 10    I
2 2 C  9    H
3 3 D  8    G
4 4 E  7    F
5 5 B NA <NA>

I tried that already, however I still have the problem that I have multiple data frames that I want to put next to each other. And merge does not work as soon as I put more than two data frames in the command. Merging two by a time seems very inefficient. I just wondered whether there is a quicker way to do it — Linda Espey, Apr 22 '21 at 06:46
See edited answer. If there is a column named `id` in any of the dataframes to be joined, use some another name say `dummy` which is not present in any of the dataframes. — AnilGoyal, Apr 22 '21 at 06:56

score 0 · Answer 3 · answered Apr 22 '21 at 06:27

0

library(plyr)
combined <- rbind.fill(df[c("v", "x")], df2[c("m", "n")])

Is that what you want?

answered Apr 22 '21 at 06:27

Arduan

253
1
11

score 0 · Answer 4 · answered Apr 22 '21 at 06:49

Another base R approach, but with merge, you need to:

add the sort parameter to make sure the result doesn't get sorted
remove the row names from the data frames
add all parameter to make sure all rows are used,
the [-1] is to get rid of the row names column added by merge

Example:

set.seed(123)
df1 <-  data.frame(v = 1:5, 
                   x = sample(LETTERS[1:5], 5))
df1 
#>   v x
#> 1 1 A
#> 2 2 B
#> 3 3 D
#> 4 4 C
#> 5 5 E

df2 <- data.frame(m = 7:10, 
                  n = sample(LETTERS[6:9], 4))
df2
#>    m n
#> 1  7 G
#> 2  8 H
#> 3  9 I
#> 4 10 F

df2 <- df2[order(df2$m, decreasing = TRUE),]
df2
#>    m n
#> 4 10 F
#> 3  9 I
#> 2  8 H
#> 1  7 G

merge(data.frame(df1, row.names = NULL),
      data.frame(df2, row.names = NULL),
      by = 0,
      all = TRUE,
      sort = FALSE)[-1]
#>   v x  m    n
#> 1 1 A 10    F
#> 2 2 B  9    I
#> 3 3 D  8    H
#> 4 4 C  7    G
#> 5 5 E NA <NA>

If you need more then 2 dataframes you can use Reduce.

df3 <-  data.frame(a = 1:7, 
                   z = sample(LETTERS[1:7], 7))

Reduce(function(x,y) merge(x = x, y = y, by = 0, all = TRUE, sort = FALSE)[-1], 
       list(data.frame(df1, row.names = NULL), 
            data.frame(df2, row.names = NULL),
            data.frame(df3, row.names = NULL)))
#>    v    x  m    n a z
#> 1  1    C 10    I 1 F
#> 2  2    B  9    F 2 G
#> 3  3    E  8    H 3 A
#> 4  4    D  7    G 4 B
#> 5  5    A NA <NA> 5 C
#> 6 NA <NA> NA <NA> 6 D
#> 7 NA <NA> NA <NA> 7 E


Created on 2021-04-22 by the reprex package (v2.0.0)

Binding dataframes of different length (no cbind, no merge)

4 Answers4

Linked