8

I read five text data sets into R using `read.table`. Each data set has the same structure (100 rows, 50 columns). I would like to append (union) all five tables into a single table of 500 rows × 50 columns. Does anyone know how to do that?

eli-k
qqqwww
  • FYI, that is not what union means. – Frank Apr 26 '16 at 14:34
  • I tried `union`, but the result became a large list, no longer a table. – qqqwww Apr 26 '16 at 14:39
  • I cast a vote to reopen this question, as I don't think it is too broad: the OP has accepted an answer that makes it clear what they mean by `union`, that is, an SQL `union`. FWIW, I was looking up `how to do a union of sql tables from dplyr` and Google pointed me to this question as the first result. – Alex Apr 27 '18 at 01:54

3 Answers

16

From package dplyr:

install.packages('dplyr')  # only needed once
library(dplyr)
# Stack the five tables row-wise; columns are matched by name
new_df <- bind_rows(table1, table2, table3, table4, table5)
Raphael K
  • What's wrong with this? – Raphael K Apr 26 '16 at 14:35
  • I don't see anything wrong either. Actually `bind_rows` is much more efficient than `rbind`. – Psidom Apr 26 '16 at 14:41
  • I also do not see a problem, so upvote from here. Maybe someone thinks it *overkill* to use a package for such a simple problem. While it is faster in general (as @Psidom points out) the data size is so small that it doesn't matter in practice here (probably). – Anders Ellern Bilgrau Apr 26 '16 at 14:46
  • But dplyr is so awesome.... :) – Raphael K Apr 26 '16 at 14:47
  • AEBilgrau's way works, but I also tried yours. I installed the package and used `bind_rows` to bind the tables, but got `Error: could not find function "bind_rows"`. I then used `dplyr::bind_rows` and got a different message: `Warning messages: 1: In rbind_all(x, .id) : Unequal factor levels: coercing to character`. In the final data set, some of the rows are missing. – qqqwww Apr 26 '16 at 14:54
  • You have to install the package first. I will edit the code above to include that. I am unaware of what would be causing that error, however. – Raphael K Apr 26 '16 at 16:27
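On the "Unequal factor levels" warning quoted in the comments: one plausible cause, stated here as an assumption because the asker's files are not shown, is that `read.table` converted text columns to factors. Before R 4.0.0, `stringsAsFactors` defaulted to `TRUE`, so each table could carry its own set of factor levels. A minimal sketch that reads text columns as character instead; the inline data and the names `t1`, `t2` and `combined` are made up for illustration:

```r
# Hypothetical inline data standing in for two of the asker's files
t1 <- read.table(text = "id value\na 1\nb 2", header = TRUE,
                 stringsAsFactors = FALSE)
t2 <- read.table(text = "id value\nc 3\nd 4", header = TRUE,
                 stringsAsFactors = FALSE)

# With character columns there are no factor levels to reconcile
combined <- rbind(t1, t2)
str(combined)
```

This does not explain the missing rows mentioned in the same comment; that would need a look at the actual input files.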
7

In base R, you can do the following:

# Create some toy data first
nc <- 50
nr <- 1000

# Create five tables with nc columns and nr rows.
df1 <- as.data.frame(replicate(nc, rnorm(nr)))
df2 <- as.data.frame(replicate(nc, rnorm(nr)))
df3 <- as.data.frame(replicate(nc, rnorm(nr)))
df4 <- as.data.frame(replicate(nc, rnorm(nr)))
df5 <- as.data.frame(replicate(nc, rnorm(nr)))

# Stack the tables row-wise
df <- rbind(df1, df2, df3, df4, df5)
dim(df)
#[1] 5000   50

This gives you the five tables stacked on top of each other, if that's what you're looking for. If not, you should provide a minimal example illustrating your problem.
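If the five tables come from files, they can also be read into a list and stacked in one call. A minimal sketch, assuming whitespace-delimited files with a header row; the temporary files and the names `paths`, `tables` and `stacked` are made up for illustration:

```r
# Write five hypothetical 100 x 50 tables to temporary files
paths <- file.path(tempdir(), sprintf("table%d.txt", 1:5))
for (p in paths) {
  toy <- as.data.frame(replicate(50, rnorm(100)))
  write.table(toy, p, row.names = FALSE)
}

# Read every file into a list of data frames
tables <- lapply(paths, read.table, header = TRUE)

# Stack the list row-wise; equivalent to rbind(tables[[1]], ..., tables[[5]])
stacked <- do.call(rbind, tables)
dim(stacked)
#[1] 500  50
```

Keeping the tables in a list avoids typing each name by hand and works unchanged for any number of files.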

Anders Ellern Bilgrau
7

Even though it does not matter for this specific question, it may still be helpful to compare the different row-binding methods. Here is a comparison of three of them, from base R (`rbind`), data.table (`rbindlist`) and dplyr (`bind_rows`):

> dim(df)
[1] 16777216        2

> microbenchmark(rbind(df,df), rbindlist(list(df,df)), bind_rows(df,df), times = 10)
Unit: milliseconds
                    expr       min        lq      mean    median        uq       max neval cld
           rbind(df, df) 3824.4208 4052.6405 4288.5569 4239.2416 4557.5736 4685.2155    10   c
 rbindlist(list(df, df))  272.5048  304.8365  348.0393  357.4388  390.7684  405.0778    10 a  
       bind_rows(df, df)  571.1732  596.2556  715.1572  643.8038  863.5805  927.0341    10  b 
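The answer does not show how `df` was built, so the timings above cannot be reproduced exactly. A rough sketch of a comparable setup, assuming a much smaller two-column numeric data frame (`n`, `df` and `base_res` here are illustrative, not the original data); the benchmark itself only runs if microbenchmark, data.table and dplyr are installed:

```r
# Hypothetical two-column data frame, far smaller than the 16,777,216 rows above
n  <- 1e5
df <- data.frame(x = rnorm(n), y = rnorm(n))

# Base R needs no extra packages
base_res <- rbind(df, df)

# Run the full comparison only if all three packages are available
pkgs <- c("microbenchmark", "data.table", "dplyr")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  print(microbenchmark::microbenchmark(
    rbind(df, df),
    data.table::rbindlist(list(df, df)),
    dplyr::bind_rows(df, df),
    times = 10
  ))
}
```

Absolute timings depend on the machine, the package versions and the column types, so the figures will differ from those in the table.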
Psidom