I input 5 text data sets into R using `read.table`.
Each data set has the same structure (100 rows, 50 cols).
I would like to union/append all five tables together into one table, which will be 500 rows * 50 cols.
Does anyone know how to do that?
-
FYI, that is not what union means. – Frank Apr 26 '16 at 14:34
-
I have tried union, but the result becomes a large list, no longer a table. – qqqwww Apr 26 '16 at 14:39
-
I cast a vote to reopen this question as I don't think this is too broad, as the OP has accepted an answer that makes it clear what they mean by `union`, that is, an SQL `union`. FWIW, I was looking up `how to do a union of sql tables from dplyr` and Google pointed me to this question as the first result. – Alex Apr 27 '18 at 01:54
3 Answers
16
From package `dplyr`:
install.packages('dplyr')
library(dplyr)
new_df <- bind_rows(table1, table2, table3, table4, table5)
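
If the five tables are already collected in a list (for example from reading the files in a loop), `bind_rows` can take the list directly. A minimal sketch, assuming dplyr is installed; the toy data and the column name `source` are made up for illustration:

```r
library(dplyr)  # assumes dplyr is already installed

# Toy stand-ins for the five tables (hypothetical data, 3 rows x 2 cols each).
tables <- lapply(1:5, function(i) data.frame(x = rnorm(3), y = rnorm(3)))

# bind_rows() accepts a list of data frames; .id adds a column
# recording which list element each row came from.
new_df <- bind_rows(tables, .id = "source")
dim(new_df)
```

The extra `source` column is optional; drop the `.id` argument if you only want the stacked rows.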

Raphael K
-
I don't see anything wrong either. Actually `bind_rows` is much more efficient than `rbind`. – Psidom Apr 26 '16 at 14:41
-
I also do not see a problem, so upvote from here. Maybe someone thinks it *overkill* to use a package for such a simple problem. While it is faster in general (as @Psidom points out) the data size is so small that it doesn't matter in practice here (probably). – Anders Ellern Bilgrau Apr 26 '16 at 14:46
-
AEBilgrau's way works. I also tried your way: I installed the package and used bind_rows to bind the tables, but it fails with Error: could not find function "bind_rows". When I use dplyr::bind_rows instead, I get a different message: Warning messages: 1: In rbind_all(x, .id) : Unequal factor levels: coercing to character. In the final dataset, some of the rows are missing. – qqqwww Apr 26 '16 at 14:54
-
You have to install the package first. I will edit the code above to include that. I am unaware of what would be causing that error, however. – Raphael K Apr 26 '16 at 16:27
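
Regarding the "Unequal factor levels" warning quoted in the comments: one possible way to sidestep it is to make sure the text columns are plain character vectors before binding, either by reading the files with `read.table(..., stringsAsFactors = FALSE)` or by converting afterwards. The sketch below uses base R only; the toy tables and the helper name `to_character` are hypothetical, and this may not explain the missing rows mentioned above.

```r
# Two toy tables whose factor column has different level sets (hypothetical data).
t1 <- data.frame(id = 1:2, grp = factor(c("a", "b")))
t2 <- data.frame(id = 3:4, grp = factor(c("b", "c")))

# Hypothetical helper: convert every factor column to character
# so the level sets cannot clash when the tables are stacked.
to_character <- function(d) {
  is_fac <- vapply(d, is.factor, logical(1))
  d[is_fac] <- lapply(d[is_fac], as.character)
  d
}

tables <- lapply(list(t1, t2), to_character)
combined <- do.call(rbind, tables)
str(combined)
```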
7
In base R, you can do the following:
# Create some toy data first
nc <- 50
nr <- 1000
# Create five tables with nc columns and nr rows.
df1 <- as.data.frame(replicate(nc, rnorm(nr)))
df2 <- as.data.frame(replicate(nc, rnorm(nr)))
df3 <- as.data.frame(replicate(nc, rnorm(nr)))
df4 <- as.data.frame(replicate(nc, rnorm(nr)))
df5 <- as.data.frame(replicate(nc, rnorm(nr)))
# Join the tables
df <- rbind(df1, df2, df3, df4, df5)
dim(df)
#[1] 5000 50
This gives you the 5 tables stacked on top of each other, if that's what you're looking for. If not, you should provide a minimal example illustrating your problem.
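
If the tables come straight from text files, the same idea can be written without naming each table, by reading the files into a list and calling `rbind` on the whole list. A rough sketch, where the temporary directory, the file names and the toy sizes are made up for illustration:

```r
# Write five small toy files to a temporary directory
# (stand-ins for the real text files).
tmp_dir <- tempfile("tables_")
dir.create(tmp_dir)
for (i in 1:5) {
  toy <- as.data.frame(replicate(3, rnorm(4)))  # 4 rows, 3 columns
  write.table(toy, file.path(tmp_dir, sprintf("table%d.txt", i)),
              row.names = FALSE)
}

# Read every file into a list of data frames, then stack them.
files    <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)
tables   <- lapply(files, read.table, header = TRUE)
combined <- do.call(rbind, tables)
dim(combined)
#[1] 20  3
```

`do.call(rbind, tables)` is equivalent to `rbind(tables[[1]], tables[[2]], ...)`, so it requires the same column names in every file.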

Anders Ellern Bilgrau
7
Even though it does not matter for this specific question, it might still be helpful to have a comparison of different rbind methods. Here is a comparison of three rbind methods from `base`, `data.table` and `dplyr`:
> dim(df)
[1] 16777216 2
> microbenchmark(rbind(df,df), rbindlist(list(df,df)), bind_rows(df,df), times = 10)
Unit: milliseconds
                    expr       min        lq      mean    median        uq       max neval cld
           rbind(df, df) 3824.4208 4052.6405 4288.5569 4239.2416 4557.5736 4685.2155    10   c
 rbindlist(list(df, df))  272.5048  304.8365  348.0393  357.4388  390.7684  405.0778    10 a
       bind_rows(df, df)  571.1732  596.2556  715.1572  643.8038  863.5805  927.0341    10  b
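
To try a comparison like this on your own machine, something along these lines should work. It is only a sketch: the data frame is much smaller than the one above, its size and column names are made up, and the benchmark runs only if microbenchmark, data.table and dplyr happen to be installed. Timings will differ by machine and package version.

```r
# Sketch only: a much smaller data frame than the 16,777,216-row one above.
# The size n and the column names are made up for illustration.
n  <- 1e5
df <- data.frame(x = rnorm(n), y = rnorm(n))

# Run the benchmark only when all three packages are available.
pkgs <- c("microbenchmark", "data.table", "dplyr")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  res <- microbenchmark::microbenchmark(
    base       = rbind(df, df),
    data.table = data.table::rbindlist(list(df, df)),
    dplyr      = dplyr::bind_rows(df, df),
    times = 10
  )
  print(res)
}
```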

Psidom