
I have a ListBuffer of 30 DataFrames with the same fields and I want to 'append' (union) them all at once. What is the best and most efficient way to do this?

var result_df_list = new ListBuffer[DataFrame]()

I have seen that you can create a Sequence of DF like this:

val newDFs = Seq(DF1,DF2,DF3)
newDFs.reduce(_ union _)

But how can you achieve this with a ListBuffer?

4 Answers


The reduce method of ListBuffer works as expected.

Running

val unioned = result_df_list.reduce(_ union _)
unioned.explain()

results in a good-looking execution plan:

== Physical Plan ==
Union
:- LocalTableScan [value#1]
:- LocalTableScan [value#5]
+- LocalTableScan [value#9]
– werner

You can also use reduce() with ListBuffer.

  import spark.implicits._

  var result_df_list = new ListBuffer[DataFrame]()

  val df1 = Seq("1").toDF("value")
  val df2 = Seq("2").toDF("value")
  val df3 = Seq("3").toDF("value")

  result_df_list += df1
  result_df_list += df2
  result_df_list += df3

  val df_united: DataFrame = result_df_list.reduce(_ unionByName _)

  df_united.show()

Result:

+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+
– Aleh Pranovich

You can also use a MutableList. With a mutable list, the toDF method can be used to convert each element into a DataFrame or Dataset before unioning them.

– Mukul

You can convert your ListBuffer to a List by invoking its toList method, and then use the reduce method on the result.
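A minimal sketch of the toList-then-reduce idea, using plain Scala Lists as stand-ins for DataFrames and ++ as a stand-in for Spark's union (both are assumptions for illustration only — with real DataFrames you would write result_df_list.toList.reduce(_ union _)):

```scala
import scala.collection.mutable.ListBuffer

object ToListReduceSketch {
  def main(args: Array[String]): Unit = {
    // Stand-ins: each List plays the role of one DataFrame.
    val buf = ListBuffer(List(1), List(2), List(3))

    // Convert the mutable buffer to an immutable List,
    // then pairwise-combine all elements with reduce.
    val combined = buf.toList.reduce(_ ++ _)

    println(combined) // List(1, 2, 3)
  }
}
```

Note that, as the accepted answer shows, the toList conversion is not strictly required — ListBuffer itself already supports reduce — but it does no harm.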

– Hitesh