
I have a ListBuffer of 30 DataFrames with the same fields and I want to 'append' (union) them all at once. What is the best and most efficient way to do this?

var result_df_list = new ListBuffer[DataFrame]()

I have seen that you can create a Sequence of DF like this:

val newDFs = Seq(DF1,DF2,DF3)
newDFs.reduce(_ union _)

But how can you achieve this with a ListBuffer?

4 Answers


The reduce method of ListBuffer works as expected.

Running

val unioned = result_df_list.reduce(_ union _)
unioned.explain()

results in a good-looking execution plan:

== Physical Plan ==
Union
:- LocalTableScan [value#1]
:- LocalTableScan [value#5]
+- LocalTableScan [value#9]
– werner

You can also use reduce() with ListBuffer.

  import spark.implicits._

  var result_df_list = new ListBuffer[DataFrame]()

  val df1 = Seq("1").toDF("value")
  val df2 = Seq("2").toDF("value")
  val df3 = Seq("3").toDF("value")

  result_df_list += df1
  result_df_list += df2
  result_df_list += df3

  val df_united: DataFrame = result_df_list.reduce(_ unionByName _)

  df_united.show()

Result:

+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+
– Aleh Pranovich

You can also use a MutableList. With a mutable list, the toDF method can be used to convert each element into a DataFrame or Dataset before unioning them.

– Mukul

You can convert your ListBuffer to a List by invoking its toList method, and then use the reduce method on the result.
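A minimal sketch of the toList-then-reduce idea, using plain Scala Lists as stand-ins for DataFrames and ++ as a stand-in for Spark's union (both are assumptions for illustration only — with real DataFrames you would write result_df_list.toList.reduce(_ union _)):

```scala
import scala.collection.mutable.ListBuffer

object ToListReduceSketch {
  def main(args: Array[String]): Unit = {
    // Stand-ins: each List plays the role of one DataFrame.
    val buf = ListBuffer(List(1), List(2), List(3))

    // Convert the mutable buffer to an immutable List,
    // then pairwise-combine all elements with reduce.
    val combined = buf.toList.reduce(_ ++ _)

    println(combined) // List(1, 2, 3)
  }
}
```

Note that, as the accepted answer shows, the toList conversion is not strictly required — ListBuffer itself already supports reduce — but it does no harm.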

– Hitesh