
I have two DataFrames that are created with the StructType method in Spark. They have an unequal number of columns, and I need to unionAll them. Kindly assist.

Ram Ghadiyaram
Dasarathy D R

1 Answer


It's NOT possible with Spark DataFrames unless you add dummy columns.

DataFrame unionAll is just like SQL UNION ALL: both sides need the same number of columns and the same data types.

The basic requirement of UNION ALL, in RDBMS SQL and DataFrames alike, is that column types and column order match on both sides.

That is, both queries must return the same number of columns, and the corresponding columns (matched by position, not by name) must have compatible data types.

So you can create dummy columns of the same name and type to satisfy the union requirements.
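To make the positional rule concrete, here is a small plain-Python model of it. It is an illustration only, not Spark's or any database's API: the schemas are hypothetical lists of (column name, type name) pairs, and the helper name check_union_compatible is made up for this sketch. It uses exact type equality as a stand-in for "compatible", which is stricter than real engines.

```python
# Plain-Python model of the UNION ALL compatibility rule (not Spark's API).
# A schema is a hypothetical list of (column_name, type_name) tuples.

def check_union_compatible(left_schema, right_schema):
    """Raise if two schemas cannot be combined with UNION ALL.

    Columns are matched by position, not by name. Exact type equality is a
    simplification: real engines accept some "compatible" pairs (for example
    int and bigint) and widen them.
    """
    if len(left_schema) != len(right_schema):
        raise ValueError(
            f"column count mismatch: {len(left_schema)} vs {len(right_schema)}"
        )
    for pos, ((lname, ltype), (rname, rtype)) in enumerate(zip(left_schema, right_schema)):
        if ltype != rtype:
            raise TypeError(
                f"position {pos}: {lname} is {ltype}, but {rname} is {rtype}"
            )


table1 = [("a", "int"), ("b", "int"), ("c", "int")]
table2_case1 = [("x", "int"), ("y", "int"), ("z", "int")]
table2_case2 = [("p", "int"), ("q", "int"), ("r", "int"),
                ("x", "string"), ("y", "int"), ("z", "string")]

check_union_compatible(table1, table2_case1)  # passes: names differ, types match
try:
    check_union_compatible(table1, table2_case2)
except ValueError as err:
    print(err)  # column count mismatch: 3 vs 6
```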

Spark API documentation for DataFrame.unionAll:

public DataFrame unionAll(DataFrame other)

Returns a new DataFrame containing union of rows in this frame and another frame. This is equivalent to UNION ALL in SQL.

Parameters: other - (undocumented)
Returns: (undocumented)
Since: 1.3.0


SQL examples:

CASE 1:

Possible: a, b, c in table1 and x, y, z in table2 are all int, so both sides return three columns of the same type.

select a, b, c from table1
union all
select x, y, z from table2

CASE 2:

NOT possible: a, b, c are int, while table2 returns p, q, r (int), x (String), y (int), z (String). The first query returns three columns and the second returns six.

select a, b, c from table1
union all
select p, q, r, x, y, z from table2

CASE 3:

To make it possible, add dummy columns x (String), y (int), and z (String) to the table1 query.

Here the dummy values are 'dasarathy' as x, 2 as y, and 'dr' as z:

select a, b, c, 'dasarathy' as x, 2 as y, 'dr' as z from table1
union all
select p, q, r, x, y, z from table2
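The CASE 2 and CASE 3 queries can be tried end to end with Python's built-in sqlite3 module, used here as a stand-in RDBMS (an assumption; the tables and rows are hypothetical). SQLite is dynamically typed, so it enforces only the column-count half of the rule, not the type half, but it does reject CASE 2 and accept the padded CASE 3 query.

```python
import sqlite3

# In-memory database with hypothetical tables shaped like CASE 2 and CASE 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("CREATE TABLE table2 (p INTEGER, q INTEGER, r INTEGER, "
             "x TEXT, y INTEGER, z TEXT)")
conn.execute("INSERT INTO table1 VALUES (1, 2, 3)")
conn.execute("INSERT INTO table2 VALUES (4, 5, 6, 'foo', 7, 'bar')")

# CASE 2: 3 columns on the left, 6 on the right, so SQLite rejects the query.
case2_error = None
try:
    conn.execute("SELECT a, b, c FROM table1 "
                 "UNION ALL SELECT p, q, r, x, y, z FROM table2")
except sqlite3.OperationalError as err:
    case2_error = err
    print("CASE 2 fails:", err)

# CASE 3: pad the left side with dummy literals so both sides return 6 columns.
cur = conn.execute(
    "SELECT a, b, c, 'dasarathy' AS x, 2 AS y, 'dr' AS z FROM table1 "
    "UNION ALL "
    "SELECT p, q, r, x, y, z FROM table2"
)
rows = cur.fetchall()
columns = [d[0] for d in cur.description]  # names come from the first SELECT
print(columns)
print(rows)
```

String literals are single-quoted here, which is the standard SQL form; Spark SQL also accepts double quotes for strings, but many RDBMSs treat double quotes as identifiers.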

The same holds for DataFrames.

Conclusion: if it's absolutely needed, add dummy columns (using withColumn) to one DataFrame so that dataframe1.unionAll(dataframe2) works.
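The DataFrame recipe from the conclusion can be sketched in plain Python as well. This is a model, not Spark code: frames are hypothetical (schema, rows) pairs, and the made-up helper with_dummy_column stands in for what df.withColumn(name, lit(value)) does in Spark, namely appending a constant column at the end. The dummy values are the ones from CASE 3.

```python
# Plain-Python model of "add dummy columns, then union all" (not Spark code).
# A frame is a hypothetical (schema, rows) pair: schema is a list of
# (column_name, type_name) tuples and rows is a list of tuples.

def with_dummy_column(frame, name, type_name, value):
    """Append a constant column, as DataFrame.withColumn(name, lit(value)) would."""
    schema, rows = frame
    return schema + [(name, type_name)], [row + (value,) for row in rows]


def union_all(left, right):
    """Concatenate rows by position; duplicates are kept, as in UNION ALL."""
    (left_schema, left_rows), (right_schema, right_rows) = left, right
    if [t for _, t in left_schema] != [t for _, t in right_schema]:
        raise ValueError("schemas differ in column count or positional types")
    return left_schema, left_rows + right_rows


df1 = ([("a", "int"), ("b", "int"), ("c", "int")], [(1, 2, 3)])
df2 = ([("p", "int"), ("q", "int"), ("r", "int"),
        ("x", "string"), ("y", "int"), ("z", "string")],
       [(4, 5, 6, "foo", 7, "bar")])

# Pad df1 with the same dummy values used in CASE 3.
for name, type_name, value in [("x", "string", "dasarathy"),
                               ("y", "int", 2),
                               ("z", "string", "dr")]:
    df1 = with_dummy_column(df1, name, type_name, value)

schema, rows = union_all(df1, df2)
print([n for n, _ in schema])  # ['a', 'b', 'c', 'x', 'y', 'z']
print(rows)
```

Because withColumn appends new columns at the end and union matches by position, the dummy columns line up here only because x, y, z are also the last three columns of the second frame; otherwise a select would be needed to reorder the padded frame's columns before the union.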

Ram Ghadiyaram