This is my code for the union:
val dfToSave = dfMainOutput.union(insertdf.select(dfMainOutput)).withColumn("FFAction", when($"FFAction" === "O" || $"FFAction" === "I", lit("I|!|")))
When I perform the union, I get the following error:
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. string <> boolean at the 11th column of the second table;;
'Union
Here are the schemas of the two dataframes. First, insertdf:
insertdf.printSchema()
root
|-- OrganizationID: long (nullable = true)
|-- SourceID: integer (nullable = true)
|-- AuditorID: integer (nullable = true)
|-- AuditorOpinionCode: string (nullable = true)
|-- AuditorOpinionOnInternalControlCode: string (nullable = true)
|-- AuditorOpinionOnGoingConcernCode: string (nullable = true)
|-- IsPlayingAuditorRole: boolean (nullable = true)
|-- IsPlayingTaxAdvisorRole: boolean (nullable = true)
|-- AuditorEnumerationId: integer (nullable = true)
|-- AuditorOpinionId: integer (nullable = true)
|-- AuditorOpinionOnInternalControlsId: string (nullable = true)
|-- AuditorOpinionOnGoingConcernId: string (nullable = true)
|-- IsPlayingCSRAuditorRole: boolean (nullable = true)
|-- FFAction: string (nullable = true)
|-- DataPartition: string (nullable = true)
Here is the schema of the second dataframe:
dfMainOutput.printSchema()
root
|-- OrganizationID: long (nullable = true)
|-- SourceID: integer (nullable = true)
|-- AuditorID: integer (nullable = true)
|-- AuditorOpinionCode: string (nullable = true)
|-- AuditorOpinionOnInternalControlCode: string (nullable = true)
|-- AuditorOpinionOnGoingConcernCode: string (nullable = true)
|-- IsPlayingAuditorRole: boolean (nullable = true)
|-- IsPlayingTaxAdvisorRole: boolean (nullable = true)
|-- AuditorEnumerationId: integer (nullable = true)
|-- AuditorOpinionId: integer (nullable = true)
|-- AuditorOpinionOnInternalControlsId: integer (nullable = true)
|-- AuditorOpinionOnGoingConcernId: boolean (nullable = true)
|-- IsPlayingCSRAuditorRole: string (nullable = true)
|-- FFAction: string (nullable = true)
|-- DataPartition: string (nullable = true)
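Comparing the two printouts, three columns share a name but differ in type: AuditorOpinionOnInternalControlsId (string vs. integer), AuditorOpinionOnGoingConcernId (string vs. boolean) and IsPlayingCSRAuditorRole (boolean vs. string). As a rough sketch, that comparison could be automated along the following lines. `typeMismatches` is a hypothetical helper, not a Spark API, and the two schemas are trimmed, hand-built copies of the printouts above; with the real dataframes one would presumably pass `insertdf.schema` and `dfMainOutput.schema` instead.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark): for columns present in both
// schemas, return (name, leftType, rightType) wherever the types differ.
def typeMismatches(left: StructType, right: StructType): Seq[(String, DataType, DataType)] = {
  val rightTypes = right.fields.map(f => f.name -> f.dataType).toMap
  left.fields.toSeq.flatMap { f =>
    rightTypes.get(f.name)
      .filter(_ != f.dataType)
      .map(rt => (f.name, f.dataType, rt))
  }
}

// Trimmed stand-ins for insertdf.schema and dfMainOutput.schema
val insertSchema = StructType(Seq(
  StructField("AuditorOpinionId", IntegerType),
  StructField("AuditorOpinionOnInternalControlsId", StringType),
  StructField("AuditorOpinionOnGoingConcernId", StringType),
  StructField("IsPlayingCSRAuditorRole", BooleanType)))

val mainSchema = StructType(Seq(
  StructField("AuditorOpinionId", IntegerType),
  StructField("AuditorOpinionOnInternalControlsId", IntegerType),
  StructField("AuditorOpinionOnGoingConcernId", BooleanType),
  StructField("IsPlayingCSRAuditorRole", StringType)))

// Print one line per mismatching column
typeMismatches(insertSchema, mainSchema).foreach { case (name, l, r) =>
  println(s"$name: ${l.simpleString} <> ${r.simpleString}")
}
```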
To avoid this problem I might have to write a select with an explicit cast for each column.
Is there any Scala syntax that can cast the columns, or otherwise make both dataframes have the same column types?
This is what I have tried so far, but I still get the same error:
val columns = dfMainOutput.columns.toSet.intersect(insertdf.columns.toSet).map(col).toSeq
//Perform Union
val dfToSave=dfMainOutput.select(columns: _*).union(insertdf.select(columns: _*)).withColumn("FFAction", when($"FFAction" === "O" || $"FFAction" === "I", lit("I|!|")))
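Selecting the shared column names on both sides leaves each column's type unchanged, which would explain why the error persists. One possible direction, shown here only as a minimal sketch, is to cast every column of one dataframe to the type the same-named column has in the other before the union. `alignTo` is a hypothetical helper, not a Spark API, and `mainDf` / `insDf` are tiny stand-ins for dfMainOutput and insertdf. The sketch assumes every column of the target exists in the other dataframe and that the values are actually castable; values that cannot be converted typically become null, or raise an error when ANSI mode is enabled.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper (not part of Spark): select target's columns from df,
// in target's order, casting each one to the type it has in target.
def alignTo(df: DataFrame, target: DataFrame): DataFrame =
  df.select(target.schema.fields.map(f => col(f.name).cast(f.dataType)): _*)

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("align-sketch")
  .getOrCreate()
import spark.implicits._

// Stand-in for dfMainOutput: the middle column is boolean here
val mainDf = Seq((1L, true, "I"))
  .toDF("OrganizationID", "AuditorOpinionOnGoingConcernId", "FFAction")

// Stand-in for insertdf: the same column is a string here
val insDf = Seq((2L, "false", "O"))
  .toDF("OrganizationID", "AuditorOpinionOnGoingConcernId", "FFAction")

// union matches columns by position, so both sides must agree on order and type
val unioned = mainDf.union(alignTo(insDf, mainDf))
unioned.printSchema()
```

Because `union` resolves columns by position rather than by name, selecting in the target's field order also keeps the two sides lined up.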