I have a DataFrame whose fields I concatenate into a single column.
The result of the concatenation is another DataFrame, which I finally write to CSV files partitioned on two of its columns. One of those columns comes from the first DataFrame, and I do not want it to appear in the final output.
Here is my code:
val dfMainOutput = df1resultFinal.join(latestForEachKey, Seq("LineItem_organizationId", "LineItem_lineItemId"), "outer")
  .select($"LineItem_organizationId", $"LineItem_lineItemId",
    when($"DataPartition_1".isNotNull, $"DataPartition_1").otherwise($"DataPartition".cast(DataTypes.StringType)).as("DataPartition"),
    when($"StatementTypeCode_1".isNotNull, $"StatementTypeCode_1").otherwise($"StatementTypeCode").as("StatementTypeCode"),
    when($"FFAction_1".isNotNull, concat(col("FFAction_1"), lit("|!|"))).otherwise(concat(col("FFAction"), lit("|!|"))).as("FFAction"))
  .filter(!$"FFAction".contains("D"))
Here I concatenate the columns and create another DataFrame:
val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.map(c => col(c)): _*).as("concatenated"))
This is what I have tried:
dfMainOutputFinal
  .drop("DataPartition")
  .write
  .partitionBy("DataPartition","StatementTypeCode")
  .format("csv")
  .option("header","true")
  .option("encoding", "\ufeff")
  .option("codec", "gzip")
  .save("path to csv")
Now I don't want the DataPartition column in my output.
Because I partition by DataPartition, it is not written as a separate column, but since DataPartition is present in the main DataFrame it still ends up in the concatenated output.
QUESTION 1: How can I exclude a column from the DataFrame?
QUESTION 2: Is there any way to add "\ufeff" to the CSV output file before writing my actual data, so that the encoding becomes UTF-8-BOM?
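For reference on what QUESTION 2 is asking for: "\ufeff" is a character (the byte order mark), not the name of a character set, so it has to be written at the start of the data itself. The plain-Scala sketch below only shows the three bytes that the character produces in UTF-8 when it is prepended to the first line. The name headerLine and its value are hypothetical; how to make Spark's CSV writer emit this prefix is not shown here.

```scala
import java.nio.charset.StandardCharsets

// U+FEFF is the byte order mark character.
val bom: String = "\uFEFF"

// Hypothetical first line of the output file (assumption for illustration).
val headerLine: String = "concatenated"

// Prepending the BOM to the first line puts it at the very start of the data.
val firstLineWithBom: String = bom + headerLine

// Encoded as UTF-8, the BOM becomes the three bytes EF BB BF.
val bomBytes: Array[Byte] = firstLineWithBom.getBytes(StandardCharsets.UTF_8)
val bomHex: String = bomBytes.take(3).map(b => f"${b & 0xff}%02X").mkString(" ")
println(bomHex) // prints "EF BB BF"
```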
As per the suggested answer, this is what I have tried:
val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.filter(_ != "DataPartition").fieldNames.map(c => col(c)): _*).as("concatenated"))
But I get the error below:
<console>:238: error: value fieldNames is not a member of Seq[org.apache.spark.sql.types.StructField]
val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.filter(_ != "DataPartition").fieldNames.map(c => col(c)): _*).as("concatenated"))
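A note on why this fails: calling filter directly on the schema uses the collection method of StructType, so it returns a Seq[StructField], which has no fieldNames member. The predicate also compares a StructField with a String, so it would never exclude anything. A minimal sketch of the alternative is to call fieldNames first and filter the resulting strings. The array below is a plain-Scala stand-in for dfMainOutput.schema.fieldNames, and keptNames is a name introduced only for illustration.

```scala
// Stand-in for dfMainOutput.schema.fieldNames, which is an Array[String].
val fieldNames: Array[String] = Array(
  "LineItem_organizationId", "LineItem_lineItemId",
  "DataPartition", "StatementTypeCode", "FFAction")

// Filter the column names (strings), not the StructField objects.
val keptNames: Array[String] = fieldNames.filter(_ != "DataPartition")

// In Spark, the kept names would then be mapped to columns, for example:
//   concat_ws("|^|", keptNames.map(c => col(c)): _*).as("concatenated")
println(keptNames.mkString(", "))
```

With this ordering (fieldNames before filter), DataPartition can still be selected as its own column for partitionBy while being left out of the concatenated value.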
Below is what I have if I need to remove two columns from the final output:
val dfMainOutputFinal = dfMainOutput.select($"DataPartition","PartitionYear",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition","PartitionYear").map(c => col(c)): _*).as("concatenated"))
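The line above cannot compile as written, because filter takes a single predicate rather than a list of values, and select cannot mix a Column ($"DataPartition") with a bare String ("PartitionYear"). One way to exclude several names at once is to keep them in a Set and filter against it. This is a sketch under assumptions: allNames is a plain-Scala stand-in for dfMainOutput.schema.fieldNames, it assumes a PartitionYear column exists in the DataFrame, and excluded and remaining are illustrative names.

```scala
// Stand-in for dfMainOutput.schema.fieldNames; assumes PartitionYear is a column.
val allNames: Array[String] = Array(
  "LineItem_organizationId", "LineItem_lineItemId",
  "DataPartition", "PartitionYear", "StatementTypeCode", "FFAction")

// Names to leave out of the concatenated value.
val excluded: Set[String] = Set("DataPartition", "PartitionYear")

// Keep every name that is not in the excluded set.
val remaining: Array[String] = allNames.filterNot(excluded.contains)

// In Spark, both partition columns would be selected as Columns, for example:
//   dfMainOutput.select($"DataPartition", $"PartitionYear",
//     concat_ws("|^|", remaining.map(c => col(c)): _*).as("concatenated"))
println(remaining.mkString(", "))
```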