I'm working in Scala with a DataFrame that has ~60 columns.
In a Databricks pipeline, we've split a few columns out, along with an identity column, to validate some data, resulting in a 'reference' DataFrame. I'd like to join it back to the main, large DataFrame and insert the validated data into the original column.
To keep things simple, I'd like the resulting DataFrame to match the schema of the original, so none of the reference columns should remain.
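For context, the reference DataFrame is built by pulling the identity column and the columns being validated out of the main DataFrame. It looks roughly like this sketch, where the column names and the upper() call are just placeholders for the real validation logic:

import org.apache.spark.sql.functions.{col, upper}

// Sketch only: select the identity column plus a column to validate,
// apply the validation (upper() stands in for the real logic),
// and keep just the identity and the validated result.
val refDF = myDF
  .select(col("Identity").as("RefIdentity"), col("Foo"))
  .withColumn("refFoo", upper(col("Foo")))
  .drop("Foo")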
On a small scale, this isn't too hard:
myDF = myDF
  .join(refDF, myDF("Identity") === refDF("RefIdentity"), "inner")
  .withColumn("Foo", $"refFoo")
  .select("Identity", "Foo", "Column2", "Column3", ...)
This turns into a huge pain when dealing with large numbers of columns. Is there a quicker way to select only the columns from myDF after the withColumn operation?
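What I'm hoping exists is a way to drive that final select from the original schema instead of typing out every column name. Something like this untested sketch is the direction I have in mind (I may be missing something about how columns resolve after the join):

import org.apache.spark.sql.functions.col

// Untested idea: capture the original ~60 column names up front and feed them
// back into select, so the result keeps exactly the original schema and order.
val originalCols = myDF.columns.map(col)

val result = myDF
  .join(refDF, myDF("Identity") === refDF("RefIdentity"), "inner")
  .withColumn("Foo", col("refFoo"))
  .select(originalCols: _*)

Does something along those lines work, or is there a cleaner way to keep only the original columns?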