I have a CSV source file with the following schema:
["Name", "Address", "TaxId", "SS Number", "Mobile Number", "Gender", "LastVisited"]
From this CSV, I need to do the following: select one subset of columns at a time, and map each subset to this fixed schema:
["Name", "Address", "Mobile", "UniqueID", "UniqueIdentifierRefCode"]
For example, in the first iteration I will select only this subset of columns:
[Col("Name"), Col("Address"), Col("Mobile Number"), Col("TaxId"), Lit("TaxIdentifier")]
In the next iteration I need to select a different subset but map it to the same fixed schema:
[Col("Name"), Col("Address"), Col("Mobile Number"), Col("SS Number"), Lit("SocialSecurityNumber")]
I can do all of this by running a for loop, selecting the columns in each iteration, and doing a unionAll at the end, as sketched below. But is there a better way to let Spark handle this?
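As a baseline, a minimal sketch of the loop-and-union approach described above (the list of `(column, literal)` pairs is an assumption generalizing the two iterations shown):

```python
from functools import reduce
from pyspark.sql import DataFrame
from pyspark.sql.functions import col, lit

# (id_column, literal_ref_code) pairs driving each iteration.
id_sources = [
    ("TaxId", "TaxIdentifier"),
    ("SS Number", "SocialSecurityNumber"),
]

frames = [
    df.select(
        col("Name"),
        col("Address"),
        col("Mobile Number"),
        col(id_col),
        lit(ref_code),
    ).toDF("Name", "Address", "Mobile", "UniqueID", "UniqueIdentifierRefCode")
    for id_col, ref_code in id_sources
]

# union (formerly unionAll) chains the per-iteration frames together.
result = reduce(DataFrame.union, frames)
```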