I want to standardize the column names of my DataFrames using a reference table.
My reference table is a DataFrame with one variable per row, and the standard name plus all known variant names as columns:
+-------------+---------+---------+
|Standard_name|Variant_1|Variant_2|
+-------------+---------+---------+
| Pressure| Press| Press_1|
| Speed| Speed_| Rate|
+-------------+---------+---------+
Say I have a DataFrame of data with these column names:
['Pressure', 'Rate', 'Altitude']
I want to look up each of these names in my reference DataFrame and return the corresponding Standard_name if it exists, or keep the original name if it is not yet referenced in the table.
Thus, the expected outcome for the dummy example above would be:
['Pressure', 'Speed', 'Altitude']
This is easy to do in regular pandas, but I have no idea how to do it in Spark, where you're not supposed to think in terms of row indices.
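One possible approach, sketched below under assumptions: since the reference table is small, it can be collected to the driver and turned into an ordinary Python dict mapping every variant to its standard name; renaming columns is driver-side metadata work anyway. The names `ref_rows`, `data_cols`, and `new_cols` are illustrative. In Spark you would obtain `ref_rows` via `ref_df.collect()` and apply the result with `data_df.toDF(*new_cols)` (or repeated `withColumnRenamed` calls); here the rows are hard-coded so the logic can be shown standalone.

```python
# Reference table rows, shaped as they would come back from ref_df.collect()
# (hard-coded here instead of a real Spark collect, for illustration).
ref_rows = [
    {"Standard_name": "Pressure", "Variant_1": "Press", "Variant_2": "Press_1"},
    {"Standard_name": "Speed", "Variant_1": "Speed_", "Variant_2": "Rate"},
]

# Build a lookup mapping every known name (standard and variants)
# to the standard name.
lookup = {}
for row in ref_rows:
    std = row["Standard_name"]
    for name in row.values():
        if name is not None:  # variant columns may contain nulls
            lookup[name] = std

# Column names of the data DataFrame (data_df.columns in Spark).
data_cols = ["Pressure", "Rate", "Altitude"]

# Fall back to the original name when it is not in the reference table.
new_cols = [lookup.get(c, c) for c in data_cols]
print(new_cols)  # ['Pressure', 'Speed', 'Altitude']
```

With a Spark DataFrame you would then finish with `data_df = data_df.toDF(*new_cols)`, which renames all columns in one pass.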
Many thanks in advance for the help.