I'm looking to create a dynamic .withColumn.
with the column "rules" being replaced by a list depending on the file being processed.
for example: File A has a column called "Validated" that is based on a different condition to File B but has the same column name A. So can we loop through all files A-Z applying different rules for the same column in each file?
Here I am trying to validate many dataframes. Creating an EmailAddress_Validation field on each dataframe. Each data frame has a different email validation rule set. The rules are stored in a list called EmailRuleList. As we loop through each data set the corresponding rule "EmailRuleList[i]" is passed in from the list.
code below has the syntax. Also commented out with an "#" (hash) is an example of a rule. Interestingly if I supply the rule with out the loop (the # comment) the code works except it then obviously applies the same rule to all files.
i=0
for FileProcessName in FileProcessListName:
EmailAddress_Validation = EmailRuleList[i]
#EmailAddress_Validation = when((regexp_extract(col("EmailAddress"),EmailRegEx,0))==(col("EmailAddress")),0).otherwise(1)
print(EmailAddress_Validation)
print(FileProcessName)
i=i+1
vars()[FileProcessName] = vars()[FileProcessName].withColumn("EmailAddress_Validation", EmailAddress_Validation)
Error Message: col should be Column
EmailRuleList is something like...
['when((regexp_extract(col("EmailAddress"),EmailRegEx,0))==(col("EmailAddress")),1).otherwise(0)',
'when((regexp_extract(col("EmailAddress"),EmailRegEx2,0))==(col("EmailAddress")),0).otherwise(1)',
'when((regexp_extract(col("EmailAddress"),EmailRegEx3,0))==(col("EmailAddress")),0).otherwise(1)',
'when((regexp_extract(col("EmailAddress"),EmailRegEx4,0))==(col("EmailAddress")),0).otherwise(1)']
tried lots of different things but am a bit stuck