Imagine I have a dirty dataframe of employees with their ID, and Contract related information per country.
Some columns of this dataframe are LOV columns (depending on the country, some columns are LOV for just one country, others some or all of them) and some LOV columns are mandatory and some are not (that is just used to understand if a blank value is accepted or not).
We would need to check, using another mapping dataframe:
- if the values provided exist in the mapping dataframe and,
- if so, replace the value provided with the corresponding code on the dataframe.
If the value provided is not on the list, create a new column on the main dataframe named "Errors" where it says the name of the column it errored (if more than 1 column, maybe save the name in a list on that column).
So from this dataframe:
ID Country Contract Type
1 CZ Permanent BOFF
1 ES Fixed-term .
2 CZ Contractor Front-Office
3 PT Permanent
4 PT 2022-01-01 Employee
4 PT Fixed-term Office
4 ES Employee
5 SK Permanent Employee
And using this mapping:
Country Field Values Code Mandadory
CZ Contract Permanent PE Yes
CZ Contract Fixed-term FX Yes
CZ Contract Contractor CT Yes
ES Contract Permanent PERMA No
SK Contract Permanent PER-01 Yes
SK Contract Fixed-term FIX-01 Yes
ES Type Office OFF Yes
CZ Type Back-Office BOFF Yes
CZ Type Front-Office FOFF Yes
PT Type Employee EMP No
PT Type Front-Office FRONT No
Would result in this dataframe:
ID Country Contract Type Errors
1 CZ PE BOFF ['Type']
1 ES Fixed-term . ['Contract','Type']
2 CZ CT FOFF
3 PT Permanent
4 PT 2022-01-01 FRONT ['Type']
4 PT Fixed-term Office ['Type']
4 ES Employee ['Contract','Type']
5 SK PER-01 Employee
Thank you so much for the support!