I wrote some code to combine multiple CSV's that are interpretered with Pandas and appended to one combined CSV.
The issue I have is that the CSV files are delivered by multiple parties (monthly) and often contain differences with regard to column names, while they essentially contain the same information. For instance:
CSV 1 | ID | Instance number | | -------- | -------------- | | 1 | 401421 | | 2 | 420138 |
CSV 2 | ID | Instance NO | | -------- | -------------- | | 1 | 482012 | | 2 | 465921 |
This will result in two columns in the combined file, Instance Number & Instance NO unless I rename the column beforehand while the idea is to automatically process all files without intervention beforehand.
The solution that should work is to use combine_first or fillna, but next time the column may be entered as e.g. Instance No/number.
Since improving data delivery isn't an option, is there any smart way to solve issues like this without having to write out all possible variations and remap them to one leading column?
Thanks in advance!