I need to write a small ETL pipeline because I need to move some data from a source database to a target database (a datawarehouse) to perform some analysis on data.
Among those data, I need to clean and conform the name of cities. Cities are inserted manually by international users, conseguently for a single city I can have multiple names (for example London or Londra). In my source database I do not have only big cities but I have also small villages.
Well, if I do not standardize city names, our analysis could be nonsensical.
Which is the best practices to standardize cities in my target database? Have any idea or suggestion I can undertake?
Thank you