I'm building out the data warehouse at my company and I've encountered a situation where I am pulling in data with slight variations in name but tied to the same ID. This is obviously a problem because my dimension table should only have one record per ID
for example:
+======+===================+
| id | name |
+======+===================+
| 185 | AAAA |
+------+-------------------+
| 185 | AAAB |
+------+-------------------+
| 197 | XXXA |
+------+-------------------+
| 197 | XXXB |
+------+-------------------+
| 197 | XXXC |
+------+-------------------+
As you can see, the ID field should be tied to one unique value but there are strings that have slight variations but tied to the same ID. One thought was to normalize the strings but we would lose some of the metadata. Additionally, I should note that we are using Redshift which is why the unique id constraint is not being enforced. What would be the best solution to this issue?