Say I have two tables, one called InternalCustomers
which contains all my customers, and a table called ExternalCustomers
which contains customer information from an external source.
In my table I have a Legal_Id
for the customer which is an official id for the company, which also exists in the external table. However my table contains a mix of a lot of customers from various sources (and data quality is not perfect) and the Legal_Id
is therefore not necessarily unique.
Further my Customer_Name
is not the legally registered name for the Company, but has just been typed in by an account manager, so it may read something different than the official name (e.g. may read Google Incorporated
rather than Google Inc.
)
My challenge is to join these two tables together with some sort of Pattern-matching, but I don't know if this is even possible?
InternalCustomers (17 mio different customers):
| Legal_Id | Customer_Name |
|----------|------------------------|
| 1234 | Chr.Hansen Group |
| 1235 | Apple Inc |
| 1236 | A.P. Moller Maersk A/S |
ExternalCustomers (no control):
| Legal_Id | Customer_Name |
|----------|------------------------|
| 1234 | Christian Hansen AS |
| 1235 | Nokia |
| 1236 | Mærsk |
Expected result when doing a left join
:
| Legal_Id | Customer_Name | Legal_Id (External) | Customer_Name (External) |
|----------|------------------------|---------------------|--------------------------|
| 1234 | Chr.Hansen Group | 1234 | Christian Hansen AS |
| 1235 | Apple Inc | NULL | NULL |
| 1236 | A.P. Moller Maersk A/S | 1236 | Mærsk |
So to summarize: I want to do a left join
between the two tables on Legal_Id
and a pattern-match on Customer_Name
if possible.
Apple <> Nokia
, therefore this join should not be completed - but A.P. Moller Maersk A/S = Mærsk
and this join should be completed.
Customer names are not always easily identifiable so that I could just match a partial name - e.g. A.P Moller Maersk A/S
would be fairly easy for a human to match up to Mærsk
, but a machine will say that Maersk
<> Mærsk
.
Is this in any way achievable?