0

I have two tables as shown below.

The Transaction Table is a table representing a sales event between a Buyer and a Supplier. The ... in the table represents other columns such as TransactionDate, Store Information, etc.

The Contact Table is a table representing the Buyer and a Supplier's information such as name & home addresss etc.

Right now the two tables share a one-to-one relationship where BuyerID = ContactID and SellerID = ContactID.

I'm populating this table with a python script so the answer could be in python, SQL, or both.

Current State

Transaction Table

TransactionID (pk) ... BuyerID SellerID
11914 11914 Buyer 11914 Seller
11915 11915 Buyer 11915 Seller
11916 11916 Buyer 11916 Seller

Contact Table

ContactID (pk) ... FirstName LastName
11914 Buyer Mike Doe
11914 Seller Janet Myers
11915 Buyer Janet Myers
11915 Seller Mike Doe
11916 Buyer Dave Dirk
11916 Seller Janet Myers

What I'd like to do is remove the repetitive rows in the Contact Table and make the table relationships a one-to-many. Additionally if for example Janet Myers changes her last name, there would be be a new addition to the Contact Table despite all of the other columns staying the same. i.e. ignoring the ContactID column, there shouldn't be two identical rows in the ContactTable.

Future State

Transaction Table

TransactionID (pk) ... BuyerID SellerID
11914 06dfe636-4408-4abe-b902-b1ac6aab9849 58e5c2f7-d0ef-4129-a5a8-e17efaee0a1d
11915 58e5c2f7-d0ef-4129-a5a8-e17efaee0a1d 06dfe636-4408-4abe-b902-b1ac6aab9849
11916 ab99de1f-a714-4709-9a1e-f3a143b279d2 58e5c2f7-d0ef-4129-a5a8-e17efaee0a1d

Contact Table

ContactID (pk) ... FirstName LastName
06dfe636-4408-4abe-b902-b1ac6aab9849 Mike Doe
58e5c2f7-d0ef-4129-a5a8-e17efaee0a1d Janet Myers
ab99de1f-a714-4709-9a1e-f3a143b279d2 Dave Dirk

My first thought to solve this would be to create the ContactID out of a hash of all the other columns.

sha256(json.dumps(contactDict.encrypt('utf-8'))).hexdigest()

Then I'd use a MERGE statement to add the contact if it the ContactID hash didn't already exist in the table. I think this would work, however I'm having second thoughts risking a hash collision with financial data.

Other than a hash, my other thought would be a MERGE statement but with around 15 columns in the ON clause I thought that'd be very low performance.

Thom A
  • 88,727
  • 11
  • 45
  • 75
M. Stolte
  • 105
  • 1
  • 1
  • 11
  • is the last row of your Contact table being Janet a mistake/typo or an example of your data quality? (ie, is [number] expected to uniquely identify an individual and the suffix just calling out repetition?) – TZHX Aug 31 '21 at 14:24
  • Janet was involved with every transaction shown, either as a Buyer or Seller. That is why Janet appears multiple times in the Contacts Table. My Goal is to only have her in the Contacts Table once, and each transaction reference the same one Janet entry. – M. Stolte Aug 31 '21 at 14:43

0 Answers0