0

I need to import and parse some sales records into the database.
Some customers have more than one product, therefore there are several orders with different product but with the same shipping address in the JSON Array.

Problem arise when I try to save the address in Addresses Table:
Clearly I need just one record for each Address, and to achieve this I calculate an Hash of the address fields and I compare it with the Hashes already present in the table:

It seems (well, I'm sure) that the query checks only the first time if the Hash is already present, and if it is not present, it add as many rows as the orders count:

I can imagine this is the standard behavior of OPENJSON function since the parsing of the JSON payload seems to be made by an inner loop, therefore I think I have to use a different approach.... but I have no Idea what to use.

here the JSON payload

declare @json nvarchar(max)=N'[
   {
      "id": 21660,
      "currency": "USD",
      "total": "15.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1052
         }
      ]
   },
   {
      "id": 21659,
      "currency": "USD",
      "total": "38.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1050
         }
      ]
   },
   {
      "id": 21658,
      "currency": "USD",
      "total": "38.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1048
         }
      ]
   }
]'

and the (simplified) query

Insert Into @Addresses
(
    orderId,
    fullName,
    addressLine1,
    city,
    stateOrProvince,
    postalCode,
    countryCode,
    addressCode
)
SELECT
    o.orderId,
    concat(s.firstName,' ',s.lastName),
    s.addressLine1,
    s.city,
    s.stateOrProvince,
    s.postalCode,
    s.countryCode,
    convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName, ' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
FROM OPENJSON(@json) 
    WITH  (
            orderId                     nvarchar(64)    '$.id',
            shipping                    nvarchar(max)   '$.shipping' AS JSON
          ) o  
    CROSS APPLY OPENJSON(shipping) 
    WITH  (
            firstName                   nvarchar(128)   '$.first_name',
            lastName                    nvarchar(128)   '$.last_name',
            addressLine1                nvarchar(128)   '$.address_1',
            city                        nvarchar(128)   '$.city',
            stateOrProvince             nvarchar(64)    '$.state',
            postalCode                  nvarchar(64)    '$.postcode',
            countryCode                 nvarchar(4)     '$.country'
         ) s

left join @Addresses a on a.addressCode=convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
where a.addressCode is null

I also prepared a sqlfiddle, where you can see that it returns 3 rows while target is to get just one

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=3031a99e3cd24f6bf383c0de29cf19a1

Dale K
  • 25,246
  • 15
  • 42
  • 71
Joe
  • 1,033
  • 1
  • 16
  • 39
  • Good work on creating a fiddle. – Dale K Jun 10 '22 at 04:40
  • How did you get on? – Dale K Jun 11 '22 at 02:12
  • Hi! Sorry for late answer, I took a rapid look but couldn't make it work the `cte` part, then had to fix some urgent issues. Anyway, this fiddle does not work: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=42ec38f5c549364fbf1c8d24c110da31 while this, with Addresses as Real Table seems correct, but does not return any result. http://sqlfiddle.com/#!18/ef6d3/1 – Joe Jun 11 '22 at 16:26
  • That's because you left off the most important bit `where rn=1` – Dale K Jun 12 '22 at 01:30
  • Note on your second fiddle to make it work you have to put the query in the right hand side pane, the left hand side is for the schema only. Once you do that and add `where rn = 1` it returns the desired results. – Dale K Jun 12 '22 at 01:40
  • FYI, the semi-colon (`;`) is a statement terminator, i.e. you should be terminating the statement before the CTE, not beginning the CTE with it. – Dale K Jun 12 '22 at 01:41
  • Always heard to start a `cte` with `;WITH cte`, but as you can imagine I never use `cte` – Joe Jun 12 '22 at 05:27
  • Sorry.. concerning first `fiddle` I did not see any result but I just should scroll up :-( Anyway, now I can come to the real question: I've always supposed that using `left join` + `where field is null` was a good way to detect if a certain value was not present in a table but in my query it does not work, furthermore your example prove that my idea about `OPENJSON` is also wrong.. and this is not due to the `cte`, since I modified the `sproc` to avoid the `cte` and it still works... I'll try to learn a little more about this Thanks a lot for your time and patience! – Joe Jun 12 '22 at 05:29

1 Answers1

1

Just use the standard ROW_NUMBER() deduplication method e.g.


WITH cte AS (
    SELECT
    o.orderId,
    concat(s.firstName,' ',s.lastName) fullName,
    s.addressLine1,
    s.city,
    s.stateOrProvince,
    s.postalCode,
    s.countryCode,
    h.addressCode,
    ROW_NUMBER() OVER (PARTITION BY h.addressCode ORDER BY o.OrderId ASC) rn
    FROM OPENJSON(@json) 
    WITH  (
        orderId nvarchar(64) '$.id',
        shipping nvarchar(max) '$.shipping' AS JSON
    ) o  
    CROSS APPLY OPENJSON(shipping) 
    WITH  (
        firstName nvarchar(128) '$.first_name',
    lastName nvarchar(128) '$.last_name',
    addressLine1 nvarchar(128) '$.address_1',
    city nvarchar(128) '$.city',
    stateOrProvince nvarchar(64) '$.state',
    postalCode nvarchar(64) '$.postcode',
    countryCode nvarchar(4) '$.country'
    ) s
    CROSS APPLY (
      VALUES (CONVERT(nvarchar(64), HASHBYTES('SHA1', CONCAT(s.firstName ,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2))
    ) h (addressCode)
    LEFT JOIN @Addresses a ON a.addressCode = h.AddressCode
    WHERE a.addressCode IS NULL
)
INSERT INTO @Addresses
(
  orderId,
  fullName,
  addressLine1,
  city,
  stateOrProvince,
  postalCode,
  countryCode,
  addressCode
)
     SELECT orderId,
         fullName,
         addressLine1,
         city,
         stateOrProvince,
         postalCode,
         countryCode,
         addressCode
    FROM cte
    WHERE rn = 1;

Note: If you use CROSS APPLY to calculate the hashcode one avoids calculating it multiple times.

Dale K
  • 25,246
  • 15
  • 42
  • 71