0

In my SQL Server 2016+, I have a table with a json column, the json column has this format:

{
    "$type": "Sample.Product, Sample",
    "Name": "sample",
    "Id": "12345",
    "Policies": {
        "$type": "System.Collections.Generic.List`1[[Sample.Policy, Sample]], mscorlib",
        "$values": [
            {
                "$type": "Sample.ListPricingPolicy, Sample",
                "Prices": {
                    "$type": "System.Collections.Generic.List`1[[Sample.Money, Sample]], mscorlib",
                    "$values": [
                        {
                            "$type": "Sample.Money, Sample",
                            "CurrencyCode": "USD",
                            "Amount": 49.9900
                        }
                    ]
                },
                "PolicyId": "af5617ad5a2146a981a5db46ecae60be",
                "Models": {
                    "$type": "System.Collections.Generic.List`1[[Sample.Model, Sample]], mscorlib",
                    "$values": []
                }
            },
            {
                "$type": "Sample.PriceCardPolicy, Sample",
                "PriceCardName": "PriceCard-12345",
                "PolicyId": "c34c61051d59459fb14e057bead2d128",
                "Models": {
                    "$type": "System.Collections.Generic.List`1[[Sample.Model, Sample]], mscorlib",
                    "$values": []
                }
            }
        ]
    }
}

In above data:

  1. the main type may be not Sample.Product, Sample,
  2. Policies.$values is a generic list, can be any type of Policy
  3. PriceCardPolicy may not exist in some json data

I need to filter all those products that have PriceCardPolicy.

My current SQL is:

SELECT s.* 
    FROM   dbo.product s 
    WHERE
    json_value(s.Data, '$."$type"') = 'Sample.Product, Sample'
    AND NOT EXISTS  -- filter all those who have `PriceCardPolicy`
            (
                SELECT      e.*
                FROM        dbo.product e
                CROSS apply Openjson(e.Data, '$.Policies."$values"') WITH(policytype nvarchar(max) '$."$type"') policies
                WHERE       
                json_value(e.Data, '$."$type"') = 'Sample.Product, Sample'
                AND         policies.policytype = 'Sample.PriceCardPolicy, Sample'
                AND         s.id = e.id)

But its performance is really bad when data grows up, like 500K rows. how can I improve this?

For example:

There are 500k rows, but only 5 rows have no PriceCardPolicy. I need to find out those 5 rows.

I am not able to do any change from database, I can only focus on improving this SQL. Thx.

Dongdong
  • 2,208
  • 19
  • 28
  • 1
    That `distinct s.*` is going to be expensive in any case (regardless of the performance of JSON parsing), since you're forcing SQL Server to sort all data in all rows, including the potentially huge JSON data. Avoid `*` wherever possible, but especially when combined with `DISTINCT`. – Jeroen Mostert Dec 18 '19 at 15:26
  • @JeroenMostert I know, but I have to use it in current SQL. I believe I should also change the `NOT Exists` to something else. – Dongdong Dec 18 '19 at 15:45
  • I'm not clear why you'd "have" to use it. Aren't the rows in `dbo.Product` already unique in and of themselves, representing individual products that do not need to be de-duplicated? If not, that would be a problem to fix. (And if that falls under "I can't change the table data", the obvious response would be "then consider processing the data first into a new table so that's not an issue"; there really are things you don't want to have to fix in one single query every time it's run.) If they're not unique due to some careless `JOIN` elsewhere (not shown here), fix that. – Jeroen Mostert Dec 18 '19 at 15:55
  • @JeroenMostert OKey, let me remove this from SQL. Thx – Dongdong Dec 18 '19 at 15:59
  • I don't see how you are going to do much to improve this. Since you can't persist the data into tables you are going to be forced to parse the entire table every single time you run this query. Your JSON is basically storing a whole list of relational data in a single element which means to query it you have to parse the whole thing for every single row in your table. – Sean Lange Dec 18 '19 at 16:01
  • As an "out of the box" approach, if you're not actually interested in shredding the JSON data for actual values in the search, a simple `WHERE [data] NOT LIKE '%PriceCardPolicy%'` might do. This can then still be processed with an actual parse to eliminate any false positives. If that's already too slow, you have no real choice but to make inserting the data smarter. – Jeroen Mostert Dec 18 '19 at 16:06
  • @SeanLange yes, you are correct. we know the design issue. but at this moment I cannot focus on this structure. – Dongdong Dec 18 '19 at 16:06
  • @JeroenMostert No, the word `PriceCardPolicy` might exists somewhere else. `OpenJson` will split 1 data row to the policy numbers of that array. But I don't want to `cross apply` to all those value but only `cross apply` on the one who is `PriceCardPolicy`. that's right the problem and it's why I have to use `Not exists`. – Dongdong Dec 18 '19 at 16:11
  • But like I said, unless you can change the design, or at least materialize the shredded values there isn't much you can do here. Any improvements to this will be minimal at best. – Sean Lange Dec 18 '19 at 16:14
  • 1
    Sure, the word might exist somewhere, but if it exists in significantly less than the 500K rows (not necessarily 5, but less) this decreases the amount of rows that actual, full JSON parsing has to be applied to when wrapped as a subquery, so it may still be worth it. Try a simple `SELECT COUNT(*) FROM Products WHERE [data] NOT LIKE '%PriceCardPolicy%'` to check. Of course if the *JSON* consists of 500K objects, you're not going to be optimizing this, period. – Jeroen Mostert Dec 18 '19 at 16:16
  • @JeroenMostert yes, smart! make sense! it's a good point! Thanks a lot! any other ideas? – Dongdong Dec 18 '19 at 16:19

1 Answers1

0

If you need to search inside the json, you need to create computed columns and index them, or shred the values in other tables and index properly.

e.g.:

ALTER TABLE dbo.product ADD json_datatype as json_value(s.Data, '$."$type"');
CREATE INDEX idx_product_json_type ON dbo.product (json_datatype);
PeterHe
  • 2,766
  • 1
  • 8
  • 7