1

I'm working with a masked database on my QA server using SQL Server Standard (64-bit) 14.0.1000.169. This is my structure:

    CREATE TABLE [dbo].[Test](
         [Column1] [VARCHAR(64)] NULL,
         [Column2] [VARCHAR(64)] NULL
    )
    GO
    INSERT INTO [dbo].[Test]
    VALUES ('ABCDEFG', 'HIJKLMN')

I've masked the column with the following code:

    ALTER TABLE [dbo].[Test]
    ALTER COLUMN [Column1] VARCHAR(64) MASKED WITH (FUNCTION = 'default()');

It works as expected when I perform the following query using a non-allowed user:

    SELECT [Column1], [Column2]
    FROM [dbo].[Test]
    FOR JSON PATH
    -- RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]'

But it doesn't work when the same non-allowed user saves the result in variable (the main goal):

    DECLARE @var VARCHAR(64)
    SET @var = (SELECT [Column1], [Column2] FROM [dbo].[Test] FOR JSON PATH)
    SELECT @var --it should show a valid JSON...
    -- RESULT: 'xxxx' <-- JSON LOSES ITS STRUCTURE
    -- DESIRED RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]' <-- VALID JSON

Main problem: JSON looses its structure when a masked column appear in the SELECT and "FOR JSON PATH" clause is present.

We want to get a valid JSON even if the data column is masked or not, or even if sa user or not.

I've tested using NVARCHAR or doing a CAST in the masked column, but the only way we get the desired result is using a #tempTable before use the "FOR JSON PATH" clause.

How can I do for SELECT a masked column and save it to VARCHAR variables without loose JSON structure? Any help will be appreciated.

NOTE: The SA user is default allowed to see unmasked data (so the JSON doesn't loose its structure), but we want to execute it on a non-allowed user and return a valid JSON, not only 'xxxx'.

Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • I can't reproduce what you claim. It's quite likely you're executing the query using a highly-privileged account that *can* read unmasked data. If your application connect to the database using `sa` it will be able to read everything. – Panagiotis Kanavos Jul 13 '22 at 17:30
  • No repro [in Dbfiddle](https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=9704dc8db2aeb8dcb07e6c8ee5b94279) targeting SQL Server 2017 – Panagiotis Kanavos Jul 13 '22 at 17:37
  • Hi. Thank you @PanagiotisKanavos to answer. English is not my primary language, but I'll try to explain the best I can. Talking about the problem... as you said, thats the point. `SA` user can save the JSON SELECT into @variable without losing structure, but normal user doesn't. Normal user receives "xxxx" instead. No valid JSON returned when tries to save it in @variable. In dbFiddle you can find the problem in the line 18. That select should be the same as line 21. – Moises Hernandez Jul 13 '22 at 18:47
  • @PanagiotisKanavos I note the repro works with `FOR XML` also https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=65aa1a6fa7c6aa2c6a677b0480a2f418 it also works even without a variable, when in a nested `SELECT` https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=92ab4d5dc0a15f025a8e9aac498e2cc4. I note that there is a different code path when using a bare `FOR JSON` or `FOR XML`, as you can see from the query plan (this is normal, but may be what is making the difference). – Charlieface Jul 13 '22 at 20:10

1 Answers1

5

It does indeed appear to be a bug. Repro is here. Although see below, not so sure.

When using FOR JSON, or for that matter FOR XML, as a top level SELECT construct, a different code path is used as compared to placing it in a subquery or assigning it to a variable. This is one of the reasons for the 2033-byte limit per row in a bare FOR JSON.

What appears to be happening is that in the case of a bare FOR JSON, the data masking happens at the top of the plan, in a Compute Scalar operator just before the JSON SELECT operator. So the masking happens on just the one column.

PasteThePlan

enter image description here

Whereas when putting inside a subquery, a UDX function operator is used. The problem is that the Compute Scalar is happening after the UDX has created the JSON or XML, whereas it should have been pushed down below the UDX in the plan.

PasteThePlan

enter image description here

I suggest you file the bug with Microsoft, on the Azure Feedback site.


Having gone over this a little, I actually think now that it's not a bug. What does seem to be a bug is the case without nesting.

From the documentation:

Whenever you project an expression referencing a column for which a data masking function is defined, the expression will also be masked. Regardless of the function (default, email, random, custom string) used to mask the referenced column, the resulting expression will always be masked with the default function.

Therefore, when you select any masked column, even in a normal SELECT, if you use a function on the column then the masking always happens after any other functions. In other words, the masking is not applied when the data is read, it is applied when it is finally output to the client.

When using a subquery, the data is fed into a UDX function operator. The compiler now senses that the final resultset is a normal SELECT, just that it needs to mask any final result that came from the masked column. So the whole JSON is masked as one blob, similar to if you did UPPER(yourMaskedColumn). See the XML plan in this fiddle for an example of that.

But when using a bare FOR JSON, it appears to the compiler as a normal SELECT, just that the final output is changed to JSON (the top-level SELECT operator is different). So the masking happens before that point. This seems to me to be a bug.

The bug is even more egregious when you use FOR XML, which uses the same mechanisms. If you use a nested FOR XML ..., TYPE then you get just <masked /> irrespective of whether you nest it or not. Again this is because the query plan shows the masking happening after the UDX. Whereas if you don't use , TYPE then it depends if you nest it. See fiddle.

Charlieface
  • 52,284
  • 6
  • 19
  • 43