0

I am trying to perform data masking on JSON data. Using a Javascript UDF to update list of NESTED JPATH attributes similar to what is done here, https://www.snowflake.com/blog/masking-semi-structured-data-with-snowflake/

Additionally I tried nested OBJECT_INSERT statements to mask a specific attribute but having multiple attributes to mask I have to build a list of subqueries to perform OBJECT INSERT on previous sub query result which is complex. Ex:

FROM (
SELECT OBJECT_INSERT(VAR_COL,'LVL1',OBJECT_INSERT(VAR_COL:LVL1,'KEY1',OBJECT_INSERT(VAR_COL:LVL1.KEY1,'KEY2','VALUE',TRUE),TRUE),TRUE) AS VAR_COL
FROM TABLE
)

Another problem with OBJECT_INSERT which is not letting me use it is if the JPATH doesn't exists for a specific JSON row it will add that JPATH which I dont want.

I am working with million of records and using XS Warehouse it takes 15 mins to do a simple query using JavaScript UDF.

Alternately, also tried Snowpark UDF but it is also showing very small improvement.

Any idea on improving performance further?

  • Just because Snowflake can query JSON data doesn’t mean it is a good idea to hold data that users will access in JSON format. Convert it to a relational format and your life will become much easier – NickW Dec 26 '22 at 11:53
  • As @NickW states, it's not always best to leave JSON in its raw form for general consumption. You may want to try using a view over the raw data and then applying the DDM to the view. I would think this would be a lot more efficient and flexible than the UDF. – Mike Walton Dec 26 '22 at 18:11

0 Answers0