0

When importing/parsing flat files and delimited data (we use SqlBulkCopy for these) we end up with a table with a DATA_KEY and DATA_VAL column. [EDIT] Forgot to mention this table is a work table for the .net crl to parse the various inbound data only. To load the data into the production tables from this work table we then use a dynamic query to PIVOT the data like so:

DECLARE @sqlCommand varchar(MAX)
DECLARE @columnList varchar(max)

SELECT 
    @columnList = STUFF((SELECT DISTINCT',MIN(CASE DATA_KEY WHEN '''+DATA_KEY+''' THEN DATA_VAL END) AS '+ QUOTENAME(DATA_KEY) 
                         FROM PARSED_DATA
                         FOR XML PATH ('')), 1, 1, '')

SET @sqlCommand = 'SELECT ROW_NUMBER() OVER(ORDER BY KEYID,PARENT_POS) AS ROWID,  ' + @columnList + ' FROM PARSED_DATA 

This works fine, but brings about the question of possible SQL injection. Obviously this is a concatenated string, and so DATA_KEY and DATA_VAL could potentially be subject to injection. When I look at the DATA_KEY field, it seems that the only way injection could work is if the attacker knew this was inside a CASE statement- that is, the injected code would have to END the CASE statement properly or the entire query would fail. Similarly, the DATA_VAL field would fail the entire query if the injection did not properly END the CASE statement.

Am I correct here? Or am I missing something obvious? Also, where DATA_KEY is used as the column name are we susceptible there? QUOTENAME make the value a valid SQL identifier; is it possible to have injection there?

Just looking to make sure all bases are covered; comments and suggestions welcome!

A. Guattery
  • 113
  • 8
  • `a table with a DATA_KEY and DATA_VAL column.` don't use such tables. The offer no benefit at all (certainly no "flexibility") while making querying almost impossible. Indexing is useless when the values have are completely unrelated, value validation is impossible, buffering becomes far less efficient since a *lot* of the data loaded from the table would be for unrelated "entities". Foreign keys would be impossible. That's the bug, not aggregating column names. – Panagiotis Kanavos Dec 17 '21 at 21:57
  • If you want "flexible" data store it in an XML column, sparse columns or as JSON. Those are *far* more efficient and easier to query than a key/value table. You can have up to 30K sparse rows per table. You can specify persisted computed columns that extract specified fields from a JSON document and even index them – Panagiotis Kanavos Dec 17 '21 at 22:00
  • @Panagiotis Kanavos I’m sorry I should have been mote clear. This DATA_KEY/DATA_VAL table is only a work table. We have a .NET CRL which parses many different inbound files per day into this table, and then a stored procedure with this dynamic t-sql to load the data into respective production tables. I will update my question to be more clear. – A. Guattery Dec 18 '21 at 02:02
  • 1
    Wrap QUOTENAME() around the column names, which will protect you from any possible injection at least in that specific spot. See https://www.mssqltips.com/sqlservertip/3637/protecting-yourself-from-sql-injection-in-sql-server-part-1/ and https://www.mssqltips.com/sqlservertip/3638/protecting-yourself-from-sql-injection-in-sql-server-part-2/ Also if you’re on 2017+ please look into STRING_AGG which is a whole lot easier to look at than FOR XML PATH… – Aaron Bertrand Dec 18 '21 at 06:40
  • @AaronBertrand thank you for that. The column names are wrapped so that part is covered. Unfortunately this is still 2012, and will only be migrating to 2014 sometime this coming year. The 3rd party software company has not certified anything newer yet, and is never in much of a hurry to do so... – A. Guattery Dec 18 '21 at 12:21
  • 1
    You can also check if DATA_KEY is in sys.columns for before blindly putting it into the CASE expression. That really reduces the risk to almost zero. – Aaron Bertrand Dec 18 '21 at 14:01
  • @AaronBertrand yes that is exactly what we do. We have a separate table that lists the inbound production table for a given batch. we then grab this table name and compare the distinct DATA_KEY values to the INFORMATION_SCHEMA.COLUMNS table to validate they exist. We also check the DATA_VAL maximum length inbound for a given column and compare that to the schema as well. – A. Guattery Dec 18 '21 at 15:13
  • 1
    You need `QUOTENAME` on the `CASE` part also `...CASE DATA_KEY WHEN ' + QUOTENAME(DATA_KEY, '''') + ' THEN...` You also need to change your `FOR XML` to use `.value` in order to unescape XML characters. If you have SQL Server 2017+ just use `STRING_AGG` – Charlieface Dec 18 '21 at 18:23
  • @Charlieface if we use `QUOTENAME` there it would result in something like `CASE DATA_KEY WHEN '[FOO]' THEN` ... That would break the code as the `SELECT` query would be looking for the string value `'[FOO]'` instead of `'FOO'` – A. Guattery Dec 18 '21 at 19:25
  • 1
    No use the extra parameter `QUOTENAME(..., '''')` that will result in `'FOO'` see https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=ef50d86b4b596c7f13be4907c403da36 – Charlieface Dec 18 '21 at 19:31
  • @Charlieface DOH! I answered you without thinking it through. You are correct, and I will make that adjustment to the code. Just as an FYI this on SQL 2012 (will be for some time) . – A. Guattery Dec 18 '21 at 19:35

1 Answers1

2

There are a few issues I can see:

  • You should use QUOTENAME on the pivot CASE expression also. Use '''' as the second parameter for this, to get '' as the delimiter
  • FOR XML should use .value to unescape XML characters
  • The third parameter of STUFF is the length. To ensure it is always correct, use a @separator variable
  • Dynamic SQL should be in nvarchar(max)
  • The performance of DISTINCT over a calculate field may be slow, it may be wiser to group first in a derived table/subquery
DECLARE @separator nvarchar(10) = ',';

DECLARE @columnList nvarchar(max) = STUFF(
    (SELECT @separator + 'MIN(CASE DATA_KEY WHEN ' + QUOTENAME(DATA_KEY, '''') + ' THEN DATA_VAL END) AS ' + QUOTENAME(DATA_KEY)
     FROM (SELECT DISTINCT DATA_KEY FROM PARSED_DATA) pd
     FOR XML PATH (''), TYPE
).value('text()[1]','nvarchar(max)'), 1, LEN(@separator), '');

You say you are using this to generate an INSERT statement, so you should also check the column names against sys.columns

Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • very useful information. The `nvarchar` is actually in use, when I typed this example I inadvertently stuck `varchar` in there. Very good on the `STUFF....value` catch. I don't think this has ever been used for XML (yet) - regardless this is good housekeeping and will be changed. Performance tests to come on all queries once housekeeping is done. Also, columns are checked against the schema prior to getting to this portion of the code. – A. Guattery Dec 18 '21 at 19:52
  • It's not about XML, it's about characters which XML would escape, such as `"` `<` or `&` – Charlieface Dec 18 '21 at 19:58
  • Yeah i wasn't very clear there was I... I should have said something more along the lines of "I don't think this has ever been used where XML chars came into play" But yes thank you for that. – A. Guattery Dec 18 '21 at 20:05