1

I have a project that requires we allow users to create custom columns, enter custom values, and use these custom values to execute user defined functions.

Similar Functionality In Google Data Studio

We have exhausted all implementation strategies we can think of (executing formulas on the front end, in isolated execution environments, etc.).

Short of writing our own interpreter, the only implementation we could find that meets the performance, functionality, and scalability requirements is to execute these functions directly within MySQL. So basically taking the expressions that have been entered by the user, and dynamically rolling up a query that computes results server side in MySQL.

This obviously opens a can of worms security wise.

Quick aside: I expect to get the "you shouldn't do it that way" response. Trust me, I hate that this is the best solution we can find. The resources online describing similar problems is remarkably scarce, so if there are any suggestions for where to find information on analogous problems/solutions/implementations, I would greatly appreciate it.

With that said, assuming that we don't have alternatives, my question is: How do we go about doing this safely?

We have a few current safeguards set up:

  1. Executing the user defined expressions against a tightly controlled subquery that limits the "inner context" that the dynamic portion of the query can pull from.
  2. Blacklisting certain phrases the should never be used (SELECT, INSERT, UNION, etc.). This introduces issues, because a user should be able to enter something like: CASE WHEN {{var}} = "union pacific railroad" THEN... but that is a tradeoff we are willing to make.
  3. Limiting the access of the MySQL connection making the query to only have access to the tables/functionality needed for the feature.

This gets us pretty far. But I'm still not comfortable with it. One additional option that I couldn't find any info online about was using the query execution plan as a means of detecting if the query is going outside of its bounds.

So prior to actually executing the query/getting the results, you would wrap it within an EXPLAIN statement to see what the dynamic query was doing. From the results of the EXPLAIN query, you should able to detect any operations (subqueries, key references, UNIONs, etc.) that fall outside of the bounds of what the query is allowed to do.

Is this a useful validation method? It seems to me that this would be a powerful tool for protecting against a suite of SQL injections, but I couldn't seem to find any information online.

Thanks in advance!

(from Comment)

Some Examples showing the actual autogenerated queries being used. There are both visual and list examples showing the query execution plan for both malicious and valid custom functions.

Rick James
  • 135,179
  • 13
  • 127
  • 222
Lee Roy
  • 21
  • 4
  • QUOTEevery value – nbk Jul 16 '21 at 20:00
  • Could you elaborate @nbk ? If I quote the expressions they would be treated as strings and not evaluated. Thanks! – Lee Roy Jul 16 '21 at 21:18
  • sql injection is always how, can you execute another sql command behind the original command. in a stored procedure use the mysql command QUOTE to quote the value you send, mysql doesn't mind as it converts anything automatically Further use prepared statements, they exist almost everywhere. then comes the biggest problem inserting table names column names and so on, which you can use prepared statements, the solution the is to check information_schema with QUOTE command if the table or columns exists. if you are unsure about sql injection use for all stored procedures with the measures – nbk Jul 16 '21 at 21:38
  • Hi @nbk, quoting values alone does get us quite where we need to be. Check out my discussion with Rick James below for more info. I appreciate the response! – Lee Roy Jul 20 '21 at 19:34
  • try and you will see it works – nbk Jul 20 '21 at 20:20
  • Hi @nbk - I put together a few examples showing the issues with this. Take a look at them here [quote examples] (https://imgur.com/a/ZqLvyEM). This is three quick examples in PHP showing how quote can be used to achieve the desired functionality, but also is left open to potential attacks. Let me know if that makes sense! – Lee Roy Jul 21 '21 at 16:49

2 Answers2

2
  • GRANT only SELECT on the table(s) that they are allowed to manipulate. This allows arbitrarily complex SELECT queries to be run. (The one flaw: Such queries may run for a long time and/or take a lot of resources. MariaDB has more facilities for preventing run-away selects.)

  • Provide limited "write" access via Stored Routines with expanded privileges, but do not pass arbitrary values into them. See SQL SECURITY: DEFINER has the privileges of the person creating the routine. (As opposed to INVOKER is limited to SELECT on the tables mentioned above.)

Another technique that may or may not be useful is creating VIEWs with select privileges. This, for example, can let the user see most information about employees while hiding the salaries.

Related to that is the ability to GRANT different permissions on different columns, even in the same table.

(I have implemented a similar web app, and released it to everyone in the company. And I could 'sleep at night'.)

I don't see subqueries and Unions as issues. I don't see the utility of EXPLAIN other than to provide more info in case the user is a programmer trying out queries.

EXPLAIN can help in discovering long-running queries, but it is imperfect. Ditto for LIMIT.

More

I think "UDF" is either "normalization" or "EAV"; it is hard to tell which. Please provide SHOW CREATE TABLE.

This is inefficient because it builds a temp table before removing the 'NULL' items:

FROM ( SELECT ...
        FROM  ...
        LEFT JOIN ...
     ) AS context
WHERE ... IS NULL

This is better because it can do the filtering sooner:

FROM ( SELECT ...
        FROM  ...
        LEFT JOIN ...
        WHERE ... IS NULL
     ) AS context
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thank you so much for the answer @rick-james ! So we are using most of those techniques. Subqueries are an issue because it (potentially) allows you to select data outside the intended scope. For example if our generic query is like: `SELECT users.id, users.email, {{UDF}} as udf_0 FROM users WHERE users.id = 101` If {{UDF}} was something like “(SELECT user.email FROM users WHERE users.id = 102)” this would return another users data as udf_0. – Lee Roy Jul 19 '21 at 16:37
  • Granting column permissions won’t work here either, because we need access to that column. I believe the same would apply for limiting privileges or creating views (unless a separate view was created per user, which isn’t feasible in my use case). Thank you again! And please correct me if I am misunderstanding. – Lee Roy Jul 19 '21 at 16:38
  • One other thing to note here. This is why I think EXPLAIN would be useful. In the explain statement, you would be able to detect if a subquery is being made in a generic fashion. Regardless of what method was being used to get data outside of the allowed scope, that will be registered in the query execution plan. The idea would be instead of trying to account for/scrub the query for every possible attack vector, you would simply look at the execution plan for unexpected behavior @rick-james – Lee Roy Jul 19 '21 at 16:58
  • @LeeRoy - By "UDF" do you mean "User Defined Function" (and not "Stored Function")? If so, I assume that it must take care of itself. As for a subquery hitting a different table, that is where `GRANTs` come into play. `VIEWs` are syntactic sugar; the permissions are determined as if you simply ran the underlying `SELECT`. – Rick James Jul 19 '21 at 17:41
  • @LeeRoy - I would be interested to see a specific example of how you can make use of it. (A simple fabricated example would do.) Be aware that, in some situations, `EXPLAIN` evaluates a subquery. – Rick James Jul 19 '21 at 17:43
  • Hi @rick-james - By UDF I mean the string that a user is entering as their "custom function." As for examples - here [link] (https://imgur.com/a/Jlabuj3) is an example showing the actual autogenerated query being used. There is both visual an list examples showing the query execution plan for both malicious and valid custom functions. The function column we are interested in this case is _value_20. One of the queries is a simple computation. The other is a select (what we don't want to happen). – Lee Roy Jul 20 '21 at 16:56
  • @LeeRoy - I added some More. – Rick James Jul 20 '21 at 18:01
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/235108/discussion-between-lee-roy-and-rick-james). – Lee Roy Jul 20 '21 at 18:50
  • Not allowed to chat anymore since my reputation dipped below 20. Regarding the use of CURRENT_USER--if I am understanding you correctly--this would require I create/connect with a different "MySQL User" for each user that was issuing the query correct? – Lee Roy Jul 21 '21 at 17:21
  • @LeeRoy - Yes, if you need different users to see / not see different things. `GRANTs` work off "users". (In MySQL 8.0 there are "roles", which allows groups of users to have similar permissions.) – Rick James Jul 21 '21 at 19:02
  • Work with what you have learned from this discussion, then start a new Question if you need more advice. (This Q&A is rambling on and becoming not very useful for others.) Meanwhile, taking the stackoverflow tutorial is an easy way to get +100 rep points. – Rick James Jul 21 '21 at 19:05
  • Fair point. My take away is that there's no "right" way to do this. It depends on the use case and risk tolerance. Marking your answer as the correct one for now. Thanks again for all your time @rick-james – Lee Roy Jul 21 '21 at 22:42
0

I wanted to share a solution I found for anyone who comes across this in the future.

To prevent someone from entering some malicious SQL injection in a "custom expression" we decided to preprocess and analyze the SQL prior to sending it to the MySQL database.

Our server is running NodeJS, so we used a parsing library to construct an abstract syntax tree from their custom SQL. From here we can traverse the tree and identify any operations that shouldn't be taking place.

The mock code (it won't run in this example) would look something like:

const valid_types  = [ "case", "when", "else", "column_ref", "binary_expr", "single_quote_string", "number"];
const valid_tables = [ "context" ];

// Create a mock sql expressions and parse the AST
var exp   = YOUR_CUSTOM_EXPRESSION;
var ast = parser.astify(exp);

// Check for attempted multi-statement injections
if(Array.isArray(ast) && ast.length > 1){
  this.error = throw Error("Multiple statements detected");
}

// Recursively check the AST for unallowed operations
this.recursive_ast_check([], "columns", ast.columns);


function recursive_ast_check(path, p_key, ast_node){

  // If parent key is the "type" of operation, check it against allowed values
  if(p_key === "type") {
    if(validator.valid_types.indexOf(ast_node) == -1){ 
      throw Error("Invalid type '" + ast_node + "' found at following path: " +  JSON.stringify(path));
    }
    return;
  }

  // If parent type is table, then the value should always be "context"
  if(p_key === "table") {
    if(validator.valid_tables.indexOf(ast_node) == -1){
      throw Error("Invalid table reference '" + ast_node + "' found at following path: " +  JSON.stringify(path));
    }
    return;
  }

  // Ignore null or empty nodes
  if(!ast_node || ast_node==null) { return; }

  // Recursively search array values down the chain
  if(Array.isArray(ast_node)){
    for(var i = 0; i<ast_node.length; i++) {
      this.recursive_ast_check([...path, p_key], i, ast_node[i]); 
    }
    return;
  }

  // Recursively search object keys down the chain
  if(typeof ast_node === 'object'){
    for(let key of Object.keys(ast_node)){
      this.recursive_ast_check([...path, p_key], key, ast_node[key]);
    }
  }

}

This is just a mockup adapted from our implementation, but hopefully it will provide some guidance. Should also note, it is best to also implement all of the strategies discussed above as well. Many safeguards are better than just one.

Dharman
  • 30,962
  • 25
  • 85
  • 135
Lee Roy
  • 21
  • 4