3

Given a textbox name for example, the user requirement wants to be able to do a wildcard search (such as contains, starts with, ends with).

Is it ok to accept the sql wildcard characters ('%' and '_') as input as long as I am still using parameterized query in the backend (Java)? Effectively, allowing the user to build his own regular expression which is what the user's requirement is all about.

Example:

  1. User types in the

    textbox = '%are%'
    
  2. This parameter is feed to the backend as such:

    public class PersonDaoImpl {
    
            public List<Person> search(String name){//name gets the value from textbox w/ sql wildcards
            Query q = mgr.createNativeQuery('select * from Person where name like :name'); //default to always use like since expecting searchkey with sql wildcards    
            q.setParameter('name', name);//gives the input from the screen
            return q.getResultList();
            } 
    }  
    
  3. The result set would include people with names 'Waren', 'Jared', 'Clare', 'Blare' as expected since user provided a regular expression.

With the SQL Parameterize Query, I can ensure that I won't be allowing SQL Injection. This implements the user requirement for wildcard search, but perhaps does it violate anything that I may have missed?

UPDATES: Just found out that Google allows wildcard too, from their help page.

Carlos Jaime C. De Leon
  • 2,476
  • 2
  • 37
  • 53
  • I'm not sure how :name will be interpreted in the sql..YOu may have to write like below 'select * from Person where name like ' + '\'' + name + '\'' ... – Bala Oct 14 '11 at 06:21
  • I would suggest user should pass %are% instead of '%are%' given that you need quotes always, you can write code that part... – Bala Oct 14 '11 at 06:24
  • Edited my question and added the missing line of code, q.setParameter('name', name) which will replace the :name in the query with the value of the variable name. – Carlos Jaime C. De Leon Oct 14 '11 at 06:28

3 Answers3

3

Well, it violates the fact that the user needs to know (or be told) how to construct SQL "LIKE" syntax, but that's all. You could end up with a slow query this way, in that it won't usually be able to use an index, but I wouldn't be concerned in terms of security or correctness.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • Thanks for the comment with regards to security or correctness. – Carlos Jaime C. De Leon Oct 14 '11 at 06:33
  • Additionally, I was thinking along the lines on how Jira Issue Navigator allows advanced searching by allowing the user to provide expressions http://confluence.atlassian.com/display/JIRA/Advanced+Searching. But of course, this wildcard search that I have is nothing close to flexibility/complexity as the JIRA advanced search. – Carlos Jaime C. De Leon Oct 14 '11 at 07:10
2

It's "safe", but probably not a good idea, for two reasons:

  1. It's probably not the best ui design to require your users to know sql syntax for this.
  2. It's horrible for performance: these queries often can't use your indexes, so they are slow to execute. And they require a lot of cpu time to compare all that text, so they add a lot of load (disproportionate to the already high execution time) to your server. You want a solution that relies on a full-text index instead.
Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • On no. 2, the client wants a 'contains' search which is actually '%searchkey%'. I have informed client about the performance due to this, its been considered and the functionality outweighs the performance. I do plan to add some index as described here in this answer/question http://stackoverflow.com/questions/7116094/i-need-to-implement-wildcard-search-using-on-both-ends-how-can-i-improve-perfo/7116118#7116118. – Carlos Jaime C. De Leon Oct 14 '11 at 06:32
  • @Carlos - you missed the point. Sql Server has a function called CONTAINS() that does exactly what your client wants, without the performance penalty. – Joel Coehoorn Oct 14 '11 at 06:48
  • Good point, I am not aware of that function. I'm using Sybase, will try to see if it has something similar. I suppose if the user wants 'starts with' and 'ends with' then it would be a function call to startsWith or endsWith if there are such existing functions? – Carlos Jaime C. De Leon Oct 14 '11 at 07:00
0

I am curious, how does the name parameter end up getting set in the request? What platform is this? (OP missed setParameter earlier)

As you noted the user need to know the wild-card syntax i.e. the use of %, _, etc. A more popular approach is to just get the string from the username, along with an option for 'exact match'/'starts-with'/'anywhere-in-name'. If you go that route you will also be able to execute a more efficient query in the first two cases.

EDIT:

If the customer insists on contains query then I think your current approach of requiring the end-user to input a pattern better then converting the input string to pattern by putting % around it.

This is because the users will still have the option of not adding (or selectively adding) the % to the search string, resulting in faster query execution. For example:

  • If the user enter search string Don the query is select ... from ... where name like 'Don'. The RDBMS will most likely use the index on name.

  • If the user enter search string Don% the query is select ... from ... where name like 'Don%'. The RDBMS will still quite likely use the index on name.

  • If the user enter search string %Don or %Don% then the index cannot be used.

Miserable Variable
  • 28,432
  • 15
  • 72
  • 133
  • I missed the line of code q.setParameter('name', name); I have edited the question to reflect this, thanks for seeing that. This is in Java, it uses PreparedStatements underneath for Parameterized Query, where :name can be replaced with a value by doing the line of code I just added. – Carlos Jaime C. De Leon Oct 14 '11 at 06:35
  • Ok :) Looking at your other comments here, if the customer always wants "contains" query then you might as well put the `%` around the input. On the other hand, allowing user to input it allows user to enter `term`, `term%`, `%term` or `%term%` which can result in faster query execution. – Miserable Variable Oct 14 '11 at 06:38
  • 1
    Yes, I would like to allow the user to be able to build his regular expression, he/she can even do something like this '%ap%zel' to get 'Rapunzel'. :) – Carlos Jaime C. De Leon Oct 14 '11 at 06:43