-1

I have a table named T_PERSON with rows such as FIRSTNAME varchar, LASTNAME varchar, CUSTOMERTYPE int, SELLERID int (and more)

Additionally I have a large "SQL IN statement" list of persons like this

:( 'JOHNxSMITHx12345x1337', 'SARAxBANNERx7612x1337' ... ) 

That is : FIRSTNAME x LASTNAME x CUSTOMERTYPE x SELLERID

If I run this query:

SELECT * FROM T_PERSON WHERE 
STR_REPLACE(FIRSTNAME + 'x' + LASTNAME + 'x' + STR(CUSTOMERID) + 'x' + STR(SELLERID)) in 
( 'JOHNxSMITHx12345x1337', 'SARAxBANNERx7612x1337' )

This query works fine on a tiny database with a few hundred rows, I wonder if I would be able to run this on a huge database with hundreds of millions of rows in T_PERSON.

Will this query be very performance heavy? E.g. if N is the size of the T_PERSON table, will database software in general make N^4 strings and compare each combination to all values in the list?

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
Lars Hartviksen
  • 1,163
  • 11
  • 19
  • The query itself is inefficient. You have a customerID. Use that. Or create a personId and use that. – Keith John Hutchison Aug 11 '16 at 07:17
  • I suggest reading up on third normal form. https://www.google.ie/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=third+normal+form – Keith John Hutchison Aug 11 '16 at 07:18
  • It would be better to try to work with the data type that your database system already provides for working with multiple tuples of data - tables. Most systems will have some means for you to populate some temporary table-like data structure with all of the data you would want to have in your `IN` and then you can use `JOIN`s and other set-based approaches to writing your query - which is then far more likely to be able to *use* indexes, which are the primary means of achieving better performance on poorly performing queries. (OTOH, make sure you have actual performance goals and *measure*) – Damien_The_Unbeliever Aug 11 '16 at 07:24
  • Product specific question. Which dbms are you using? – jarlh Aug 11 '16 at 09:10
  • Code is from a Adaptive Server Enterprise. – Lars Hartviksen Aug 11 '16 at 10:48
  • I dont have a customerid, any customer is identified uniquely only by the combination of the four pieces of information. Should be customertype in the code example. I will try reading about the third normal form, thanks for tip! – Lars Hartviksen Aug 11 '16 at 10:54
  • Try creating a combined index with the columns in the same order as your query Lars. – Keith John Hutchison Aug 12 '16 at 08:19

2 Answers2

1

We ended up going for a very different solution, running one Query per row to be updated. Now I feel stupid and the guy at DB who did the QA of my Query who asked the question should be ashamed.

The column names FIRSTNAME LASTNAME etc. are of course all implicitly a part of the one and same TABLE ROW. So they can be written as THESAMEROW.FIRSTNAME, THESAMEROW.LASTNAME etc. Looking at it this way, only one string will of course be constructed per row in the database.

My worry that N^4 strings would be created is therefore completely wrong, and my Query would scale perfectly linear on a larger database.

Lars Hartviksen
  • 1,163
  • 11
  • 19
0

you should try to look at the query plan using "set showplan on" or another tool (AquaStudio, RapidSQL...) to see if it's using indexes and proper indexes. Probably not because of the function str_replace in your WHERE clause.

Vince
  • 734
  • 1
  • 5
  • 10