0

I need to match a user input against a string column. Both contain space-separated words.

The ordering criteria is:

  1. count of words matched from the beginning (prefix match) desc
  2. count of words matched desc
  3. columns where matched words come in the same order as in the input phrase come first

So given the following sample data

"one sample"
"un moment"
"no match"
"sample uno"
"uno sample"
"sample oun"

and the sample input "sa un foo", the output should be:

  1. "sample uno" - 2 prefix matches
  2. "uno sample" - the same but the order of words differs
  3. "sample oun" - 1 prefix match + 2 words matched
  4. "un moment", "one sample" - 1 prefix match

The question is: can I implement this in a single SQL query (postgresql)? I'm fairly unexperienced with SQL, so I appreciate any help. Thanks!

I include a simple SQL FIDDLE

Bulat
  • 6,869
  • 1
  • 29
  • 52
meandre
  • 2,141
  • 2
  • 16
  • 21

1 Answers1

1
  • first asign and id to each row
  • split each data by space
  • split input by space
  • cross join both tables and check if input appear on data.
  • last join back to original table and count how many prefix and contain have.

There is a case I think may require special atention if one input words appear as prefix and second word appear as contain of same word

data: FORMAT
input: FO AT

SQL FIDDLE DEMO

WITH data as  (
    SELECT 
         row_number() OVER (ORDER BY field) AS id,
         field
    FROM 
      dtable
),
data_split as (
    SELECT
        id,
        unnest(string_to_array(field, ' ')) AS elem
    FROM data
),
input_split as (
    SELECT
        unnest(string_to_array(field, ' ')) AS elem
    FROM input
),
match as (
    SELECT *, strpos(d.elem, i.elem) as match_pos
    FROM   input_split i, data_split d
)
select 
    match.id, 
    data.field,
    SUM( CASE WHEN match_pos = 1 THEN 1 ELSE 0 END ) prefix,
    SUM( CASE WHEN match_pos > 1 THEN 1 ELSE 0 END ) contain
from  
    match inner join
    data on match.id = data.id
group by match.id, data.field
order by 3 desc, 4 desc

enter image description here

Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • perhaps it differs from the example? – meandre Jul 30 '15 at 22:35
  • I just reread the problem. You also need a sorted output. But the prefix match are there. I guess is a lot more you have before.. – Juan Carlos Oropeza Jul 30 '15 at 22:43
  • Yes, I've just got the source of misunderstanding too - maybe I wasn't clear when I explained the problem (sorry). Actually the only thing I need is sorting. Could you please extend you solution so that the result is sorted? – meandre Jul 30 '15 at 22:48
  • I update the code and image with new sort. The only part missing is same match as input order should go first. In this case is first but that was just luck. – Juan Carlos Oropeza Jul 30 '15 at 22:49
  • Think about it a litle bit, maybe this is enought to give you the final result. You will have to update the query to add a column_id as I did for word list to calculate the exact order.. – Juan Carlos Oropeza Jul 30 '15 at 22:53