0

I've created the mysql user function using the levenshtein distance and ratio source codes. I am comparing 2 records and based on a 75% match I want to select the record.

  1. Order comes into table paypal_ipn_orders with an ITEM title
  2. A query executes against a table itemkey to find a 75% match in a record called ITEM as well
  3. if a 75% title is match it assigns an eight digit number from table itemkey to table paypal_ipn_orders

Here is the query

      UPDATE paypal_ipn_orders
      SET sort_num = (SELECT sort_id
      FROM itemkey
      WHERE  levenshtein_ratio(itemkey.item, paypal_ipn_orders.item_name) > 75)
      WHERE packing_slip_printed = 0 
      AND LOWER(payment_status) = 'completed' 
      AND address_name <> ''
      AND shipping < 100

I have adjusted this a few times but it's failing between line 4 and 5 at the levenshtein_ratio part. If it works it says that the subquery returns more than one row. I don't know how to fix it to make it return the correct result, I just lost as to how to make this work.

user1542036
  • 423
  • 2
  • 5
  • 9

1 Answers1

2

A subquery on a SET should only return one value. If itemkey has more than one item that is 75% of item_name what do you want to do? The below will use one of the best matches:

UPDATE paypal_ipn_orders
SET sort_num = (SELECT sort_id
  FROM itemkey
  WHERE  levenshtein_ratio(itemkey.item, paypal_ipn_orders.item_name) > 75
  ORDER BY levenshtein_ratio(itemkey.item, paypal_ipn_orders.item_name) DESC
  LIMIT 1)
WHERE packing_slip_printed = 0 
  AND LOWER(payment_status) = 'completed' 
  AND address_name <> ''
  AND shipping < 100
Andrew
  • 4,574
  • 26
  • 31
  • ok makes a lot of sense, What I find is a sub-query like this is executing very slowly, it seems to be timing out, is there a way to LIMIT the results to the first result without running through an order by. – user1542036 Aug 07 '12 at 00:19
  • Well, I imagine it could be slow because it's doing Levenshtein for every paypal_ipn_orders.item name. That might be a lot of work, but how else can you find the best? If you are willing to limit results to where the first letter or two are an exact match that could speed things up. – Andrew Aug 07 '12 at 01:02
  • So I'm thinking I can load the values I need into a temporary table first and then do levenshteins distance/ratio and then this won't time out, my order table is humungous. But I think this would work. – user1542036 Aug 21 '12 at 06:10