2
---------------
MASTER TABLE
---------------
DATA_KEY NUMBER
TEXT VARCHAR2(2000)
ORDER_NO NUMBER

---------------
DETAIL TABLE
---------------
DATA_KEY NUMBER
SIMILAR_DATA_KEY NUMBER
DISTANCE_COUNT NUMBER

---------------
INSERT QUERY
---------------
INSERT INTO DETAIL
(
  SELECT DATA_KEY, SIMILAR_DATA_KEY, DISTANCE_COUNT
  FROM
  (
  SELECT A.DATA_KEY AS DATA_KEY, B.DATA_KEY AS SIMILAR_DATA_KEY, 
    UTL_MATCH.EDIT_DISTANCE(A.TEXT, B.TEXT) AS DISTANCE_COUNT
  FROM 
    (SELECT DATA_KEY, TEXT, ORDER_NO FROM MASTER) A
    INNER JOIN
    (SELECT DATA_KEY, TEXT, ORDER_NO FROM MASTER) B
    ON (A.ORDER_NO < B.ORDER_NO)
  )
  WHERE DISTANCE_COUNT <= 5
)

I need compare MASTER table TEXT field with other TEXT field.

indexes are not exist. master table 90,000 rows.

ORDER_NO field is for avoid duplicated compare. (1 .. 90000)
=============================================================
A.ORDER_NO < B.ORDER_NO
------------------------------------------
1, 1 <- exclude
1, 2 <- join
1, 3 <- join
1, 4 <- join
..
2, 1 <- exclude
2, 2 <- exclude
2, 3 <- join
2, 4 <- join
...
3, 1 <- exclude
3, 2 <- exclude
3, 3 <- exclude
3, 4 <- join

1. NOT need compare 1 and 1
2. Need compare 1 and 2
3. NOT need compare 2 and 1 (because, duplicate 2.)

so, for decrease compare count...
=============================================================

Slow zone is (WHERE DISTANCE_COUNT <= 5) ?

Slow zone is comparing rows (90000*89999/2) ?

Query elapse time is 7 days.

6,000 rows inserted to DETAIL table.

How to speed up?

I'm sorry for poor English...

Unheilig
  • 16,196
  • 193
  • 68
  • 98
  • What are you trying to accomplish with this JOIN "A.ORDER_NO < B.ORDER_NO" – Robert Dupuy Sep 22 '15 at 01:38
  • Thanks! I understand now. I'm going to try and change the join. I will post an answer later if it is still unanswered. – Robert Dupuy Sep 22 '15 at 05:04
  • I took a look at it, but unfortunately some few hundred rows of test data is not going to be sufficient to test this here. Here is how I would troubleshoot, I would aim to figure out just how many calls to UTL_MATCH.EDIT_DISTANCE can be run in an hour. I would write it in PL/SQL. I would load up everything in memory once, and then just loop through the data, not going to the database again - and get that run rate. – Robert Dupuy Sep 22 '15 at 06:29

0 Answers0