1

I have written the Longest Common Subsequence algorithm in PL/SQL function. I need only the length of the subsequence as output. Now my greatest challenge is the speed of the function when using the function for hundreds of thousands of records. There is a nested loop which fills the multi-dimensional table type object.

I have two tables one containing 1 million records (string) and other having almost 28,000 records(string). The requirement is to map each records in table 1 with each records of table 2 and get the LCS length for each.

According to my analysis the nested for loop is taking maximum amount of time.

/*
LCS LOGIC
*/

1 CREATE OR REPLACE
2  FUNCTION GET_LCS_LENGTH(    
3       table1_string  IN NVARCHAR2,
4       table2_string IN NVARCHAR2)
5     RETURN NUMBER
6   AS
7   TYPE t_number_array IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
8   TYPE t_bidimensional_number_array IS TABLE OF t_number_array INDEX BY BINARY_INTEGER;
9   matrix t_bidimensional_number_array ;
10    --...
11  BEGIN
12    len_str1     := LENGTH(table1_string);
13    len_str2     := LENGTH(table2_string);
14    matrix(1)(1) := 0;
15    FOR i IN 2..(len_str2+1)
16    LOOP
17      matrix(i)(1) := 0;
18      ch1          := SUBSTR(table2_string,i-1,1);
19      FOR j IN 2..(len_str1+1)
20      LOOP
21        matrix(1)(j)   := 0;
22        ch2            := SUBSTR(table1_string,j-1,1);
23        IF ch1          = ch2 THEN
24          matrix(i)(j) := matrix(i - 1)(j - 1) + 1;
25        ELSE
26          matrix(i)(j) := greatest(matrix(i)(j - 1),matrix(i - 1)(j));
27        END IF;
28      END LOOP;
29    END LOOP;
30    lcs_Dist := matrix(len_str2+1)(len_str1+1);
31    matrix.DELETE;
32  END;
  /*

  LCS LOGIC END

  */

How can I replace this nested for loop to get the LCS length or use some different logic or how to optimize this code further?

Currently when one record in table 1 maps with each 28,000 records in table 2 , the time taken is 5 seconds which is costly for me.

It should run in fractions of seconds for 28,000 records then only I can achieve a certain goal with the other 1 million records.

James Z
  • 12,209
  • 10
  • 24
  • 44

1 Answers1

1

A couple of improvements:

  • You do not need to store all rows of the matrix, you only need the current row and the previous row.
  • You do not need to use an associative array (TABLE OF x INDEX BY y) and can just use a collection (TABLE OF x).
  • You can keep track of the previous value in the row so you do not need to look it up in the matrix each iteration.

Which gives:

CREATE FUNCTION get_lcs_length(
  in_str1 IN NVARCHAR2,
  in_str2 IN NVARCHAR2
) RETURN NUMBER
IS
  TYPE numbers_array IS TABLE OF NUMBER(4,0);
  TYPE matrix IS TABLE OF numbers_array;
  c_l1         CONSTANT NUMBER(4,0) := LENGTH( in_str1 );
  c_l2         CONSTANT NUMBER(4,0) := LENGTH( in_str2 );
  p_matrix              matrix;
  p_row                 NUMBER(1,0) := 1;
  p_prev_row            NUMBER(1,0) := 2;
  p_c1                  NCHAR(1);
  p_c2                  NCHAR(1);
  p_prev_value          NUMBER(4,0);
BEGIN
  IF in_str1 IS NULL OR in_str2 IS NULL THEN
    RETURN NULL;
  ELSIF c_l1 > c_l2 THEN
    RETURN get_lcs_length( in_str2, in_str1 );
  END IF;
  p_matrix := matrix( numbers_array(), numbers_array() );
  p_matrix(1).EXTEND( c_l1 );
  p_matrix(2).EXTEND( c_l1 );
  FOR x IN 1 .. c_l1 LOOP
    p_matrix(p_prev_row)(x) := 0;
  END LOOP;

  FOR y IN 1 .. c_l2 LOOP
    p_c2 := SUBSTR( in_str2, y, 1 );
    p_prev_value := 0;
    FOR x IN 1 .. c_l1 LOOP
      p_c1 := SUBSTR( in_str1, x, 1 );
      IF p_c1 = p_c2 THEN
        IF x = 1 OR y = 1 THEN
          p_prev_value := 1;
        ELSE
          p_prev_value := p_matrix(p_prev_row)(x-1) + 1;
        END IF;
        p_matrix(p_row)(x) := p_prev_value;
      ELSE
        p_prev_value := GREATEST( p_matrix(p_prev_row)(x), p_prev_value );
        p_matrix(p_row)(x) := p_prev_value;
      END IF;
    END LOOP;
    IF p_row = 1 THEN
      p_row      := 2;
      p_prev_row := 1;
    ELSE
      p_row      := 1;
      p_prev_row := 2;
    END IF;
  END LOOP;
  RETURN p_prev_value;
END get_lcs_length;
/
MT0
  • 143,790
  • 11
  • 59
  • 117
  • Thank you for your response. I tried the above logic it but unfortunately complexity of the logic is almost the same. I tried for my 28,000 records as mentioned above it still takes 4 secs which is costly for my case. Kindly suggest . – Sanjukta Dey Jan 09 '19 at 14:14
  • I tried the function above, but I see it found the number of common symbols, not the Longest Common Subsequence... – Любовь Пономарева Aug 31 '21 at 21:18
  • @ЛюбовьПономарева That is what the question asked for. – MT0 Aug 31 '21 at 21:23