4

Imagine I have a table like

Name
----
ABCDEFG
ABChello world
ABCDEfoo
ABbar
ABCDEF
ABCDEFGHIJKLMNOP
zzz
qABCD
ABCqqqGH
ABCABC

I want to do a query and figure out how many characters of each string match the desired string "ABCDEFGHIJ," always starting from the beginning. That is...

Name               MatchingLength
----               ----
ABCDEFG            7
ABChello world     3
ABCDEzoo           5
ABbar              2
ABCDEF             6
ABCDEFGHIJKLMNOP   10
zzz                0
qABCD              0
ABCqqqGH           3
ABCABC             3

Is there any way to do this cleanly in Oracle? I'm at a loss.

StilesCrisis
  • 15,972
  • 4
  • 39
  • 62

6 Answers6

4

Don't know about "clean", but here are two solutions.

-- The hardcoded, bad performance. No transformation of your string though.
with patterns as (
        select substr('ABCDEFGHIJ', 1, rownum) txt
        from dual 
        connect by level <= length('ABCDEFGHIJ')
    )
select d.txt, coalesce(max(length(p.txt)), 0)
from dummy d
    left join patterns p
        on instr(d.txt, p.txt) = 1
group by d.txt
order by 2 desc;

-- The cool one with regex. 
-- Though transforming your input string, 
-- this can also be done with ease making something that transorms it for you
-- like in the previous example, more complicated task than the previous,
-- as oracle sucks with string manipulation. You can however write it in java.
select d.txt, coalesce(LENGTH(REGEXP_SUBSTR(d.txt, '^A(B(C(D(E(F(G(H(I(J)?)?)?)?)?)?)?)?)')), 0)
from dummy d;

http://www.sqlfiddle.com/#!4/85ba6/23

UPDATE

with patterns as (
        select substr('ABCDEFGHIJ', 1, rownum) txt
        from dual 
        connect by level <= length('ABCDEFGHIJ')
    )
select d.txt, coalesce(max(length(p.txt)), 0)
from dummy d
    left join patterns p
        on instr(d.txt, p.txt) = 1
where d.txt LIKE substr('ABCDEFGHIJ', 1, 1) || '%'
group by d.txt
order by 2 desc;

Updated fiddle: http://www.sqlfiddle.com/#!4/37400/6

Query plan produced tested on oracle 10g

SELECT STATEMENT, GOAL = ALL_ROWS       
 SORT ORDER BY                  
  SORT GROUP BY NOSORT          
   NESTED LOOPS OUTER           
    INDEX RANGE SCAN    I <<<< Uses the index.
    VIEW                            
     COUNT                  
      CONNECT BY WITHOUT FILTERING                  
       FAST DUAL                    
Roger
  • 2,912
  • 2
  • 31
  • 39
  • why not just `count(p.txt)`? – Kirill Leontev Mar 21 '13 at 10:10
  • @be here now It works, but not really what I semantically mean to do. Even if it has better performance OP should use the regex approach IMO. – Roger Mar 21 '13 at 14:45
  • The regex approach would require me to write ancillary C++ code to convert the input string into a regex. Also my experience with regex in Oracle 10 has been that it is phenomenally slow. – StilesCrisis Mar 21 '13 at 16:57
  • I gave this approach a shot on our actual production data and it ended up performing a full table scan, unfortunately (the name column is indexed). I also tried converting `instr(d.txt, p.txt) = 1` to `d.txt like p.txt || '%'` and this also failed to dodge the full table scan. – StilesCrisis Mar 21 '13 at 17:10
  • Yes, your index will break. try `d.txt like substr('ABCDEFGHIJ', 1, 1) || '%'`. Note that when I talk about 'ABCDEFGHIJ', I mean your inputstring which should be a placeholder. – Roger Mar 21 '13 at 17:58
  • @StilesCrisis I think the outer join is no longer necessary either. – Roger Mar 21 '13 at 21:44
  • Not bad. You managed to hit the index here. I'm going to award the answer. Thanks! – StilesCrisis Mar 21 '13 at 22:11
0

Assuming that you will want to have the matching count of only those strings which are starting with ABCDEFGHIJ and for the strings like qABCD the matching count will be 0

 SELECT STR,DECODE(SUBSTR(STR,1,1),'A',LENGTH(STR)-
 NVL(LENGTH(REPLACE(TRANSLATE(STR,'ABCDEFGHIJ',' '),' ','')),0),0) MATCHING_LENGTH FROM table
Aspirant
  • 2,238
  • 9
  • 31
  • 43
  • This doesn't work for inputs such as 'ABqCD', the `TRANSLATE` function will remove the 'CD' as well – A.B.Cade Mar 21 '13 at 06:57
0

If you're using Oracle 11gR2 then you can use Recursive Common Table Expressions like this:

with rcte(txt,t, p, c) as
(
  select d.txt , d.txt t, 'ABCDEFGHIJ' p, 0 c  
  from dummy d
  union all
  select txt ,substr(t, 2), substr(p, 2), case when substr(t, 1, 1) = substr(p, 1, 1) then 1 else 0 end 
  from rcte 
  where length(t) > 0  and length(p) > 0  and substr(t, 1, 1) = substr(p, 1, 1)
  )
select txt, sum(c) from rcte
group by txt;

Here is a sqlfiddle demo (thanks to @Roger)

A.B.Cade
  • 16,735
  • 1
  • 37
  • 53
0

Assuming the matching criteria is both symbol and position equality (value of pattern ABCqqqGH is 5) you might try this:

17:57:03 SYSTEM@sandbox> @sf test

VAL
------------------------------
ABCDEFG
ABChello world
ABCDEzoo
ABbar
ABCDEF
ABCDEFGHIJKLMNOP
zzz
qABCD

8 rows selected.

Elapsed: 00:00:00.01
17:57:05 SYSTEM@sandbox> @get match
  1  select t.val, count(l)
  2    from test t
  3    left join (select level l from dual connect by level <= length('ABCDEFGHIJ')) i
  4      on substr(t.val, i.l, 1) = substr('ABCDEFGHIJ', i.l, 1)
  5   group by t.val
  6*  order by 2 desc
17:57:07 SYSTEM@sandbox> /

VAL                              COUNT(L)
------------------------------ ----------
ABCDEFGHIJKLMNOP                       10
ABCDEFG                                 7
ABCDEF                                  6
ABCDEzoo                                5
ABChello world                          3
ABbar                                   2
zzz                                     0
qABCD                                   0

8 rows selected.

Elapsed: 00:00:00.02
Kirill Leontev
  • 10,641
  • 7
  • 43
  • 49
0
declare
       v_1 number := 0;
v_pattern VARCHAR(26) := '&n';
v_f number;
v_spaces VARCHAR(30) := ' ';
v_l number;
v_c varchar(20) := ' ';
v_n varchar(20) := ' ';
    BEGIN
 v_f := Ascii(SubStr(v_pattern,1,1));
v_l := Ascii(SubStr(v_pattern,Length(v_pattern)));
v_spaces := LPad(' ',Length(v_pattern),' ');
for i in (select str,TRANSLATE(REPLACE(str,' ',''),v_pattern,v_spaces) c1,length(REPLACE(str,' ',''))-nvl(length(replace(TRANSLATE(REPLACE(str,' ',''),v_pattern,v_spaces),' ','')),0) c2 from table 
where ascii(substr(str,1,1)) IN (SELECT DISTINCT Ascii(SubStr(v_pattern,LEVEL,1)) FROM dual CONNECT BY LEVEL<=Length(v_pattern))) loop
       for j in 1..i.c2 loop
               v_c :=instr(i.c1,' ',1,j);
               v_n :=instr(i.c1,' ',1,j+1);
               if v_c+1=v_n then
                          v_1 := v_1+1;
               end if;
       end loop;
                  if(v_1+1 = i.c2) then
               dbms_output.put_line('String : '||i.str||' and Matching count : '||i.c2);
       else
               dbms_output.put_line('String : '||i.str||' and Matching count : '||((v_1)-1));
       end if;
   v_1 := 0;
end loop;
FOR k IN (SELECT str FROM table WHERE NOT(Ascii(substr(str,1,1)) IN (SELECT DISTINCT Ascii(SubStr(v_pattern,LEVEL,1)) FROM dual CONNECT BY LEVEL<=Length(v_pattern)))) LOOP
      dbms_output.put_line('String : '||k.str||' and Matching count : '||v_1);
 END LOOP;
 end;
Aspirant
  • 2,238
  • 9
  • 31
  • 43
-1

I hope the following helps:

CREATE TABLE TESTME ( TNAME VARCHAR2(30));

INSERT INTO TESTME VALUES('ABCDEFG');
INSERT INTO TESTME VALUES('ABChello world');
INSERT INTO TESTME VALUES('ABCDEzoo');
INSERT INTO TESTME VALUES('ABbar');
INSERT INTO TESTME VALUES('ABCDEF');
INSERT INTO TESTME VALUES('ABCDEFGHIJKLMNOP');
INSERT INTO TESTME VALUES('zzz');
INSERT INTO TESTME VALUES('qABCD');  

CREATE OR REPLACE FUNCTION GET_MLENGTH( P_INPUT VARCHAR2)
 RETURN NUMBER 
IS
 -- COMBARING STRING
 C VARCHAR2(10) := ('ABCDEFGHIJ');
 N NUMBER := 0; 
BEGIN 
  FOR I IN 1..LENGTH(P_INPUT) LOOP
   IF SUBSTR(P_INPUT,I,1) = SUBSTR(C,I,1) THEN
    N := N + 1;
   ELSE
    RETURN N; 
   END IF;  
  END LOOP;
RETURN N;   
END;
/


SELECT TNAME , GET_MLENGTH(TNAME) FROM TESTME ;

TNAME                          GET_MLENGTH(TNAME)
------------------------------ ------------------
ABCDEFG                                         7
ABChello world                                  3
ABCDEzoo                                        5
ABbar                                           2
ABCDEF                                          6
ABCDEFGHIJKLMNOP                               10
zzz                                             0
qABCD                                           0
ozOracle
  • 67
  • 4