1

I have 2 strings for example:

  • 'Source:Siebel; Name:Mary Jane; Gender:F; Age:24; N;'
  • 'Source:Siebel; Name:Marie; Gender:F; Age:24; N;'

The result I need is:

  • Name:Mary Jane;
  • Name:Marie;

Most likely I need to reverse below code

with cte1 as  (
    select 1 id, 'Source:Siebel; Name:Mary Jane; Gender:F; Age:24; N;' str from dual
    union all
    select 2 id, 'Source:Siebel; Name:Marie; Gender:F; Age:24; N;' str from dual
), cte2 as (
    SELECT distinct id, trim(regexp_substr(str, '[^ ]+', 1, level)) str
    FROM cte1 t
    CONNECT BY instr(str, ' ', 1, level - 1) > 0
)
select distinct t1.str
from cte2 t1
join cte2 t2 on (t1.str = t2.str and t1.id != t2.id)

from Oracle Function to return similarity between strings

as the result was the similarities [QueryResult] of the 2 strings

enter image description here

I cannot use the procedure as I need this SQL script to run in Oracle Fusion

jishan siddique
  • 1,848
  • 2
  • 12
  • 23
kathnisss
  • 15
  • 4
  • 1
    Why is the data in that format to begin with? Even if it comes from some source that only outputs complex strings like that, when you import the data in your database it should first be normalized. –  Apr 10 '20 at 12:40
  • Hello, I used that sample data because the actual data I'm working with is concatenated values from Interface Table and Base tables of Oracle. And I am comparing from the results of the two to ensure that the data from Interface table was passed to Base table, – kathnisss Apr 10 '20 at 13:45
  • I am not sure I understand. The data either was passed from "interface" to "base tables" or it wasn't; I don't see how "Mary Jane" could have been changed to "Marie" in the process. Copying or transferring data from one place to another can fail in many ways, but altering the content of the data would be exceptionally rare (I think). –  Apr 10 '20 at 13:48

2 Answers2

0

Would this help?

SQL> with cte1 as  (
  2   select 1 id, 'Source:Siebel; Name:Mary Jane; Gender:F; Age:24; N;' str from dual
  3   union all
  4   select 2 id, 'Source:Siebel; Name:Marie; Gender:F; Age:24; N;' str from dual
  5   ),
  6  cte2 as
  7    (select id,
  8       column_value lvl,
  9       trim(regexp_substr(str, '[^;]+', 1, column_value)) str
 10     from cte1 cross join
 11       table(cast(multiset(select level from dual
 12                           connect by level <= regexp_count(str, ';') +1
 13                          ) as sys.odcinumberlist))
 14    )
 15  select a.str, b.str
 16  From cte2 a join cte2 b on a.id < b.id and a.lvl = b.lvl and a.str <> b.str;

STR             STR
--------------- ---------------
Name:Mary Jane  Name:Marie

SQL>
Littlefoot
  • 131,892
  • 15
  • 35
  • 57
0

The result I need is:

Name:Mary Jane; 
Name:Marie;

You could use LAG/LEAD analytical function to get your desired output.

Demo with multiple input values, e.g. 'Mary Jane', 'Marie', 'Jane', 'Jones'

with t1 as  (
    select 1 id, 'Source:Siebel; Name:Mary Jane; Gender:F; Age:24; N;' str from dual
    union all
    select 2 id, 'Source:Siebel; Name:Marie; Gender:F; Age:24; N;' str from dual
    union all
    select 3 id, 'Source:Siebel; Name:Jane; Gender:F; Age:24; N;' str from dual
    union all
    select 4 id, 'Source:Siebel; Name:Jones; Gender:F; Age:24; N;' str from dual
), t2 as (
SELECT t1.id,
        trim(regexp_substr(t1.str, '[^;]+', 1, lines.column_value)) str
    FROM t1,
      TABLE (CAST (MULTISET
      (SELECT LEVEL FROM dual
              CONNECT BY instr(t1.str, ';', 1, LEVEL) > 0
      ) AS sys.odciNumberList ) ) lines
    ORDER BY id, lines.column_value)
select id, str from(
  select id, 
         str, 
        lag(str) over(partition by str order by str) lag, 
        lead(str) over(partition by str order by str) lead from t2
) where lag is null
  and   lead is null
order by id;

        ID STR
---------- -----------------------
         1 Name:Mary Jane
         2 Name:Marie    
         3 Name:Jane     
         4 Name:Jones

This will give you the difference between any attributes in the string, name, age, gender etc. which doesn't match with other string.

Lalit Kumar B
  • 47,486
  • 13
  • 97
  • 124