1

I need to convert a string of 0s and 1s into a sequence of integers representing the 1s, similar to a page selection sequence in a print dialog.

e.g. '0011001110101' -> '3-4,7-9,11,13'

Is it possible to do this in a single SQL select (in Oracle 11g)?

I can get an individual list of the page numbers with the following:

with data as (
    select 'K1' KEY,   '0011001110101' VAL from dual
    union select 'K2', '0101000110'   from dual
    union select 'K3', '011100011010' from dual
)
select
    KEY,
    listagg(ords.column_value, ',') within group (
        order by ords.column_value
    ) PAGES
from
    data
cross join (
    table(cast(multiset(
        select level
        from   dual
        connect by level <= length(VAL)
    ) as sys.OdciNumberList)) ords
)
where
    substr(VAL, ords.column_value, 1) = '1'
group by
    KEY

But that doesn't do the grouping (e.g. returns "3,4,7,8,9,11,13" for the first value).

If I could assign a group number every time the value changes then I could use analytic functions to get the min and max for each group. I.e. if I could generate the following then I'd be set:

Key     Page    Val     Group
K1      1       0       1
K1      2       0       1
K1      3       1       2
K1      4       1       2
K1      5       0       3
K1      6       0       3
K1      7       1       4
K1      8       1       4
K1      9       1       4
K1      10      0       5
K1      11      1       6
K1      12      0       7
K1      13      1       8

But I'm stuck on that.

Anyone have any ideas, or another approach to get this?

Barney
  • 2,786
  • 2
  • 32
  • 34

3 Answers3

2

first of all let's level it:

select regexp_instr('0011001110101', '1+', 1, LEVEL) istr,
       regexp_substr('0011001110101', '1+', 1, LEVEL) strlen
FROM dual
CONNECT BY regexp_substr('0011001110101', '1+', 1, LEVEL) is not null

then the rest is easy with listagg :

with data as
 (
    select 'K1' KEY,   '0011001110101' VAL from dual
    union select 'K2', '0101000110'   from dual
    union select 'K3', '011100011010' from dual
 )
SELECT key,
       (SELECT listagg(CASE
                         WHEN length(regexp_substr(val, '1+', 1, LEVEL)) = 1 THEN
                          to_char(regexp_instr(val, '1+', 1, LEVEL))
                         ELSE
                          regexp_instr(val, '1+', 1, LEVEL) || '-' ||
                          to_char(regexp_instr(val, '1+', 1, LEVEL) +
                                  length(regexp_substr(val, '1+', 1, LEVEL)) - 1)
                       END,
                       ' ,') within GROUP(ORDER BY regexp_instr(val, '1+', 1, LEVEL))
          from dual
        CONNECT BY regexp_substr(data.val, '1+', 1, LEVEL) IS NOT NULL) val
  FROM data
lsalamon
  • 788
  • 7
  • 12
  • That's awesome! I had something working by using a recursive query based on the expanded data, but yours is much more elegant not to mention orders of magnitude faster. – Barney Apr 27 '17 at 06:09
2

Using a recursive sub-query factoring clause without regular expressions:

Oracle Setup:

CREATE TABLE data ( key, val ) AS
  SELECT 'K1', '0011001110101' FROM DUAL UNION ALL
  SELECT 'K2', '0101000110' FROM DUAL UNION ALL
  SELECT 'K3', '011100011010' FROM DUAL UNION ALL
  SELECT 'K4', '000000000000' FROM DUAL UNION ALL
  SELECT 'K5', '000000000001' FROM DUAL;

Query:

WITH ranges ( key, val, pos, rng ) AS (
  SELECT key,
         val,
         INSTR( val, '1', 1 ), -- Position of the first 1
         NULL
  FROM   data
UNION ALL
  SELECT key,
         val,
         INSTR( val, '1', INSTR( val, '0', pos ) ), -- Position of the next 1

         rng || ',' || CASE
           WHEN pos = LENGTH( val )                 -- Single 1 at end-of-string
           OR   pos = INSTR( val, '0', pos ) - 1    -- 1 immediately followed by 0
           THEN TO_CHAR( pos )
           WHEN INSTR( val, '0', pos ) = 0          -- Multiple 1s until end-of-string
           THEN pos || '-' || LENGTH( val )
           ELSE pos || '-' || ( INSTR( val, '0', pos ) - 1 ) -- Normal range
         END
  FROM   ranges
  WHERE  pos > 0
)
SELECT KEY,
       VAL,
       SUBSTR( rng, 2 ) AS rng -- Strip the leading comma
FROM   ranges
WHERE  pos = 0 OR val IS NULL
ORDER BY KEY;

Output

KEY VAL           RNG
--- ------------- -------------
K1  0011001110101 3-4,7-9,11,13
K2  0101000110    2,4,8-9
K3  011100011010  2-4,8-9,11
K4  000000000000  
K5  000000000001  12
MT0
  • 143,790
  • 11
  • 59
  • 117
  • Good that it is not oracle specific, although that's not too important to me. I just had to cast the null to a varchar2 in the base part of the query to work around Oracle bug 13876895. – Barney Apr 28 '17 at 03:27
1

Here is a slightly more efficient version of Isalamon's solution (using a hierarchical query). It is slightly more efficient because I use a single hierarchical query instead of multiple ones (in correlated subqueries), and I calculate the length of each sequence of 1's just once, in the inner query. (In fact it is calculated only once anyway, but the function call itself has some overhead.)

This version also treats inputs like '00000' and NULL correctly. Isalamon's solution doesn't, and MT0's solution does not return a row when the input value is NULL. It is not clear if NULL is even possible in the input data, and if it is, what the desired result is; I assumed a row should be returned, with the page_list NULL as well.

Optimizer cost for this version is 17, vs. 18 for Isalamon's solution and 33 for MT0's. However, optimizer cost doesn't take into account the significantly slower processing of regular expressions compared to standard string functions; if speed of execution is important, MT0's solution should definitely be tried since it may prove faster.

with data ( key, val ) as (
       select 'K1', '0011001110101' from dual union all
       select 'K2', '0101000110'    from dual union all
       select 'K3', '011100011010'  from dual union all
       select 'K4', '000000000000'  from dual union all
       select 'K5', '000000000001'  from dual union all
       select 'K6', null            from dual union all
       select 'K7', '1111111'       from dual union all
       select 'K8', '1'             from dual
     )
--  End of test data (not part of the solution); SQL query begins below this line.
select key, val,
       listagg(case when len = 1 then to_char(s_pos)
                    when len > 1 then to_char(s_pos) || '-' || to_char(s_pos + len - 1) 
               end, ',') within group (order by lvl) as page_list
from   ( select key, level as lvl, val,
                regexp_instr(val, '1+', 1, level)          as s_pos,
                length(regexp_substr(val, '1+', 1, level)) as len
         from   data
         connect by regexp_substr(val, '1+', 1, level) is not null
                and prior key = key
                and prior sys_guid() is not null
       )
group by key, val
order by key
;

Output:

KEY  VAL            PAGE_LIST
---  -------------  -------------
K1   0011001110101  3-4,7-9,11,13
K2   0101000110     2,4,8-9
K3   011100011010   2-4,8-9,11
K4   000000000000
K5   000000000001   12
K6
K7   1111111        1-7
K8   1              1
  • I ran both Isalamon and MTOs solutions from SQL query and ISalamons appeared to run about 20% faster on my actual data. Not an exhaustive test by any means. My data cannot be null or all zeros, but that is a good point. And thanks for making me go and look up what `connect by ... prior sys_guid() is not null` does :-) – Barney Apr 28 '17 at 04:04
  • Updated mine to show `NULL` rows - all it required was adding ` OR val IS NULL` to the final query. – MT0 Apr 28 '17 at 06:57