SQL - need help in parsing text of a field

Question

I have a select query and it fetches a field with complex data. I need to parse that data in specified format. please help with your expertise:

selected string = complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I

expected output - PB|I

Please help me in writing a sql regular expression to accomplish this output.

heh heh heh multiple delimiter types for a single field. This is yucky. What have you tried? What specifically are you having problems with? — Kritner, Jul 24 '15 at 12:45
How do you know to take just "PB" rather than "PB - Phone In a Box" ? are the delimiters both "|" and " - " ? Can the "PB" part be multiple words? You need to better define your requirements, then it will likely be solvable by you. Right now there's not enough information given for us to solve it for you, not for edge cases anyway. — Kritner, Jul 24 '15 at 12:46
would you always want to get 2 characters after the first `=` and 1 character after the second `=`? — Vamsi Prabhala, Jul 24 '15 at 13:15
@ LittleBobbyTables - It may be out of the OP's control. You would not believe the garbage we have to deal with from our vendor's system which we cannot change. This is nothing. :-/ — Gary_W, Jul 24 '15 at 14:03

Gary_W · Answer 1 · 2015-07-24T17:02:44.423

The first step in figuring out the regular expression is to be able to describe it plain language. Based on what we know (and as others have said, more info is really needed) from your post, some assumptions have to be made.

I'd take a stab at it by describing it like this, which is based on the sample data you provided: I want the sets of one or more characters that follow the equal signs but not including the following space or end of the line. The output should be these sets of characters, separated by a pipe, in the order they are encountered in the string when reading from left to right. My assumptions are based on your test data: only 2 equal signs exist in the string and the last data element is not followed by a space but by the end of the line. A regular expression can be built using that info, but you also need to consider other facts which would change the regex.

Could there be more than 2 equal signs?
Could there be an empty data element after the equal sign?
Could the data set after the equal sign contain one or more spaces?

All these affect how the regex needs to be designed. All that said, and based on the data provided and the assumptions as stated, next I would build a regex that describes the string (really translating from the plain language to the regex language), grouping around the data sets we want to preserve, then replace the string with those data sets separated by a pipe.

SQL> with tbl(str) as (
  2    select 'complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I' from dual
  3  )
  4  select regexp_replace(str, '^.*=([^ ]+).*=([^ ]+)$', '\1|\2') result from tbl;

RESU
----
PB|I

The match regex explained:

^        Match the beginning of the line
.        followed by any character
*        followed by 0 or more 'any characters' (refers to the previous character class)
=        followed by an equal sign
(        start remembered group 1 
[^ ]+    which is a set of one or more characters that are not a space
)        end remembered group one
.*=      followed by any number of any characters but ending in an equal sign
([^ ]+)  followed by the second remembered group of non-space characters
$        followed by the end of the line

The replace string explained:

\1       The first remembered group
|        a pipe character
\2       the second remember group

Keep in mind this answer is for your exact sample data as shown, and may not work in all cases. You need to analyse the data you will be working with. At any rate, these steps should get you started on breaking down the problem when faced with a challenging regex. The important thing is to consider all types of data and patterns (or NULLs) that could be present and allow for all cases in the regex so you return accurate data.

Edit: Check this out, it parses all the values right after the equal signs and allows for nulls:

SQL> with tbl(str) as (
  2    select 'a=zz|complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I - testing|test1=|test2=test2 - testing' from dual
  3  )
  4  select regexp_substr(str, '=([^ |]*)( |||$)', 1, level, null, 1) output, level
  5  from tbl
  6  connect by level <= regexp_count(str, '=')
  7  ORDER BY level;

OUTPUT                    LEVEL
-------------------- ----------
zz                            1
PB                            2
I                             3
                              4
test2                         5

SQL>

SQL - need help in parsing text of a field

1 Answers1