0

I am have been assigned a task to parse a string (which is essentially in XML format) and I need to extract the name of the first tag in the string

eg: string '<column><data-type>string</data-type>.............' or '<filter><condition>....</condition>...............' or

'......................'

the string keeps changing but I am only interested in the first tag, I would like to get the output like:

  • column,
  • filter,
  • query

i have tried regexp_substr(string,'^<(.+)>',1,1,null,1) and some similer variations but they don't seem to be working cosistently.

Please help.

  • As MT0 says - use XML functions on XML data. For example, your input may have a self-closing tag (whose only purpose is to carry an attribute) - like ``. Presumably you still must retrieve the property name, `column` - not the entire junk between the opening and closing angled bracket. –  Mar 18 '21 at 13:39
  • @mathguy it is safe to assume that there will not be such singular tags like , at least not in the first tag that I am looking for – kunal kavthekar Mar 18 '21 at 15:12

2 Answers2

1

If you have XML data then use a proper XML parser:

SELECT XMLQUERY( '/*/name()' PASSING XMLTYPE(value) RETURNING CONTENT ) AS tag_name
FROM   table_name

Which for the sample data:

CREATE TABLE table_name ( value CLOB );

INSERT INTO table_name ( value )
SELECT '<column><data-type>string</data-type></column>' FROM DUAL UNION ALL
SELECT '<filter><condition>....</condition></filter>' FROM DUAL UNION ALL
SELECT '<query />' FROM DUAL UNION ALL
SELECT '<has_attributes attr1="do not return this" attr2="&lt;or&gt; this" />' FROM DUAL

Outputs:

| TAG_NAME       |
| :------------- |
| column         |
| filter         |
| query          |
| has_attributes |

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
  • This is a great solution but somehow the xmltype part is not working on my device, but @Gordon Linoff's regex solution is giving me the needed output as the tags are simple. so thanks a lot again – kunal kavthekar Mar 19 '21 at 02:42
-1

You are looking for any character between the bounds -- and that includes '>'. So, just exclude the terminating character:

select regexp_substr(string,'^<([^>]+)>',1,1,null,1)
from (select '<column><data-type>string</data-type>.............' as string from dual union all
      select '<filter><condition>....</condition>...............' from dual
     ) x;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786