I have a table that has a column that holds an html string that may or may not be well formed (tried to use xmltable route and it didn't work) which is why I am trying to use the following sql syntax. I am trying to create a query that uses REGEXP_SUBSTR, LEVEL and CONNECT BY LEVEL based on examples I found online but I am not able to get the results I would expect:
I created an example query below:
with qry as (
select 1 as id, '<div>What would you like to do today?<a href="#" data-jump="111">Client Service</a><a href="#" data-jump="112">Customer Service</a><a href="#" data-jump="113">Tecnical Service</a></div>' as html_string from dual
union
select 2 as id, '<div>What would you like to do today?<a href="#" data-jump="111">Client Service</a><a href="#" data-jump="112">Customer Service</a><a href="#" data-jump="113">Tecnical Service</a></div><a href="#" data-jump="114">Other Service</a></div>' as html_string from dual
)
SELECT ID,
REGEXP_SUBSTR(html_string, '<a.*?>(.*?).</a>', 1, LEVEL, NULL, 1) as contents,
REGEXP_SUBSTR(html_string, 'data-jump="(.*?)"', 1, LEVEL, NULL, 1) as data_jump
FROM qry
CONNECT BY LEVEL <= REGEXP_COUNT(html_string, '<a.*?>(.*?).</a>')
By running this query, I would expect 7 rows to come back:
ID | Contents | Data_Jump |
---|---|---|
1 | Client Servic | 111 |
1 | Customer Servic | 112 |
1 | Tecnical Servic | 113 |
2 | Other Servic | 114 |
2 | Tecnical Servic | 113 |
2 | Other Servic | 114 |
2 | Customer Servic | 112 |
Instead its coming back with 22 rows.
ID | Contents | Data_Jump |
---|---|---|
1 | Client Servic | 111 |
1 | Customer Servic | 112 |
1 | Tecnical Servic | 113 |
2 | Other Servic | 114 |
2 | Tecnical Servic | 113 |
2 | Other Servic | 114 |
2 | Customer Servic | 112 |
1 | Tecnical Servic | 113 |
2 | Other Servic | 114 |
2 | Tecnical Servic | 113 |
2 | Other Servic | 114 |
2 | Client Servic | 111 |
1 | Customer Servic | 112 |
1 | Tecnical Servic | 113 |
2 | Other Servic | 114 |
2 | Tecnical Servic | 113 |
2 | Other Servic | 114 |
2 | Customer Servic | 112 |
1 | Tecnical Servic | 113 |
2 | Other Servic | 114 |
2 | Tecnical Servic | 113 |
2 | Other Servic | 114 |
I don't know where the problem is that I am getting duplicates. On the actual production table there are a lot of rows so the query just spins and never returns. I suspect its because a lot of duplicates are being generated. Can anyone tell me how to fix the above query so that I can try to transport it onto my actual query?
Thanks,