teradata sql - regexp_substr to split on ' - '

Question

I am somewhat new to Teradata. I am more familiar with Presto SQL, where split_part is available.

I'm looking to split a string on a space, hyphen, space (' - ').

Example: 'Wal-Mart - Target - Best Buy - K-Mart - Staples'

I'm used to using split_part(split_part(COLUMN, ' - ',2), ' - '), 1) to get Target, which ignores the hyphens in Wal-Mart and K-Mart because the hyphen is not preceeded and followed by a space.

But, I can't figure out how to get 'Target' with Teradata.
strtok() only seems to work with a single character, which isn't sufficient since I want to split on 3 (' - ').

Any help would be appreciated!

You could perhaps use REGEXP_SPLIT_TO_TABLE which can have a multi-character pattern. But for REGEXP_SUBSTR try `'(^|(\ -\ ))\K.*?(?=($|\ -\ ))'` for the pattern. — Fred, Oct 20 '21 at 03:26
Actually the spaces don't need to be escaped unless you use matcharg 'x' so you could use the marginally simpler `'(^|( - ))\K.*?(?=($|( - )))'` — Fred, Oct 20 '21 at 18:26

Radagast · Answer 1 · 2021-10-19T23:18:09.037

1

Depending on your version (14.0 or recent), you could use strtok to parse it out

select strtok(oreplace('Wal-Mart - Target - Costco - K-Mart - Staples',' - ','|'),'|',2)

edited Oct 19 '21 at 23:18

answered Oct 19 '21 at 23:04

Radagast

5,102
3
12
27

strtok doesn't recognize the spaces it just splits on the hyphen, which doesn't work because of the hyphen in wal-mart. – acvbasql Oct 19 '21 at 23:11
@acvbasql I made a modification where I replace the 3 character combination with a pipe character to later split on that. Would that work? – Radagast Oct 19 '21 at 23:19
Good idea!. I think that will work. I'm still curious about a regex solution, but this will work for now. Thanks! – acvbasql Oct 20 '21 at 13:36

score 0 · Answer 2 · answered Oct 26 '21 at 14:25

While not a direct answer, maybe it will help with the logic so at the risk of getting flamed I'm throwing it out here anyway. With a regex you should be able to first describe your pattern in plain language to help analyze and define what you really need to get. i.e. You want the 2nd occurrence of a string that is surrounded by the pattern space-dash-space. What if the pattern is at the start or end of the line? Let's revise. You want a specified occurrence of a string that is preceded by the start of the line optionally OR by the pattern space-dash-space, and is followed by space-dash-space OR the end of the line.

In Oracle it would look like this where the first '2' in the argument list means get the 2nd occurrence of the pattern and the 2nd '2' means return the 2nd remembered group (in parenthesis). The WITH statement just sets up the data. You would have to translate this regex to Teradata.

WITH tbl(str) AS (
  SELECT 'Wal-Mart - Target - Best Buy - K-Mart - Staples' FROM dual
)
SELECT REGEXP_SUBSTR(str, '(^?| - )(.*?)( - |$)', 1, 2, NULL, 2) retailer
FROM tbl;

RETAILER
--------
Target  
1 row selected.

Play with query here

teradata sql - regexp_substr to split on ' - '

2 Answers2