0

I have a TERADATA dataset that resembles the below :

'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor Time: 15:25:23 04/06/2020'

I want to extract tag value from the above based on the key.

Ex:

with comm as 
(
select  'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor' as text
)
select regexp_substr(comm.text,'[^: ]+',1,4)
 from comm where regexp_substr(comm.text,'[^: ]+',1,3) = 'IssueType';

Is there a way to query without having to change the position arguments for every tag. Also I am finding the last field a little tricky with date & time fields.

Any help is appreciated.

Thank you.

pats4u
  • 177
  • 6
  • 18
  • What is the exact result to be returned? – dnoeth Apr 07 '20 at 11:38
  • Thank for you the question. If the above can be broken into key & value down like below with a regular expression without having to provide positional parameters, it will help . ``` Key Value ============== =================== Project Hercules IssueType Improvement Components core AffectsVersions 2.4.1 Priority Minor Time 15:25:23 04/06/2020 ``` – pats4u Apr 07 '20 at 11:44

1 Answers1

1

There's the NVP function to access Name/Value-pair data, but to split into multiple rows you need either strtok_split_to_table or regexp_split_to_table. The tricky part in your case are the delimiters, would be easier if they were unique instead of ' 'and ':':

WITH comm AS 
 (
   SELECT 1 as keycol, -- should be a key column in your table, either numeric or varchar
      'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor Time: 15:25:23 04/06/2020' AS text
 )
SELECT id, tokennum, token, 
   -- get the key
   StrTok(token,':', 1) AS "Key",
   -- get the value (can't use StrTok because of ':' delimiter)
   Substring(token From Position(': ' IN token)+2) AS "Value"
FROM TABLE
 ( RegExp_Split_To_Table(comm.keycol
                         ,comm.text
                         ,'( )(?=[^ ]+: )' -- assuming names don't contain spaces: split at the last space before ': '
                         , 'c') 
RETURNS (id INT , tokennum INTEGER, token VARCHAR(1000) CHARACTER SET Latin)) AS dt
dnoeth
  • 59,503
  • 4
  • 39
  • 56