-1

I have strings in vertica database like:

'a18: 2, b34: 5, n29: 10'

and i need to extracts digits from it and get sum. So the output should be:

17 

(calculated 2+5+10)

I need it for working in Tableau. How can i do it?

Chick Chirik
  • 115
  • 1
  • 10
  • Do it in the application that is using the database instead of trying to do it in SQL. Or restructure your data so each value is in its own row in a table instead of having a bunch in a string. – Shawn Nov 04 '18 at 10:01
  • What are your column names? Vertical databases can be normalised using the union operator. – h33 Nov 04 '18 at 10:05
  • @shawn i can't restructure data because i can just make selects to database. And also there can be different amount of 'keys' in this string: down vote favorite I have strings in vertica database like: 'a18: 2, b34: 5, n29: 10' or down vote favorite I have strings in vertica database like: 'a18: 2, b34: 5, n29: 10' or 'a18: 1, b34: 19' – Chick Chirik Nov 04 '18 at 10:24
  • @h33 this column name is 'category' but what does it change? this string is stored in one columns – Chick Chirik Nov 04 '18 at 10:25
  • @ChickChirik . . . You should fix your data model so you are not storing lists of things in a string column. That is just poor database design. – Gordon Linoff Nov 04 '18 at 11:50
  • @GordonLinoff but still maybe some regular expression would help me? – Chick Chirik Nov 04 '18 at 15:37
  • @ChickChirik . . . As far as I know, Vertica does not support arrays. Nor does it have the ability to extract parts of a string into a separate table (i.e. table-valued functions). You should fix the data model. – Gordon Linoff Nov 04 '18 at 16:26

2 Answers2

1

While modifying the data model will make your life a lot easier, Your scenario can be handled using Tableau calculations alone.

  1. Replace ':' by ','. This can be done using following calculation.

enter image description here

  1. Get your total using following calculation. (This assumes maximum 5 key,value pairs. Adjust if required)
ZN(INT((TRIM( SPLIT( [Formatted String], ",", 2 ) ))))+
ZN(INT((TRIM( SPLIT( [Formatted String], ",", 4 ) ))))+
ZN(INT((TRIM( SPLIT( [Formatted String], ",", 6 ) ))))+
ZN(INT((TRIM( SPLIT( [Formatted String], ",", 8 ) ))))+
ZN(INT((TRIM( SPLIT( [Formatted String], ",", 10 ) ))))

Here ZN replaces NULL with zero so that we dont have to worry about missing splits. We are simply taking alternate fields and converting them to integers. enter image description here

Here is the final result:

enter image description here

Jose Cherian
  • 7,207
  • 3
  • 36
  • 39
0

Two steps, using the SPLIT_PART function, I think:

a) Verticalise your string into rows.

b) split your resulting strings by the colon, cast the token after the colon as an integer, and make a sum of those integers.

WITH  
-- this is your single-row input
-- if you have many rows, you will have to add any grouping column here                                                                                                                                                   
input(s) AS (
SELECT 'a18: 2, b34: 5, n29: 10'
)
,
-- start real WITH clause here ...
i(i) AS (
-- create 10 "index" integers out of nothing using TIMESERIES ...
 SELECT
   MICROSECOND(ts) AS i
 FROM (    SELECT TIMESTAMPADD(MICROSECOND,     1, '2000-01-01')
 UNION ALL SELECT TIMESTAMPADD(MICROSECOND,    10, '2000-01-01')
 ) limits(ts_lim)
 TIMESERIES ts AS '1 MICROSECOND' OVER (ORDER BY ts_lim)
)
,
-- verticalise your comma separated string
rows_obtained AS (
SELECT
  TRIM(SPLIT_PART(s,',',i)) AS row_obtained
FROM input CROSS JOIN i
WHERE SPLIT_PART(s,',',i) <> '' -- remove empty strings
--   you get:
--+   row_obtained
--+   a18: 2
--+   b34: 5
--+   n29: 10
)
-- get the sum of the second part of SPLIT_PART() by colon, cast to integer
SELECT
  SUM(SPLIT_PART(row_obtained,':',2)::INT) AS the_sum
FROM rows_obtained
;
marcothesane
  • 6,192
  • 1
  • 11
  • 21