3

I have the following situation where I need to solve the question of displaying data inside a column, for each data-part of a concatenated string.

I'm relatively new to this and the sad thing is I was not able to transfer other already read examples into my current situation. I've read something about crosstab and pivot but I have only one column that holds that data and I don't know how much comma-separated values will be inside the column, so it needs to be dynamic anyhow and I don't know how to solve it with this.

So lets say we have a postgres table like this (version 9.6.3 on Amawon Aws - sadly they don't support some things like arrays):

CREATE TABLE touchpoints (
customer_id varchar(50) PRIMARY KEY,
purchase_timestamp date,
touchpoints_till_purchase text,
purchase_quantity int,
purchase_value int);

Let's fill it with data like I have it inside my view:

INSERT INTO touchpoints (customer_id, purchase_timestamp, touchpoints_till_purchase, purchase_quantity, purchase_value) 
VALUES ('testuser0@example.com', 'Jun 01, 2018 11:12', 'SEO,Direct', 2, 180),('testuser1@example.com', 'Jun 03, 2018 15:56', 'Direct,Facebook_Paid,SEO', 1, 100),
('testuser2@example.com', 'Jun 04, 2018 21:44', 'Direct,Direct,Direct,Direct,Direct,Direct,Direct,Direct', 3, 170),('testuser3@example.com', 'Jun 06, 2018 15:42', 'Direct,SEO,SEO,Direct', 5, 270),
('testuser4@example.com', 'Jun 06, 2018 15:42', 'Direct,Direct,Direct,Direct,Direct,Direct,Direct', 5, 270);

The created table should look like this afterward, that's how my actual table looks like too.

actual situation:

actual situation

So I am trying to set up a query or function that's splitting the values inside the "touchpoints_till_purchase" field into separate columns like in a pivot.

What I am trying to build up is something like that:

expected result:

expected result

Should be not that hard I thought but I can't solve it with my actual low skills in PostgreSQL. All the solutions I watched and tried didn't to the trick or I used them false I guess.

The expected query should keep a few things in mind:

  • I don't know how many comma-separated values will be inside the touchpoints column that should get splitted
  • I don't have a corresponding column that is labeling the touchpoints (like counting them in a loop or something)
  • It needs to be dynamically and it should run on Amazon AWS --> sadly there are a few unsupported things , you can keep a look at here: https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-functions.html
    • I can't use alternatives like Json format because i need to solve it with separate columns

As far as i tried to solve this many many hours and didn't find a solution i would really looking forward for some help of sql professionals like you.

Many many thanks in advance!

Jasen
  • 11,837
  • 2
  • 30
  • 48
Traction
  • 31
  • 3
  • are you sure 9.6.3 - I hope you're right, amazon docs say 8.0 and regexp `(?:` is not supported in that :( which would make it harder to solve but still not impossible) – Jasen Jul 29 '18 at 21:35
  • 1
    To get the number of values: `length(touchpoints_till_purchase) - length(translate(touchpoints_till_purchase,',','')) + 1`, example: `select length('a,b,c') - length(translate('a,b,c',',','')) + 1;`. To get the particular value: `split_part(touchpoints_till_purchase,',',)`, example: `select split_part('a,b,c',',',2);` – Abelisto Jul 29 '18 at 22:57
  • @Jasen `SELECT VERSION();` outputs: `PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit` and aurora (AWS) version: `SELECT AURORA_VERSION();` outputs: `1.0.11` – Traction Jul 30 '18 at 13:24
  • @Abelisto How can i pack this inside a dynamic split_part or something? – Traction Jul 30 '18 at 13:26
  • It is impossible using pure SQL - there is no possibility to make a query returning dynamic number of columns. There are several ways to emulate it (ref cursors, temporary views etc). But my suggestion is: if it possible - do it at the client side. – Abelisto Jul 30 '18 at 23:33

1 Answers1

1

One approach that should work is to use the regexp version of substring, but it's not pretty.

SELECT customer_id,purchase_timestamp
   ,substring (touchpoints_till_purchase from '^([^,]*)') as touchpoint1
   ,substring (touchpoints_till_purchase from '^(?:(?:[^,]*),){1}([^,]*)') as touchpoint2
   ,substring (touchpoints_till_purchase from '^(?:(?:[^,]*),){2}([^,]*)') as touchpoint3
   ,substring (touchpoints_till_purchase from '^(?:(?:[^,]*),){3}([^,]*)') as touchpoint4
   ,substring (touchpoints_till_purchase from '^(?:(?:[^,]*),){4}([^,]*)') as touchpoint5
   ,substring (touchpoints_till_purchase from '^(?:(?:[^,]*),){5}([^,]*)') as touchpoint6
   ,substring (touchpoints_till_purchase from '^(?:(?:[^,]*),){6}([^,]*)') as touchpoint7
   ,substring (touchpoints_till_purchase from '^(?:(?:[^,]*),){7}([^,]*)') as touchpoint8
   ,purchase_quantity
   ,purchase_value
FROM touchpoints;

thgis one might also work.

SELECT customer_id,purchase_timestamp
   ,substring (touchpoints_till_purchase from '^(?:([^,]*)(,|$)){1}') as touchpoint1
   ,substring (touchpoints_till_purchase from '^(?:([^,]*)(,|$)){2}') as touchpoint2
   ,substring (touchpoints_till_purchase from '^(?:([^,]*)(,|$)){3}') as touchpoint3
   ,substring (touchpoints_till_purchase from '^(?:([^,]*)(,|$)){4}') as touchpoint4
   ,substring (touchpoints_till_purchase from '^(?:([^,]*)(,|$)){5}') as touchpoint5
   ,substring (touchpoints_till_purchase from '^(?:([^,]*)(,|$)){6}') as touchpoint6
   ,substring (touchpoints_till_purchase from '^(?:([^,]*)(,|$)){7}') as touchpoint7
   ,substring (touchpoints_till_purchase from '^(?:([^,]*)(,|$)){8}') as touchpoint8
   ,purchase_quantity
   ,purchase_value
FROM touchpoints;

however if Amazons substring uses the 8.0 regexp engine it gets worse.

SELECT customer_id,purchase_timestamp
   ,substring (substring (touchpoints_till_purchase from '^(([^,]*)(,|$)){1}') from '([^,]*),?$') as touchpoint1
   ,substring (substring (touchpoints_till_purchase from '^(([^,]*)(,|$)){2}') from '([^,]*),?$') as touchpoint2
   ,substring (substring (touchpoints_till_purchase from '^(([^,]*)(,|$)){3}') from '([^,]*),?$') as touchpoint3
   ,substring (substring (touchpoints_till_purchase from '^(([^,]*)(,|$)){4}') from '([^,]*),?$') as touchpoint4
   ,substring (substring (touchpoints_till_purchase from '^(([^,]*)(,|$)){5}') from '([^,]*),?$') as touchpoint5
   ,substring (substring (touchpoints_till_purchase from '^(([^,]*)(,|$)){6}') from '([^,]*),?$') as touchpoint6
   ,substring (substring (touchpoints_till_purchase from '^(([^,]*)(,|$)){7}') from '([^,]*),?$') as touchpoint7
   ,substring (substring (touchpoints_till_purchase from '^(([^,]*)(,|$)){8}') from '([^,]*),?$') as touchpoint8
   ,purchase_quantity
   ,purchase_value
FROM touchpoints;

A solution using disallowed features like arrays is much tidier.

Jasen
  • 11,837
  • 2
  • 30
  • 48
  • `split_part()`? – Abelisto Jul 29 '18 at 23:00
  • Thanks for your answer @Jason. It's pretty well for static value counts inside the touchpoints column. But my main problem is that i don't know how much values will be (comma-separated) iniside it. I had cases where i have more than 30 values inside. Is there any dynamic solution ? So that it is working for "n" columns too ? Where "n" is variable Thanks for your fast answer =) – Traction Jul 30 '18 at 13:28
  • 1
    postgresql doen't do variable numbers of columns from a static query. your best approach may be to split the list into columns in your application instead of asking the database to do it. – Jasen Jul 31 '18 at 04:29
  • @Abelisto split_part() , is much nicer – Jasen Jul 31 '18 at 04:32