Hello StackOverflow world,
I have a datasource that stores all of my website session data and I am hoping to analyze the referral URLS and group them into five categories: Home Site (www.mywebsite.com), Sub Site (www.mywebsite.com/employees), Outside Traffic ,Google, Yahoo, Bing, Facebook, Linkedin, and Youtube.
In order to do this, I must parse the referral URLs. I have it partially figured out, but my current query mis-classifies URLS from outside websites that include our domain in their URL. When my query runs it will not group these URLs as Outside Traffic but create its own grouping as the name of the referral url.
For example, using this URL you can see how the name of my site is embedded in their URL:
https://www.helpthepeople.com/redirect.action?link=https%3A%2F%2F**www.mywebsite.com**%2Femployers%2Fblog%2Fwhat-to-do-when-asking-for-help%2F&encoded=lFAJCUeGqgrDkdlYfDwwbEfCqGlV
I get the following output:
www.helpthepeople.com
Desired output:
Outside Traffic
For the most part my query is working, but I am only having issues when the example above is present. Anyone know a better way to write this? My query is below:
SELECT
CASE
WHEN referrer_page LIKE '%mywebsite.com%'
THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
WHEN referrer_page LIKE '%mywebsite.com/employees%'
THEN SPLIT_PART(SPLIT_PART(referrer_page,'.com/',2),'/',1)
WHEN referrer_page LIKE '%google%'
THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
WHEN referrer_page LIKE '%yahoo%'
THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
WHEN referrer_page LIKE '%bing%'
THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
WHEN referrer_page LIKE '%facebook%'
THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
WHEN referrer_page LIKE '%linkedin%'
THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
WHEN referrer_page LIKE '%youtube%'
THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
ELSE 'outside_referral_traffic'
END AS url_grouping,
referrer_page,
session_date,
channel,
medium,
web_source,
campaign_name,
id,
COUNT (DISTINCT id) AS number_of_sessions
FROM biz_sessions
WHERE session_date >= '2019-07-01' AND session_date <= '2019-07-31'
GROUP BY
referrer_page,
session_date,
channel,
medium,
web_source,
campaign_name,
id