0

Hello StackOverflow world,

I have a datasource that stores all of my website session data and I am hoping to analyze the referral URLS and group them into five categories: Home Site (www.mywebsite.com), Sub Site (www.mywebsite.com/employees), Outside Traffic ,Google, Yahoo, Bing, Facebook, Linkedin, and Youtube.

In order to do this, I must parse the referral URLs. I have it partially figured out, but my current query mis-classifies URLS from outside websites that include our domain in their URL. When my query runs it will not group these URLs as Outside Traffic but create its own grouping as the name of the referral url.

For example, using this URL you can see how the name of my site is embedded in their URL:

https://www.helpthepeople.com/redirect.action?link=https%3A%2F%2F**www.mywebsite.com**%2Femployers%2Fblog%2Fwhat-to-do-when-asking-for-help%2F&encoded=lFAJCUeGqgrDkdlYfDwwbEfCqGlV

I get the following output:

www.helpthepeople.com

Desired output:

Outside Traffic

For the most part my query is working, but I am only having issues when the example above is present. Anyone know a better way to write this? My query is below:

SELECT 
    CASE 
        WHEN referrer_page LIKE '%mywebsite.com%' 
            THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
        WHEN referrer_page LIKE '%mywebsite.com/employees%' 
            THEN SPLIT_PART(SPLIT_PART(referrer_page,'.com/',2),'/',1)
         WHEN referrer_page LIKE '%google%' 
            THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
        WHEN referrer_page LIKE '%yahoo%' 
            THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
        WHEN referrer_page LIKE '%bing%' 
            THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
        WHEN referrer_page LIKE '%facebook%'
            THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
        WHEN referrer_page LIKE '%linkedin%'
            THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
        WHEN referrer_page LIKE '%youtube%'
            THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)  
        ELSE 'outside_referral_traffic' 
    END AS url_grouping,
    referrer_page,
    session_date,
    channel,
    medium,
    web_source,
    campaign_name,
    id, 
    COUNT (DISTINCT id) AS number_of_sessions
FROM biz_sessions
WHERE session_date >= '2019-07-01' AND session_date <= '2019-07-31'
GROUP BY 
    referrer_page,
    session_date,
    channel,
    medium,
    web_source,
    campaign_name,
    id



2 Answers2

0

in SQL SERVER using replace,charindex and left

declare @string varchar(800) = 'https://www.helpthepeople.com/redirect.action?link=https%3A%2F%2Fwww.mywebsite.com%2Femployers%2Fblog%2Fwhat-to-do-when-asking-for-help%2F&encoded=lFAJCUeGqgrDkdlYfDwwbEfCqGlV'

select left(replace(replace(@string,'https://www.helpthepeople.com/redirect.action?link=',''),'https%3A%2F%2F',''),charindex('%2F',replace(replace(@string,'https://www.helpthepeople.com/redirect.action?link=',''),'https%3A%2F%2F',''))-1)

you can just easily translate this to MySQL

RoMEoMusTDiE
  • 4,739
  • 1
  • 17
  • 26
  • Would I need to declare this for every single URL or string that is considered outside traffic? I have hundreds of URLs referring traffic to the site. – AnalyzeThis Sep 16 '19 at 23:21
  • you will need to show your full data as this solution was for the older version of your OP. – RoMEoMusTDiE Sep 16 '19 at 23:23
0

Your CASE statement is exiting at the first option because it satisfies the LIKE condition of your website name being in the entire URL. I would just split the referral URL by question mark to get rid of all parameters that might possibly include your website URL and a bunch of other things. The query might even work faster.

CASE 
    WHEN SPLIT_PART(referrer_page,'?',1) LIKE '%mywebsite.com%' 
    THEN SPLIT_PART(SPLIT_PART(referrer_page,'//',2),'/',1)
    ...
END
AlexYes
  • 4,088
  • 2
  • 15
  • 23