I have a column called 'website' from which I require to extract the domain part only
Examples
- www.google.com -> google.com
- www.google.com/field -> google.com
- https://www.nearbyplaces.com -> nearbyplaces.com
- http://hcc.ca -> hcc.ca
- http://hcc.ca/info -> hcc.ca
- http://hcc.ca/ -> hcc.ca
What I have done so far:
select distinct website,
CASE WHEN website like '%//www.%' THEN REPLACE(REGEXP_SUBSTR(website,'//[^/\\\,=@\\+]+\\.[^/:;,\\\\\(\\)]+'),'//www.','')
WHEN website like '%//%' THEN REPLACE(REGEXP_SUBSTR(website,'//[^/\\\,=@\\+]+\\.[^/:;,\\\\\(\\)]+'),'//','')
WHEN website is null then null
WHEN website like '%www.%' THEN REPLACE(REGEXP_SUBSTR(website,'.([^/]*)'),'www.','')
else website
end as domain
Two Things: I am certain that I am missing out on some of the test cases. Secondly, I want to optimise the verbose in this solution. Any help in improving my snippet is appreciated. (FYI I am using Redshift )