0
select regexp_replace('https://www.facebook.com/cricket/hello', '.*\..*?\/', '')

The above code is giving me

hello

instead of

cricket/hello

I checked on Regexp checking website and the pattern is correct. I am not sure where am I going wrong.

DBMS: "PostgreSQL 8.2.15 (Greenplum Database 4.2.8.3 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 2 2014 01:33:14"

Ashish
  • 131
  • 9

3 Answers3

1

I don't know how, but this worked

.*?\.[a-z]+\/

Taking Andrew Wolfe's query on weirdest kinds of URLs.

select testval, regexp_replace ( testval,  '.*?\.[a-z]+\/',  '')
from (
    select 'https://www.facebook.com/cricket/hello' as testval
  union all
  select 'http://a.b.co.uk/cric.ke.t/hello' as testval
  union all
  select 'ftp://a.b.com.d.e.f/relroot/cricket/hello' as testval  union all
  select 'http://www.google.co.uk/cricket/hello' as testval  
  union all
  select 'http://a.b.co.uk/cricket/hello/this/is/a/little/longer?and&it=has&args' as testval
) vals

enter image description here

Ashish
  • 131
  • 9
0

Try this:

select regexp_replace('https://www.facebook.com/cricket/hello', '.*\.[a-z]+\/', '')

Also work with cctld's :

select regexp_replace('https://www.google.co.uk/cricket/hello', '.*\.[a-z]+\/', '')
Pedro Lobito
  • 94,083
  • 31
  • 258
  • 268
  • 1
    @Ashish - Pedro's answer works fine for your example - are you just trying to make this work for that one example? Otherwise, I would recommend you think carefully about the requirements and update your question (or accept this answer and post a new question). btw, it looks like the issue with your code is that `regexp_replace` is matching greedy (up the the last `/`). – steve klein Apr 24 '15 at 23:32
  • I have other URLs as well like **www.google.co.uk/cricket/hello** and the answer would not work on this example. @steveklein , you are right about the greediness issue. Although I have used the non-greedy version, the `regexp_replace` is still going greedy. How can I solve this. – Ashish Apr 27 '15 at 01:43
  • I've updated my answer, give it a try. It works with any gtld or cctld. – Pedro Lobito Apr 27 '15 at 01:46
  • @PedroLobito Did not work. It still gave **hello** as the result. I believe there is some issue with GreenPlum. – Ashish Apr 27 '15 at 13:32
0

I'm assuming that you want the path part of a URL.

I don't have my pg up but I'd go very explicit with each part of the URL -

'[^:]+:\/\/[A-Za-z][-a-zA-Z0-9]*(\.[A-Za-z][-a-zA-Z0-9]*)*/'

A test:

select testval, regexp_replace ( testval,  '[^:]+:\/\/[A-Za-z][-a-zA-Z0-9]*(\.[A-Za-z][-a-zA-Z0-9]*)*/',  '')
from (
    select 'https://www.facebook.com/cricket/hello' as testval
  union all
  select 'http://a.b.co.uk/cric.ke.t/hello' as testval
  union all
  select 'ftp://a.b.com.d.e.f/relroot/cricket/hello' as testval  union all
  select 'http://www.google.co.uk/cricket/hello' as testval  
  union all
  select 'http://a.b.co.uk/cricket/hello/this/is/a/little/longer?and&it=has&args' as testval
) vals

See http://sqlfiddle.com/#!15/9eecb/857/0

Andrew Wolfe
  • 2,020
  • 19
  • 25
  • This looks great but still did not work on GreenPlum. The first four gave **hello** as results and the last one gave **longer?and&it=has&args** – Ashish Apr 27 '15 at 13:40