How do I grab all the content from within [url] including square brackets and match group 1 and 2

Question

I have this regular expression

/\[url=(?:&quot;)?(.*?)(?:&quot;)?\](.*?)\[\/url\]/mi

and these blocks of text

[url=/someurl?page=5#3467]First[/url][postquote=true]
[url=/another_url/who-is?page=4#3396] Second[/url]
Some text[url=/another_url/who-is?page=3][i]3[/i] Third [/url]

and the regex works great at extracting the urls and text between the urls

Match 1

1.  /someurl?page=5#3467
2.  First

Match 2

1.  /another_url/who-is?page=4#3396
2.  Second

Match 3

1.  /another_url/who-is?page=3
2.  [i]3[/i] Third

The problem happens when I use the same regex from above to try to extract the url from this text

This is some text [url=https://www.somesite.com/location/?opt[]=apples]Link Name[/url]

Match 1

1.  https://www.somesite.com/location/?opt[
2.  =apples]Link Name

Notice the `=apples]` at the start of the second group. What I need is for the first group to include it as part of the URL, so that the two groups are:

https://www.somesite.com/location/?opt[]=apples
Link Name

I have tried many modifications to this regex with no luck so far. Any help would be appreciated.
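
For reference, here is a minimal Ruby sketch that reproduces both results. The constant `ORIGINAL_URL_TAG` and the helper `extract_with_original` are names made up for illustration; only the pattern itself comes from the question.

```ruby
# The pattern from the question, unchanged.
ORIGINAL_URL_TAG = /\[url=(?:&quot;)?(.*?)(?:&quot;)?\](.*?)\[\/url\]/mi

# Hypothetical helper: returns [[group 1, group 2], ...] for every [url] tag.
def extract_with_original(text)
  text.scan(ORIGINAL_URL_TAG)
end

plain     = "[url=/someurl?page=5#3467]First[/url][postquote=true]"
bracketed = "This is some text [url=https://www.somesite.com/location/?opt[]=apples]Link Name[/url]"

p extract_with_original(plain)
# => [["/someurl?page=5#3467", "First"]]

p extract_with_original(bracketed)
# => [["https://www.somesite.com/location/?opt[", "=apples]Link Name"]]
```

The second result shows the cause: the lazy `(.*?)` in group 1 stops at the first `]` it can reach, which is the one inside `opt[]`.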

If it is RoR, see [BBCode for Ruby on Rails](https://stackoverflow.com/questions/1506002/bbcode-for-ruby-on-rails). – Wiktor Stribiżew Sep 18 '17 at 20:45
Yes, Ruby on Rails, and I am using BBCode, but this question is just plain regex. – Matt Elhotiby Sep 18 '17 at 20:45
@WiktorStribiżew - that's great, but I need groups 1 and 2 and you have 1, 2 and 3. Can you remove one, please? – Matt Elhotiby Sep 18 '17 at 21:24
@Trace It is impossible. The second group is technical, to ensure recursion. – Wiktor Stribiżew Sep 18 '17 at 21:24
@WiktorStribiżew - Worked. If you want to add this as an answer, I can accept it. – Matt Elhotiby Sep 18 '17 at 21:31
Just thought: what if you have `]...[` in the string? It won't work then. Well, maybe Casimir's approach is worth more attention. – Wiktor Stribiżew Sep 18 '17 at 21:34

Casimir et Hippolyte · Accepted Answer · 2017-09-18T20:55:12.323

Ruby regexes support duplicate named capture groups. With this feature, you can handle the two cases easily (the URL wrapped in `&quot;` and the bare one). You don't need a recursive pattern, since I doubt that `[]` can be nested in the query part of a URL:

/\[url=(?:&quot;(?<url>[^&]*(?:&(?!quot;)[^&]*)*)&quot;|(?<url>[^\s\]\[]*(?:\[\][^\s\]\[]*)*))\](?<text>.*?)\[\/url\]/mi

The URL is in the named group `url`, and the content between the tags is in the named group `text`.

In a more readable format:

/

\[url=
(?:
    &quot; (?<url> [^&]* (?:&(?!quot;)[^&]*)* ) &quot;
  |
    (?<url> [^\s\]\[]* (?:\[\][^\s\]\[]*)* )
)
\]
(?<text>.*?)\[\/url\]

/mix
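
A short usage sketch, in case it helps. It assumes the entity is spelled `&quot;`, as in the question's regex. The constant `URL_TAG` and the helper `url_pairs` are hypothetical names, and dropping the unused alternative with `compact` is just one possible way to get back plain "group 1 and group 2" pairs.

```ruby
# The pattern in extended (x) form. Assumption: the entity is spelled
# &quot; here, matching the regex in the question.
URL_TAG = /
  \[url=
  (?:
      &quot; (?<url> [^&]* (?:&(?!quot;)[^&]*)* ) &quot;
    |
      (?<url> [^\s\]\[]* (?:\[\][^\s\]\[]*)* )
  )
  \]
  (?<text>.*?)\[\/url\]
/mix

# Hypothetical helper: one [url, text] pair per [url] tag.
def url_pairs(input)
  # scan returns all three groups per match; the `url` alternative that did
  # not take part in the match is nil, so compact leaves exactly [url, text].
  input.scan(URL_TAG).map(&:compact)
end

sample = "This is some text [url=https://www.somesite.com/location/?opt[]=apples]Link Name[/url]"

m = URL_TAG.match(sample)
puts m[:url]   # https://www.somesite.com/location/?opt[]=apples
puts m[:text]  # Link Name

p url_pairs(sample)
# => [["https://www.somesite.com/location/?opt[]=apples", "Link Name"]]

p url_pairs("[url=&quot;/someurl?page=5&quot;]First[/url]")
# => [["/someurl?page=5", "First"]]
```

With duplicate names, `m[:url]` returns whichever `url` group took part in the match, so the calling code does not need to know which alternative was used.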

edited Sep 18 '17 at 20:55

answered Sep 18 '17 at 20:53


This works well, but I do not need the `text` and `url` names; I just need groups 1 and 2 matched, like in my example. – Matt Elhotiby Sep 18 '17 at 20:54
Can you help remove `text` and `url` and only have groups 1 and 2, like in my example? – Matt Elhotiby Sep 18 '17 at 21:03
