
I'm trying to write a URL rewrite regex for my company's site. The URL always starts with `category/` followed by an ID (`category/.+`), and after that up to 5 extra tags can be added on. With my current regex, the `.+` after `category/` is captured, but everything after it also ends up in that same capture group. The regex and some example data:

/category\/(.+)(?:\/(?:page|price|shipping|sort|brand)\/(.*))*/

mysite.com/category/15000000
mysite.com/category/15000000/page/2
mysite.com/category/15000000/page/2/price/g10l20
mysite.com/category/60000000/page/2/price//shipping//brand//sort/

The results I get, each followed by the result I want:

$1 = 15000000
    // desired: $1 = 15000000
$1 = 15000000/page/2
    // desired: $1 = 15000000, $2 = 2
$1 = 15000000/page/2/price/g10l20
    // desired: $1 = 15000000, $2 = 2, $3 = g10l20
$1 = 60000000/page/2/price//shipping//brand//sort/
    // desired: $1 = 60000000, $2 = 2, $3 = "", $4 = "", $5 = "", $6 = ""

My understanding was that the `*` (zero or more) quantifier would let the engine go back and search again for the "flag" pattern, but apparently that is not the case. Could someone please tell me what I'm doing wrong?

Geoffrey H.
  • Try like this `/category\/(.*?)(?:\/(?:page|price|shipping|sort|brand)\/(.*))*$/` – Wiktor Stribiżew Oct 27 '17 at 15:57
  • Well, it successfully broke the last test case into two groups, `$1 = 60000000` and `$2 = /page/2/price//shipping//brand//sort/`, but I need each "flag" to end up in its own capture, whether it's empty or not. – Geoffrey H. Oct 27 '17 at 16:01
  • Ah, I see what you are after. That can only be done programmatically, with the .NET regex engine, the PyPI `regex` module, or a specially compiled Boost regex library. – Wiktor Stribiżew Oct 27 '17 at 16:06


Unfortunately, most regex engines can't keep an indeterminate number of captures. When a capturing group is repeated with `+`, `*`, `{n}` etc., only its most recent capture is returned.
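Both problems in the question's pattern can be seen in a minimal sketch using Python's `re` module. This assumes the site's rewrite engine behaves the same way for these simple patterns (outside a `/.../` literal, `/` needs no escaping). First, the greedy `(.+)` runs to the end of the URL, so the repeated group is satisfied by matching zero times. Second, even when the ID is stopped at the next slash, the repeated group only reports its last iteration:

```python
import re

url = "mysite.com/category/15000000/page/2/price/g10l20"

# Question's pattern: the greedy (.+) consumes the rest of the string,
# so the optional repeated group matches zero times and group 2 is None.
original = re.search(r"category/(.+)(?:/(?:page|price|shipping|sort|brand)/(.*))*", url)
print(original.groups())

# Stopping the ID at the next slash lets the repeated group run twice,
# but the capture from the first iteration ("2") is overwritten by the second.
repeated = re.search(r"category/([^/]*)(?:/(?:page|price|shipping|sort|brand)/([^/]*))*", url)
print(repeated.groups())
```

The first call prints `('15000000/page/2/price/g10l20', None)` and the second prints `('15000000', 'g10l20')`: the page value is lost.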

Since you know you'll have a maximum of 5 tags, you could just repeat the relevant block 5 times, like so:

/category\/([^/]*)(?:\/(page|price|shipping|sort|brand)\/([^/]*))?(?:\/(page|price|shipping|sort|brand)\/([^/]*))?(?:\/(page|price|shipping|sort|brand)\/([^/]*))?(?:\/(page|price|shipping|sort|brand)\/([^/]*))?(?:\/(page|price|shipping|sort|brand)\/([^/]*))?/
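A sketch of the repeated-block approach in Python, assuming the same five tag names. Rather than typing the block out five times, it is built from one string, which at least makes adding a tag a one-line change. `TAGS`, `TAG_BLOCK`, `PATTERN` and `parse_fixed` are hypothetical names for illustration. Each block contributes two groups (the tag name and its value), so the pairs are zipped back together:

```python
import re

# Hypothetical names; the resulting pattern is the five-times-repeated block.
TAGS = "page|price|shipping|sort|brand"
TAG_BLOCK = r"(?:/(" + TAGS + r")/([^/]*))?"
PATTERN = re.compile(r"category/([^/]*)" + TAG_BLOCK * 5)


def parse_fixed(url):
    """Return (category_id, {tag: value}) or None if the URL doesn't match."""
    m = PATTERN.search(url)
    if m is None:
        return None
    groups = m.groups()
    category = groups[0]
    # Groups 1..10 come in (name, value) pairs; unused blocks are None.
    tags = {
        name: value
        for name, value in zip(groups[1::2], groups[2::2])
        if name is not None
    }
    return category, tags
```

For example, `parse_fixed("mysite.com/category/60000000/page/2/price//shipping//brand//sort/")` returns `('60000000', {'page': '2', 'price': '', 'shipping': '', 'brand': '', 'sort': ''})`, with empty tags kept as empty strings.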

This is ugly in the extreme, allows a tag to be repeated, and needs the regular expression to be extended if you want to add more tags.

The neatest solution is probably to capture the category ID in $1 and the rest of the argument string in $2, and have the application parse that; it can do so far more neatly than a regex can.

/category\/([^/]*)(\/.*)?/
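A minimal sketch of the application-side parsing, in Python for illustration (the function name `parse_url` and the `KNOWN_TAGS` set are hypothetical). The regex only splits off the ID; the remainder is split on `/` and read as name/value pairs, so tags can appear in any order and empty values come back as empty strings:

```python
import re
from itertools import zip_longest

# $1 = category ID, $2 = everything after it (None if nothing follows)
ROUTE = re.compile(r"category/([^/]*)(/.*)?")
KNOWN_TAGS = {"page", "price", "shipping", "sort", "brand"}  # assumed tag set


def parse_url(url):
    """Return (category_id, {tag: value}) or None if the URL doesn't match."""
    m = ROUTE.search(url)
    if m is None:
        return None
    category, rest = m.group(1), m.group(2) or ""
    # "/page/2/price//sort/" -> ["page", "2", "price", "", "sort", ""]
    parts = rest.split("/")[1:]
    tags = {}
    # Read the parts as (name, value) pairs; a trailing name with no value gets "".
    for name, value in zip_longest(parts[0::2], parts[1::2], fillvalue=""):
        if name in KNOWN_TAGS:  # unknown names are ignored
            tags[name] = value
    return category, tags
```

For example, `parse_url("mysite.com/category/15000000/page/2/price/g10l20")` returns `('15000000', {'page': '2', 'price': 'g10l20'})`. Here a repeated tag keeps its last value and unknown tags are dropped; whether to ignore or reject them is a choice for the application.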
Fahad Sadah
  • Thank you. I was able to condense your solution a bit to `/category\/([^/]+)(?:\/page\/([^/]*))?(?:\/price\/([^/]*))?(?:\/shipping\/([^/]*))?(?:\/brand\/([^/]*))?(?:\/sort\/([^/]*))?/` and this works as expected. – Geoffrey H. Oct 27 '17 at 19:57
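A quick Python check of the fixed-order pattern from the comment above, sketched with `[^/]*` for every value so that multi-character values such as page 10 match. For the last example URL it yields exactly the desired groups. The trade-off is that the tags must appear in the order page, price, shipping, brand, sort; a tag out of that order is silently left unmatched:

```python
import re

# Fixed-order variant: one optional group per tag, in a fixed sequence.
ORDERED = re.compile(
    r"category/([^/]+)"
    r"(?:/page/([^/]*))?"
    r"(?:/price/([^/]*))?"
    r"(?:/shipping/([^/]*))?"
    r"(?:/brand/([^/]*))?"
    r"(?:/sort/([^/]*))?"
)

in_order = ORDERED.search("mysite.com/category/60000000/page/2/price//shipping//brand//sort/")
print(in_order.groups())  # ('60000000', '2', '', '', '', '')

# Once /sort/x is consumed nothing after it is tried, so page stays None.
out_of_order = ORDERED.search("mysite.com/category/60000000/sort/x/page/2")
print(out_of_order.groups())  # ('60000000', None, None, None, None, 'x')
```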