Based on this example: http://example.com/cat1/tag/this%20is%20page/cat2, I need a regex that captures /tag/<nextSegment> only if this pattern is followed or preceded by other segments, not if it stands alone.

The result I need is always http://example.com/tag/<nextSegment>, but only if /tag/<nextSegment> is preceded or followed by other segments or by characters that are not allowed.

I tried this regex: ((?:/[A-Za-z0-9-]+)+)?(/tag/[A-Za-z0-9-%]+)+(/.*)? but the pattern is also matched when it stands alone (see the last example string in the link).
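
To see why the standalone case slips through: both the leading group and the trailing group are optional, so `/tag/<segment>` on its own still satisfies the pattern. Below is a minimal JavaScript sketch, assuming the "not alone" check may be done in code after matching. `hasNeighbours` is a hypothetical helper, and because the pattern is unanchored the sketch assumes a host that contains a dot (such as example.com); a dotless host would itself be taken as a preceding segment.

```javascript
// The attempted pattern from the question, written as a JavaScript literal.
const attempt = /((?:\/[A-Za-z0-9-]+)+)?(\/tag\/[A-Za-z0-9-%]+)+(\/.*)?/;

// Hypothetical helper: true only when /tag/<segment> has something before it
// (group 1) or after it (group 3). Assumes the host contains a dot.
function hasNeighbours(url) {
  const m = attempt.exec(url);
  return m !== null && (m[1] !== undefined || m[3] !== undefined);
}

// The pattern itself accepts the standalone URL, because groups 1 and 3
// are optional and simply do not participate in the match.
console.log(attempt.test("http://example.com/tag/this"));
console.log(hasNeighbours("http://example.com/tag/this"));
console.log(hasNeighbours("http://example.com/cat1/tag/this%20is%20page/cat2"));
```

Checking which groups participated keeps the original pattern untouched; it is only one possible way to express the condition.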

Kouga
  • I don't find your question clear. It would probably also benefit from some examples strings (matching and non-matching). – Micha Wiedenmann Feb 05 '18 at 09:01
  • I'm sorry, I added some example strings in the link. – Kouga Feb 05 '18 at 09:04
  • Please include it in the question, link content is often ignored. – Micha Wiedenmann Feb 05 '18 at 09:09
  • Can you post the expected results? – Pavan Kumar T S Feb 05 '18 at 09:34
  • Which strings should be matched, and which ones should not? Please add these sets to the question. – Wiktor Stribiżew Feb 05 '18 at 09:46
  • I added more information :) @WiktorStribiżew – Kouga Feb 05 '18 at 10:37
  • Try [`\b(\/tag\/[A-Za-z0-9-%]+)\/.*$`](https://regex101.com/r/G57QT0/4) – Wiktor Stribiżew Feb 05 '18 at 10:41
  • @WiktorStribiżew The second and third example strings must be matched too. Also, for the first example string it does not capture the preceding segments. – Kouga Feb 05 '18 at 10:44
  • I wonder why `http://example.com/cat1/subcat3/subcat4/tag/this%20is%20pageasdasd` should be matched. There is no more subparts after `tag/`. See https://regex101.com/r/G57QT0/5 – Wiktor Stribiżew Feb 05 '18 at 10:46
  • @WiktorStribiżew Because in `http://example.com/cat1/subcat3/subcat4/tag/this%20is%20pageasdasd` the pattern `/tag/this%20is%20pageasdasd` is preceded by `/cat1/subcat3/subcat4`. As I wrote in the question, the result I need is always `http://example.com/tag/` **only if** `/tag/` **is preceded or followed by other segments or characters that are not allowed.** – Kouga Feb 05 '18 at 10:48
  • Ok, I'd recommend a "match what you do not need, capture what you need" kind of regex. See [`^(?:https?:\/\/)?[^\/]+\/tag\/[^\/]+$|(\/tag\/[^\/]+)`](https://regex101.com/r/G57QT0/6); you will need to adjust the code for it, since you only need the Group 1 value. – Wiktor Stribiżew Feb 05 '18 at 10:53
  • @WiktorStribiżew You match only the pattern `/tag/`, but if you look at my regex ([link](https://regex101.com/r/G57QT0/2)), I capture everything except the domain name through groups (the segments before the pattern `/tag/` and after it). The last example string, however, must not be matched, because `/tag/` is not preceded or followed by other segments or characters. I'd recommend looking at the substitution part too; it can help you understand what I need. – Kouga Feb 05 '18 at 11:00
  • Add the exception pattern at the start of your pattern, see [this fiddle](https://regex101.com/r/G57QT0/7). – Wiktor Stribiżew Feb 05 '18 at 11:10
  • @WiktorStribiżew Almost perfect! But last string example `http://example.com/tag/this` must not be matched! – Kouga Feb 05 '18 at 11:16
  • But you should not care whether or not it is matched. Only what is captured matters. What is the code? – Wiktor Stribiżew Feb 05 '18 at 11:18
  • @WiktorStribiżew I'm making some redirects, so `http://example.com/tag/this`, `http://example.com/tag/this%20%is%20page`, `/tag/this` and `http://example.com` (understood as the domain name) must not be considered. My regex behaves like that, except for the last example string (that is the problem): [this fiddle](https://regex101.com/r/G57QT0/2) – Kouga Feb 05 '18 at 11:26
  • Try something like https://regex101.com/r/gwK1Hg/1. – Wiktor Stribiżew Feb 05 '18 at 11:30
  • @WiktorStribiżew Your regex **must not match the domain name or `/tag/sd-asd`**. Check my example: [this fiddle](https://regex101.com/r/G57QT0/2) – Kouga Feb 05 '18 at 11:36
  • :) You won't be able to not match the domain. Unless you use .NET. – Wiktor Stribiżew Feb 05 '18 at 11:37
  • @WiktorStribiżew Why not? I do it in my example. – Kouga Feb 05 '18 at 11:37
  • Yeah, and you match what you do not want, too. – Wiktor Stribiżew Feb 05 '18 at 11:38
  • @WiktorStribiżew Yes, so is it not possible to do this? – Kouga Feb 05 '18 at 11:41
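
The "match what you do not need, capture what you need" idea from the comments can be sketched in JavaScript as follows. The pattern is the one posted in the comment; `extractTag` is a hypothetical helper name introduced only for this illustration. The first branch consumes a URL that is just host plus `/tag/<segment>` without capturing anything, so only the second branch ever fills Group 1.

```javascript
// First branch: host + /tag/<segment> and nothing else (matched, not captured).
// Second branch: /tag/<segment> anywhere else, captured in Group 1.
const tagPattern = /^(?:https?:\/\/)?[^\/]+\/tag\/[^\/]+$|(\/tag\/[^\/]+)/;

// Hypothetical helper: returns the captured "/tag/<segment>" or null.
function extractTag(url) {
  const match = tagPattern.exec(url);
  // Group 1 is undefined when the "standalone" branch matched.
  return match !== null && match[1] !== undefined ? match[1] : null;
}

console.log(extractTag("http://example.com/cat1/tag/this%20is%20page/cat2"));
console.log(extractTag("http://example.com/tag/this"));
```

Note that a bare path such as `/tag/this` (no host) is not covered by the first branch of this pattern, so it would still be captured; handling that case would need an extra exception branch.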

1 Answer

Perhaps what you want is this: /(?<=https?:\/\/.*)((?:\/[\w]+)*)((?:\/tag)(?:\/[\w%?=.-]+){2,})$/

EDIT

The comments below made me aware of extra cases that need to match, so here is an updated pattern:

/(?<=https?:\/\/.*|[\w_.]+)(?:((?:\/[\w]+)+)((?:\/tag)(?:\/[\w%?=.-]+))|((?:\/[\w]+))*((?:\/tag)(?:\/[\w%?=.-]+){2,}))$/

An example usage in JS:

// The variable-length lookbehind (?<=...) needs an engine with ES2018 lookbehind support.
const regex = /(?<=https?:\/\/.*|[\w_.]+)(?:((?:\/[\w]+)+)((?:\/tag)(?:\/[\w%?=.-]+))|((?:\/[\w]+))*((?:\/tag)(?:\/[\w%?=.-]+){2,}))$/;
const examples = [
  "http://example.com/cat1/subcat3/subcat4/tag/this%20is%20page/asdasda?start=130/asdasdasd", // Should match
  "http://example.com/cat1/subcat3/subcat4/tag/this%20is%20pageasdasd", // Should match
  "example.it/news/tag/this%is%20n%page?adsadsadasd", // Should match
  "http://example.com/tag/thispage/asdasdasd.-?asds=", // Should match
  "http://example.com/tag/this%20is%20page/asdasd", // Should match
  "http://example.com/tag/this", // Should not match
  "/tag/this/asdads" // Should not match
];

examples.forEach((example) => {
  const matches = regex.exec(example);
  if (matches === null) return; // no match: nothing to report

  // Groups 1 and 2 belong to the first alternative, groups 3 and 4 to the second.
  // Group 3 sits inside a repetition, so it only keeps the last category segment.
  const category = matches[1] ?? matches[3] ?? "No category";
  const tag = matches[2] ?? matches[4];

  console.log(`Full String: "${example}"\nCategory: "${category}"\nTag: "${tag}"`);
});

See it on Regex101
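
Since the question is about redirects, here is a hedged sketch of how the captured groups could be turned into the target `http://example.com/tag/<segment>`. It reuses the pattern above unchanged; `redirectTarget` is a hypothetical helper, and the assumption is that everything before the match (scheme and host) should be kept and only the first segment after `/tag` is wanted.

```javascript
// Same pattern as in the answer; relies on ES2018 lookbehind support.
const tagRegex = /(?<=https?:\/\/.*|[\w_.]+)(?:((?:\/[\w]+)+)((?:\/tag)(?:\/[\w%?=.-]+))|((?:\/[\w]+))*((?:\/tag)(?:\/[\w%?=.-]+){2,}))$/;

// Hypothetical helper: returns "<prefix>/tag/<segment>" or null when the
// URL does not match (for example when /tag/<segment> stands alone).
function redirectTarget(url) {
  const m = tagRegex.exec(url);
  if (m === null) return null;

  // Group 2 (first alternative) or group 4 (second alternative) starts with "/tag/".
  const tagPart = m[2] !== undefined ? m[2] : m[4];
  // Splitting "/tag/<segment>/..." gives ["", "tag", "<segment>", ...].
  const [, tag, segment] = tagPart.split("/");

  // m.index is where the match starts, i.e. right after the scheme and host.
  return `${url.slice(0, m.index)}/${tag}/${segment}`;
}

console.log(redirectTarget("http://example.com/cat1/subcat3/subcat4/tag/this%20is%20pageasdasd"));
console.log(redirectTarget("http://example.com/tag/this"));
```

Building the target from `m.index` avoids adding another capture group for the host, at the cost of trusting the lookbehind to pin the match right after it.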

KyleFairns
  • Not at all. If you look at my example ([link to example](https://regex101.com/r/G57QT0/2)), I match everything correctly except the last example string, which must not be matched because /tag/ is not preceded or followed by other segments or characters. – Kouga Feb 05 '18 at 14:05
  • @Kouga See my edit, it should now do what you want it to – KyleFairns Feb 05 '18 at 17:31