I am having trouble writing a Perl-compatible regex that matches several conditions when there can be a gap between them. It is easier to explain by listing what I want it to find.

Conditions

  1. /world/
  2. a single letter
  3. a dash OR underscore
  4. a single letter
  5. a single period
  6. three or four letters

The problem is that I don't know how to write the expression so that there can be a gap between conditions #1 and #2. Conditions #2 to #4 can repeat, but they do not always.

I have tried several online regex testers but cannot get a match, and I am not sure what I am doing wrong. I think the regex is looking for literally /world/x_x or /world/y-y instead of "looking ahead" to find the "letter dash letter" or "letter underscore letter" pattern further along in the string.

Current regex

/world/([a-z](-|_)[a-z]){1,}\.[a-z]{3,4}$

Desired Matches (not currently matching)

hxxp://armassimchilzeispreu.blackjackipad.com/world/activate_available.jar

hxxp://chubfaceddamsel0.affittobarcheavela.it/world/eternal_threat-clearing.html

hxxp://offdestroyengarabitar.freebookofraslot.com/world/bonus-middle-marathon.pdf
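
Here is a minimal script that reproduces the problem, in case it helps. It runs my current regex against the three URLs above. The fourth URL is made up (example.invalid is not a real host) and is only there to show the single-letter shape that the regex does accept.

```perl
use strict;
use warnings;

# The regex from the question: one letter, one separator, one letter,
# repeated one or more times, directly after /world/.
my $current = qr{/world/([a-z](-|_)[a-z]){1,}\.[a-z]{3,4}$};

my @urls = (
    'hxxp://armassimchilzeispreu.blackjackipad.com/world/activate_available.jar',
    'hxxp://chubfaceddamsel0.affittobarcheavela.it/world/eternal_threat-clearing.html',
    'hxxp://offdestroyengarabitar.freebookofraslot.com/world/bonus-middle-marathon.pdf',
    'hxxp://example.invalid/world/x_y.jar',    # hypothetical, single letters only
);

# 1 if the URL matches, 0 otherwise.
my @results = map { $_ =~ $current ? 1 : 0 } @urls;

for my $i (0 .. $#urls) {
    printf "%-8s %s\n", ($results[$i] ? 'MATCH' : 'NO MATCH'), $urls[$i];
}
```

Only the made-up URL matches, which seems to confirm that `[a-z]` consumes exactly one letter on each side of the separator, so whole words like "activate" are never accepted.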
Borodin
user2249813
    Your desired matches don't seem to match the conditions you're using. You say you want strings starting with `/world/[a single letter][- or _][another single letter]`, but then you say you want it to match `/world/activate_available.jar`. "activate" and "available" are both much longer than one character. – user1618143 Apr 05 '13 at 17:02
  • Also, `eternal_threat-clearing.html` and `bonus-middle-marathon.pdf` have more than one "hyphen-or-underscore". – Borodin Apr 05 '13 at 17:10
  • @user1618143 - I say x-x or y_y because I just want to ensure that pattern is there, I don't care how long or what word is in the URL. I know the URLs have full words but I don't care about what word is there. Do I have to match on word-word or word_word in their entirety? – user2249813 Apr 05 '13 at 17:11
  • @Borodin - yes, that's why I am trying to write the regex with `{1,}`. My understanding is that it means "one or more". – user2249813 Apr 05 '13 at 17:13
  • Ah I see. Take a look at my answer. Is that what you want? Also, `{1,}` is traditionally written as `+`. – Borodin Apr 05 '13 at 17:15

1 Answer

I think you want this

use strict;
use warnings;

while (<DATA>) {
  chomp;
  # A word, then one or more "separator + word" pairs, then a 3-4 letter extension
  print "OK $_\n" if m</world/[a-z]+(?:[_-][a-z]+)+\.[a-z]{3,4}$>;
}

__DATA__
hxxp://armassimchilzeispreu.blackjackipad.com/world/activate_available.jar
hxxp://chubfaceddamsel0.affittobarcheavela.it/world/eternal_threat-clearing.html
hxxp://offdestroyengarabitar.freebookofraslot.com/world/bonus-middle-marathon.pdf

or perhaps just

m</world/[a-z-_]+\.[a-z]{3,4}$>
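
To show what each piece of the first pattern does, here is a sketch with the same regex spread out under the `/x` flag, which lets whitespace and comments sit inside the pattern. It also runs both patterns side by side. The fourth URL is made up for illustration (example.invalid is not a real host); it has no hyphen or underscore in the file name.

```perl
use strict;
use warnings;

# First pattern, spread out with /x so each piece can carry a comment.
my $strict = qr{
    /world/        # literal path segment
    [a-z]+         # first word: one or more lowercase letters
    (?:            # group without capturing, so nothing is saved for later
        [_-]       # exactly one separator: underscore or hyphen
        [a-z]+     # the next word
    )+             # one or more separator-plus-word pairs
    \.             # a literal dot
    [a-z]{3,4}     # extension of three or four letters
    $              # end of the string
}x;

# Second pattern: any run of letters, hyphens and underscores.
my $loose = qr{/world/[a-z-_]+\.[a-z]{3,4}$};

my @urls = (
    'hxxp://armassimchilzeispreu.blackjackipad.com/world/activate_available.jar',
    'hxxp://chubfaceddamsel0.affittobarcheavela.it/world/eternal_threat-clearing.html',
    'hxxp://offdestroyengarabitar.freebookofraslot.com/world/bonus-middle-marathon.pdf',
    'hxxp://example.invalid/world/index.html',    # hypothetical, no separator
);

my (@strict_hits, @loose_hits);
for my $url (@urls) {
    push @strict_hits, ($url =~ $strict ? 1 : 0);
    push @loose_hits,  ($url =~ $loose  ? 1 : 0);
    printf "strict=%d loose=%d %s\n", $strict_hits[-1], $loose_hits[-1], $url;
}
```

Both patterns accept the three real URLs. Only the second accepts the made-up one, because a character class cannot insist that a particular character appears. In the first pattern the `(?: ... )+` group must match at least once, so at least one hyphen or underscore is mandatory.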
Borodin
  • I think this is it! If you don't mind, would you explain what you did? I am not that familiar with the ?: (non-capturing subpattern according to documentation). I would like to study this so I can better understand it. You have no idea how elated I am at having this signature solved, but I do want to ensure I understand it :D – user2249813 Apr 05 '13 at 17:21
  • Just noticed your second expression. That one works as well. I'm trying to dissect it. – user2249813 Apr 05 '13 at 17:31
  • As I read it, `[a-z-_]+` matches any letter, dash, or underscore one or more times. I may be wrong, but I think this will match a string even if it contains no - or _ at all. Is there any way to make a - or _ mandatory in the string? – user2249813 Apr 05 '13 at 17:36