I'm trying to find numbers before image extensions (jpg|jpeg|png|bmp|gif|tif), but I want to exclude them if they are part of a NUMBERSxNUMBERS pattern. My expression is:

(?!\dx\d)[0-9]{1,}.(jpg|jpeg|png|bmp|gif|tif)

The text is found in long links like this one: http://www.google.com/imgres?newwindow=1&safe=off&sa=X&hl=es&rls=%7Bmoz%3AdistributionID%7D%3A%7Bmoz%3Alocale%7D%3A%7Bmoz%3Aofficial%7D&tbs=imgo%3A1&tbm=isch&tbnid=Sl6oOM1zv4WRkM%3A&imgrefurl=http%3A%2F%2Fes.gdefon.com%2Fdownload%2FMostrar-Pato-Duffy_El-show-del-Pato%2F30329%2F1280x1024&docid=JNlhLyS8MUlRAM&imgurl=http%3A%2F%2Fst.gdefon.com%2Fwallpapers_original%2Fwallpapers%2F30329_shou-daffi-daka_or_the-daffy-duck-show_1280x1024.jpg

TRY WITH THESE: aoi32x453.jpg ser32xa453.jpeg as/as673.jpg x673.png ygt/x673.bmp x673.gif

I need to exclude the matches that contain '\dx\d' immediately before my expression.

This should be a match: sax73.jpg

But I don't want those with 'NUMBERSxNUMBERS': 35x35.jpg

  • JavaScript in Greasemonkey for Firefox Nightly
  • ser32xa453: this 'xa' should be included, but ser32x453 should be excluded
Sam-Bo
  • In which environment are you using this (language, editor, tool)? There are different levels of support for lookbehind in different regex implementations. – Bergi Jan 21 '14 at 20:04
  • It's not really an issue to exclude 'x' like in your example. The big problem is knowing where to start looking for the x. For that, you have to use rules to parse the whole file name. –  Jan 21 '14 at 20:13
  • Is this `ser32xa453.jpeg` a typo, or should it match? –  Jan 21 '14 at 20:20
  • I'm using scripts in Greasemonkey for Firefox. ser32xa453: this 'xa' should be included, but ser32x453 should be excluded – Sam-Bo Jan 22 '14 at 05:39

3 Answers

(?!...) is a negative look-ahead. A look-behind is (?<!...). Furthermore, (?<!x.*) would reject any string if x appeared anywhere in the string before your pattern. If you want to make sure that x doesn't appear immediately before the number, use (?<!x).

However, that will just match the first digit character that doesn't appear directly after an x. For example, in "35x73.jpg", it will simply match 3.jpg. One easy solution is to ensure that the previous character is also not a digit by using a look-behind like (?<![x0-9]).

A few more notes: {1,} can be simplified to +, and [0-9] can be simplified to \d (although, depending on your environment, \d may match numerals from other number systems, e.g. Eastern Arabic numerals):

(?<![x\d])\d+\.(jpg|jpeg|png|bmp|gif|tif)
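To see how this behaves on the sample strings from the question, here is a minimal sketch. It assumes a JavaScript engine with look-behind support (ES2018 or later). The `strict` pattern is the one above, with the extension group made non-capturing. The `relaxed` pattern is a hypothetical variant, not part of the original answer: it rejects a number only when it directly follows a digit plus x, so that sax73.jpg still matches as the question asks.

```javascript
// Pattern from the answer: digits not preceded by "x" or by another digit.
const strict = /(?<![x\d])\d+\.(?:jpg|jpeg|png|bmp|gif|tif)/g;

// Hypothetical variant (an assumption, not from the answer): reject only when
// the digit run directly follows "<digit>x". The "|\d" branch keeps the match
// from starting in the middle of a digit run.
const relaxed = /(?<!\dx|\d)\d+\.(?:jpg|jpeg|png|bmp|gif|tif)/g;

const samples =
  "aoi32x453.jpg ser32xa453.jpeg as/as673.jpg x673.png sax73.jpg 35x35.jpg";

// With the g flag, String.prototype.match returns all full matches (or null).
function findAll(re, text) {
  return text.match(re) || [];
}

const strictMatches = findAll(strict, samples);
const relaxedMatches = findAll(relaxed, samples);

console.log(strictMatches);  // [ '453.jpeg', '673.jpg' ]
console.log(relaxedMatches); // [ '453.jpeg', '673.jpg', '673.png', '73.jpg' ]
```

Note that the strict pattern also drops x673.png and sax73.jpg, because any x directly before the number blocks the match. Whether that is acceptable depends on which file names you expect.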
p.s.w.g

If the language you use doesn't have the lookbehind feature (like JavaScript before ES2018), or since you need a variable-length lookbehind (which, AFAIK, is only supported in .NET languages and, with some restrictions, in Java), you can use a capturing group:

(?:^|\s)[^x\s]*?([0-9]+\.(?:jpg|jpeg|png|bmp|gif|tif))(?:\s|$)

You only need to extract the first capturing group.
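As a rough sketch of how the group could be extracted in JavaScript: the helper name `extractNumberedImages` and the sample input are assumptions made for illustration. Each whitespace-separated token is tested on its own, because the trailing `(?:\s|$)` consumes the separator, and a single global scan could then skip the token that follows a match.

```javascript
// Pattern from the answer; group 1 holds "<digits>.<extension>".
const pattern =
  /(?:^|\s)[^x\s]*?([0-9]+\.(?:jpg|jpeg|png|bmp|gif|tif))(?:\s|$)/;

// Hypothetical helper: run the pattern against each whitespace-separated
// token and collect group 1 from the tokens that match.
function extractNumberedImages(text) {
  const results = [];
  for (const token of text.split(/\s+/)) {
    const m = pattern.exec(token);
    if (m !== null) {
      results.push(m[1]);
    }
  }
  return results;
}

const found = extractNumberedImages(
  "as/as673.jpg 35x35.jpg a1.jpg b2.png sax73.jpg"
);
console.log(found); // [ '673.jpg', '1.jpg', '2.png' ]
```

Because `[^x\s]*?` cannot cross an x, this pattern skips every token that has an x anywhere before the number, so sax73.jpg is dropped along with 35x35.jpg.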

Casimir et Hippolyte

(?<=\s)[^x]*?[0-9]+\.(jpg|jpeg|png|bmp|gif|tif)(?=\s)

tenub