
I want to search a string using the regular expression below:

border-.*\.5pt

to find all border-top, border-bottom, etc. CSS properties in a file that have a border thickness of .5pt. It generally works great, but it's too greedy.

For example all of the below comes back as a single match:

border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt

I want those two CSS properties to be two separate matches.

So I tried to modify my regular expression to:

border-.*?\.5pt

Using ? to make it non-greedy. However, after that modification, nothing matches.

Can anyone explain why I see this behavior? What am I missing?

(If it's worth knowing, I'm using Microsoft Expression Web's 'find with regular expressions' when doing this search.)

Alan Moore
Ayo I
    [I can only tell that it works here](http://regex101.com/r/wV7lF6), so maybe it's your app's regex engine... – Wrikken Aug 14 '13 at 20:53
  • @Wrikken, you're right. I incorrectly assumed that there would be consistency in regex syntax across Microsoft products. It turns out that PowerShell (the reference I was using) has different operators than Expression Web. Lesson learned. Thanks for taking a look at it! – Ayo I Aug 15 '13 at 00:53

2 Answers

8

There is no one "regular expression" language. While there are broad commonalities, details differ from implementation to implementation. Most engines use *? for the non-greedy "0 or more"; some (Lua patterns, for example) use -. Apparently Microsoft Expression Web uses @.

In short, regex dialects differ, so you'll often need to read the manual for the one you're using to find its range of capabilities and detailed syntax (e.g. support for alternation, backtracking, etc.; grouping characters; set shorthands).
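As a point of comparison, here is a minimal sketch of the greedy vs. non-greedy behavior in an engine that does use *?. It assumes Python's re module, which is not the engine from the question; the Expression Web syntax would differ as described above.

```python
import re

# The sample string from the question.
css = "border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt"

# Greedy: .* runs to the end of the string, then backtracks to the LAST ".5pt".
greedy = re.findall(r"border-.*\.5pt", css)

# Non-greedy: .*? stops at the FIRST ".5pt" it can reach.
lazy = re.findall(r"border-.*?\.5pt", css)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

In this dialect the greedy pattern returns the whole string as a single match, while the non-greedy one returns the two properties separately.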

Mud
  • Thank you. This fixed it. I didn't realize that even across Microsoft products regex syntax changes. I was using a PowerShell regex reference assuming that it would apply to Expression Web. Thank you for the response. Good to know for the future. – Ayo I Aug 15 '13 at 00:46
3

.*? is the worst, so to speak, "antipattern" in regular expressions. It is commonly used as a "match something until the string I want" pattern - but that is not what it does.

Especially when several .*? are combined within one pattern, it may lead to very wrong and unexpected results.

In your case - as stated in the comments - it works. (Maybe you did something wrong?)

However, it is always a good idea to be more specific when writing a regex pattern. Always keep in mind that .*? can be anything, including stuff you really don't want to match!
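To illustrate the point, here is a small sketch using Python's re module (an assumption; it is not the engine from the question). The input string and the narrower pattern border-[^;]*\.5pt are made up for this example: the negated character class simply keeps a match from crossing a semicolon.

```python
import re

# Hypothetical input: a border property WITHOUT .5pt, followed by an
# unrelated property that happens to contain .5pt.
css = "border-collapse:collapse;margin:.5pt;border-top:solid #1F497D .5pt"

# .*? may cross property boundaries until it reaches some ".5pt".
lazy = re.findall(r"border-.*?\.5pt", css)

# [^;]* cannot cross a ';', so each match stays within one property.
scoped = re.findall(r"border-[^;]*\.5pt", css)

print(lazy)    # first match runs from border-collapse into the margin property
print(scoped)  # only the border-top property
```

The non-greedy version reports border-collapse as a hit because the .5pt it finds belongs to margin; the scoped version does not.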

In your example, I would use something like this: border-(?:[^:]+):\s*(?:[^\s]+)\s+(?:\#[a-fA-F0-9]{6})\s+(?:\d*(?:\.\d+)?)pt;?

It is more specific, but still matches the given requirements, ignores whitespace that doesn't matter, and even matches border widths regardless of whether they are written as .2, 3 or 4.1. If you remove the ?: from the individual groups, you can also capture each attribute, if required: position, border type, color and thickness.

The pattern border-([^:]+):\s*([^\s]+)\s+(\#[a-fA-F0-9]{6})\s+(\d*(?:\.\d+)?)pt;? with your string border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt will match:

First Match:

1. top
2. solid
3. #1F497D
4. .5

Second Match:

1. bottom
2. solid
3. #1F497D
4. .5
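The result above can be reproduced with a short sketch, assuming Python's re module (not the engine from the question, so the syntax may need adapting elsewhere). The pattern and the input string are taken verbatim from this answer.

```python
import re

# Capturing version of the pattern from this answer.
pattern = re.compile(
    r"border-([^:]+):\s*([^\s]+)\s+(\#[a-fA-F0-9]{6})\s+(\d*(?:\.\d+)?)pt;?"
)

css = "border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt"

# One tuple per match: (position, border type, color, thickness).
matches = [m.groups() for m in pattern.finditer(css)]

for position, border_type, color, thickness in matches:
    print(position, border_type, color, thickness)
```

Each match yields its four groups as a tuple, so filtering for a thickness of .5 becomes a plain string comparison on the last element.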
dognose