Regex replace is eating up the whole string! How do I make regex ungreedy?

Question

I'm working with a really large spreadsheet in OpenOffice and I've had to learn regular expressions to clean it up.

Right now I'm trying to remove all <span> tags and I've come up with an expression to do so:

(<span.*?>|</span>)

The problem is that OpenOffice doesn't seem to like the question mark (which should make it ungreedy), so when I try to remove the <span> tags, it removes most of my string.

Here is a sample of the data: http://pastebin.com/AKWZJJCv

What is an alternative way of removing the <span> tags that would work in OpenOffice's find and replace?

If you observe that `.*?` remains greedy, it would point to the fact that the regular expression is not read as a perl-compatible regex (PCRE), but as, for example, Basic/Extended/POSIX regex (none of which know the `?` modifier to non-greedify `.*`) — jørgensen, Jan 20 '12 at 19:10
However, OpenOffice is Java based. I would be surprised if it did not use the Java regex engine. I wonder what is going on there. — 700 Software, Jan 20 '12 at 19:22
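
To make the symptom concrete, here is a minimal sketch in Python, whose `re` module does support the lazy `?` modifier. The sample string is hypothetical (the real data is in the pastebin link), and this only illustrates greedy versus lazy matching in general; it says nothing about which engine OpenOffice actually uses.

```python
import re

# Hypothetical sample cell; the real data lives in the linked pastebin.
sample = 'A <span class="x">bold</span> word and <span>another</span> one.'

# Greedy: .* runs as far as it can, so the match ends at the LAST '>' in the string.
greedy = re.sub(r'(<span.*>|</span>)', '', sample)

# Lazy: .*? stops at the FIRST '>' after '<span', so only the tags are removed.
lazy = re.sub(r'(<span.*?>|</span>)', '', sample)

print(repr(greedy))  # 'A  one.'
print(repr(lazy))    # 'A bold word and another one.'
```

If an engine treats `.*?` the same as `.*`, the result looks like the first line: everything between the first `<span` and the last `>` disappears.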

antiduh · Accepted Answer · 2012-01-20T19:16:15.933

2

You could also try (<span[^>]*>|</span>)

edited Jan 20 '12 at 19:16

answered Jan 20 '12 at 19:11

antiduh


That did the trick. Thank you! If you don't mind me asking, what does `[^>]*` mean? I know that the `[^>]` will match the first `>`, but if the `*` means 0 or more, then why is it needed? – Jan 20 '12 at 19:36

`[]` is the character class. `[abcd]` says "match exactly one character from the input that is either a, b, c, or d". `[^]` is the negative character class, which says "match any one character that's not in the class". I told it "match any number of characters that are not a '>', then match a '>'." – antiduh Jan 20 '12 at 19:44
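
As a rough illustration of why the `*` is needed, here is a small Python sketch. The sample string is made up for the example, and Python's `re` stands in for whatever engine OpenOffice uses, so treat it as a demonstration of the pattern rather than of OpenOffice itself.

```python
import re

# Hypothetical sample cell, not taken from the original spreadsheet.
sample = 'A <span class="x">bold</span> word and <span>another</span> one.'

# [^>]* matches zero or more characters that are not '>', so it cannot
# run past the end of the tag, even in an engine without lazy quantifiers.
pattern = re.compile(r'(<span[^>]*>|</span>)')
matched_tags = pattern.findall(sample)
cleaned = pattern.sub('', sample)

# Without the '*', [^>] must match exactly one non-'>' character,
# which fits neither '<span>' nor '<span class="x">'.
no_star = re.search(r'<span[^>]>', sample)

print(matched_tags)  # ['<span class="x">', '</span>', '<span>', '</span>']
print(cleaned)       # A bold word and another one.
print(no_star)       # None
```

Because the `*` allows zero repetitions, the same pattern covers both a bare `<span>` and one that carries attributes.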

score 1 · Answer 2 · answered Jan 20 '12 at 19:09


Give this a try:

<(\/)?span([a-zA-Z\-\="0-9 ]*)?>

Tested here.

answered Jan 20 '12 at 19:09

Rick Kuipers

