Regex - Matching in Rubular but not in Ruby

Question

Given text like:

body = 

yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada 
< via mobile device > 

Yada Yada <xxxxx@xxxxx.com> wrote:

yada yada yada yada yada yada yada yada yada

I want to match the 2nd paragraph, so I'm doing:

body = body.split(/.* <xxxxx@xxxxx.com> wrote: .*/m).first

But that's not matching in Ruby even though it does in Rubular. Any ideas why? Thanks.

score 1 · Answer 1 · answered Mar 05 '11 at 05:05

The line

Yada Yada <xxxxx@xxxxx.com> wrote:

ends with a line break, not with a space. So your regular expression should be:

/.* <xxxxx@xxxxx.com> wrote:\n.*/m

Attention: Windows systems and some protocols like HTTP use different line break encodings. To be sure your code is compatible, convert your input to Unix line breaks first and then do the data extraction. You could use my linebreak gem for this.
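As a minimal sketch of that normalisation step, the snippet below uses plain String#gsub rather than the linebreak gem (whose API is not shown in this thread). The helper name normalize_newlines and the sample text are assumptions made for illustration.

```ruby
# Hypothetical helper: convert Windows (\r\n) and bare \r line endings to Unix (\n).
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

# Sample input with Windows line endings (made up for this example).
crlf_body = "yada yada\r\n< via mobile device >\r\n\r\n" \
            "Yada Yada <xxxxx@xxxxx.com> wrote:\r\n\r\nyada yada\r\n"
unix_body = normalize_newlines(crlf_body)

# A pattern that expects "\n" right after "wrote:" only matches once
# the "\r" characters have been removed.
puts crlf_body.match?(/ <xxxxx@xxxxx.com> wrote:\n/)   # false
puts unix_body.match?(/ <xxxxx@xxxxx.com> wrote:\n/)   # true
```

Normalising first means the extraction regex only ever has to deal with "\n".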

Alan Moore · Accepted Answer · 2011-03-05T06:15:40.963


Try this instead:

body = body.split(/.*<xxxxx@xxxxx.com> wrote:.*/).first

The space after the first .* was useless, and (as @aef pointed out) the space before the second .* was erroneous (maybe there was a space there in your Rubular test).

Notice that I removed the m modifier, too. If I hadn't, the regex would have matched the whole string, resulting in an empty array. That's what Ruby calls multiline mode (and everyone else calls single-line or dot-all mode): the . matches anything, including newlines.
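To illustrate the difference, here is a small sketch that runs the same split with and without the m modifier. The sample text is a shortened stand-in for the body in the question, and the constant name SAMPLE_BODY is an assumption.

```ruby
# Shortened stand-in for the e-mail body from the question.
SAMPLE_BODY = "yada yada yada\n< via mobile device >\n\n" \
              "Yada Yada <xxxxx@xxxxx.com> wrote:\n\nyada yada yada\n"

# Without /m, "." stops at newlines, so only the "wrote:" line acts as
# the separator and the text before it survives as the first element.
without_m = SAMPLE_BODY.split(/.*<xxxxx@xxxxx.com> wrote:.*/)
p without_m.first   # => "yada yada yada\n< via mobile device >\n\n"

# With /m, ".*" runs across newlines, the match covers the entire string,
# and nothing is left on either side of the separator.
with_m = SAMPLE_BODY.split(/.*<xxxxx@xxxxx.com> wrote:.*/m)
p with_m            # => []
p with_m.first      # => nil
```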

EDIT: See it on ideone.com

edited Mar 05 '11 at 06:15

answered Mar 05 '11 at 05:18


It works for me; see my edit. But yes, `scan` is an option, too: `body=body.scan(/.+/).first`. That cuts off the `< via mobile device >` line; if you want to keep it, you can change the regex to `/.+(?:\n.+)*/`. That matches everything up to the next empty line. There are many other ways to solve this problem, too. – Alan Moore Mar 05 '11 at 06:25
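The two scan variants from the comment above can be compared side by side. This is only a sketch on a shortened sample body, and the constant name SCAN_BODY is an assumption.

```ruby
# Shortened stand-in for the e-mail body from the question.
SCAN_BODY = "yada yada yada\n< via mobile device >\n\n" \
            "Yada Yada <xxxxx@xxxxx.com> wrote:\n\nyada yada yada\n"

# /.+/ matches one non-empty line at a time, so .first is just the first line.
p SCAN_BODY.scan(/.+/).first            # => "yada yada yada"

# /.+(?:\n.+)*/ keeps consuming lines until it reaches an empty one,
# so .first is the whole first paragraph.
p SCAN_BODY.scan(/.+(?:\n.+)*/).first   # => "yada yada yada\n< via mobile device >"
```

Unlike the split version, neither result ends with trailing newlines.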
