Given text like:

body = 

yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada 
< via mobile device > 

Yada Yada <xxxxx@xxxxx.com> wrote:

yada yada yada yada yada yada yada yada yada 

I want to match the 2nd paragraph, so I'm doing:

body = body.split(/.* <xxxxx@xxxxx.com> wrote: .*/m).first

But that's not matching in Ruby, even though it does in Rubular. Any ideas why? Thanks.

AnApprentice

2 Answers

The line

Yada Yada <xxxxx@xxxxx.com> wrote:

does end with a linebreak, not with a space. So your regular expression should be:

/.* <xxxxx@xxxxx.com> wrote:\n.*/m

Attention: Windows systems and some protocols like HTTP can use different linebreak encodings. To be safe, convert your input to Unix linebreak encoding first and then do the data extraction. You could use my linebreak gem for this.
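As a rough sketch of that normalize-then-extract idea, the snippet below uses only core String#gsub rather than the linebreak gem (whose API is not shown here). The sample body and the variable names normalized and reply are made up for illustration, and the pattern assumes the attribution line ends right after "wrote:".

```ruby
# Hypothetical sample input using Windows-style (CRLF) line breaks.
body = "first paragraph\r\n< via mobile device >\r\n\r\n" \
       "Yada Yada <xxxxx@xxxxx.com> wrote:\r\n\r\nquoted text\r\n"

# Convert CRLF pairs and lone CRs to plain LF.
normalized = body.gsub(/\r\n?/, "\n")

# Split on the attribution line and keep the text that precedes it.
# In Ruby, ^ and $ always anchor at line boundaries.
reply = normalized.split(/^.*<xxxxx@xxxxx\.com> wrote:$/).first

puts reply
```

Without the gsub step, the "\r" left at the end of each line would sit between "wrote:" and the line end, so the anchored pattern would not match.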

aef

Try this instead:

body = body.split(/.*<xxxxx@xxxxx.com> wrote:.*/).first

The space after the first .* was useless, and (as @aef pointed out) the space before the second .* was erroneous (maybe there was a space there in your Rubular test).

Notice that I removed the m modifier, too. If I hadn't, the regex would have matched the whole string, and split would have returned an empty array. That's what Ruby calls multiline mode (and everyone else calls single-line or dot-all mode): the . matches anything, including newlines.
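To make the effect of the m modifier concrete, here is a small comparison. The sample body is invented to resemble the one in the question, and without_m and with_m are just illustrative names.

```ruby
# Hypothetical sample input modelled on the question.
body = "reply text\n< via mobile device >\n\n" \
       "Yada Yada <xxxxx@xxxxx.com> wrote:\n\nquoted text\n"

# Without /m, "." stops at newlines, so only the attribution line is matched
# and split returns the text before and after it.
without_m = body.split(/.*<xxxxx@xxxxx.com> wrote:.*/)

# With /m, "." also matches newlines, so the two .* parts swallow everything
# around the address and the match covers the entire string.
with_m = body.split(/.*<xxxxx@xxxxx.com> wrote:.*/m)

p without_m.first # => "reply text\n< via mobile device >\n\n"
p with_m          # => []
```

Since with_m is empty, calling .first on it returns nil, which is why the version with the modifier appears to "not match".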

EDIT: See it on ideone.com

Alan Moore
  • It works for me; see my edit. But yes, `scan` is an option, too: `body=body.scan(/.+/).first`. That cuts off the `< via mobile device >` line; if you want to keep it, you can change the regex to `/.+(?:\n.+)*/`. That matches everything up to the next empty line. There are many other ways to solve this problem, too. – Alan Moore Mar 05 '11 at 06:25
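
For comparison, the two scan variants from the comment above can be tried side by side. This is only a sketch on an invented sample body; first_line and first_block are illustrative names.

```ruby
# Hypothetical sample input modelled on the question.
body = "reply text\n< via mobile device >\n\n" \
       "Yada Yada <xxxxx@xxxxx.com> wrote:\n\nquoted text\n"

# /.+/ matches one non-empty line at a time, so .first is the first line only.
first_line = body.scan(/.+/).first

# /.+(?:\n.+)*/ keeps consuming following non-empty lines, so .first is
# everything up to (but not including) the first empty line.
first_block = body.scan(/.+(?:\n.+)*/).first

p first_line  # => "reply text"
p first_block # => "reply text\n< via mobile device >"
```

Neither variant depends on the sender's address, so they keep working when the attribution line changes, but they assume the wanted text is the first block of the message.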