I've carefully cut and pasted from this Rubular window http://rubular.com/r/YH8Qj2EY9j to my code, yet I get different results. The Rubular match capture is what I want. Yet

desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
    puts description = $1 
end

only gets me the first line, i.e.

<DD>@mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />

I don't think it's my test data, but that's possible. What am I missing?

(Ruby 1.9 on Ubuntu 10.10)

tom

3 Answers

I believe you need the multiline modifier in your code:

/m Multiline mode: dot matches newlines, ^ and $ both match line starts and endings.

ennuikiller
  • I find that adding the m changes nothing in Rubular, and nothing in my code. If it worked for @Ken Bloom perhaps the problem is elsewhere... – tom Jul 18 '11 at 23:33
  • Multiline mode isn't the solution; if anything, it will make the problem worse. It changes the meaning of the regex from matching up to the first or second newline, to matching up to the *last* newline. Also, Ruby's multiline mode has no effect on the anchors; `^` and `$` *always* match line boundaries. – Alan Moore Jul 19 '11 at 08:01
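
The effect described in the comment above can be sketched as follows. The sample text and the names `MULTI_DESC`, `PLAIN_PATTERN`, `MULTILINE_PATTERN` and `first_capture` are hypothetical, chosen only for illustration; the pattern itself is the one from the question.

```ruby
# Hypothetical sample with a second entry after the one we want.
MULTI_DESC = "<DD>first line<br />\nsecond line\n<DT>other entry\n<DD>later entry"

PLAIN_PATTERN     = /^<DD>(.*\n?.*)\n/   # pattern from the question
MULTILINE_PATTERN = /^<DD>(.*\n?.*)\n/m  # same pattern with the m modifier

# Return the first capture group, or nil when the pattern does not match.
def first_capture(text, pattern)
  match = pattern.match(text)
  match && match[1]
end

# Without /m the dot stops at each newline, so the capture spans two lines.
puts first_capture(MULTI_DESC, PLAIN_PATTERN).inspect

# With /m the dot also matches newlines, so the greedy .* runs on
# and only backs off to the last newline in the text.
puts first_capture(MULTI_DESC, MULTILINE_PATTERN).inspect
```

On this sample the second capture also swallows the `<DT>` line, which is why the modifier is unlikely to help here.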

Paste your test data into an editor that is able to display control characters and verify your line break characters. Normally it should be only \n on a Linux system as in your regex. (I had unusual linebreaks a few weeks ago and don't know why.)
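
If no such editor is at hand, the same check can be sketched in Ruby itself. The sample string below is hypothetical (its `\r\n` breaks are an assumption for illustration), and `line_break_counts` is a helper name made up for this example.

```ruby
# Hypothetical sample data; the "\r\n" line breaks are assumed for illustration.
SAMPLE = "<DD>first line<br />\r\nsecond line\r\n<DT>next entry"

# String#inspect escapes control characters, so "\r" and "\n" become visible.
puts SAMPLE.inspect

# Count CRLF pairs and bare LF characters (an LF not preceded by CR).
def line_break_counts(text)
  crlf    = text.scan(/\r\n/).size
  lf_only = text.scan(/(?<!\r)\n/).size
  { crlf: crlf, lf_only: lf_only }
end

p line_break_counts(SAMPLE)
```

Running the same two calls on the real `desc` string should show whether it contains anything other than plain `\n`.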

Another check is to split the capture into separate groups and print each one, so that you can see which part of your regex matches what.

/^<DD>(.*)\n?(.*)\n/
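
A minimal sketch of printing the groups from that pattern, assuming plain `\n` line breaks in the hypothetical sample `DESC`; `show_groups` is a made-up helper name.

```ruby
# Hypothetical sample data with plain "\n" line breaks.
DESC = "<DD>first line<br />\nsecond line\n<DT>not matched"

# The pattern above, with the single capture split into two groups.
SPLIT_PATTERN = /^<DD>(.*)\n?(.*)\n/

# Print every capture group with inspect so that stray control
# characters such as "\r" show up, then return the captures.
def show_groups(text, pattern)
  match = pattern.match(text)
  return [] unless match
  match.captures.each_with_index do |group, index|
    puts "group #{index + 1}: #{group.inspect}"
  end
  match.captures
end

show_groups(DESC, SPLIT_PATTERN)
```

If the second group comes back empty on the real data, or a group ends in `\r`, that points at the line breaks rather than at the regex.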

Another idea to get this to work is to change the .*: instead of matching any character, match anything except \n.

^<DD>([^\n]*\n?[^\n]*)\n
stema

The following:

#!/usr/bin/env ruby

desc= '<DD>@mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
<DT>la la this should not be matched oh good'
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
    puts description = $1 
end

prints

@mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377

on my system (Linux, Ruby 1.8.7).

Perhaps your line breaks are really \r\n (Windows style)? What if you try:

desc_pattern = /^<DD>(.*\r?\n?.*)\r?\n/
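
A sketch of how that pattern behaves on Windows-style data, assuming the hypothetical sample `CRLF_DESC`; `capture_description` is a made-up helper. Because `.` also matches `\r`, the capture can keep the carriage returns, so the second call normalizes the line endings first (an extra step that is not part of the pattern above).

```ruby
# Hypothetical sample data using Windows-style "\r\n" line breaks.
CRLF_DESC = "<DD>first line<br />\r\nsecond line\r\n<DT>not matched"

# The pattern suggested above.
CRLF_PATTERN = /^<DD>(.*\r?\n?.*)\r?\n/

# Return the first capture group, or nil when the pattern does not match.
def capture_description(text, pattern)
  match = pattern.match(text)
  match && match[1]
end

# Both lines are captured, but "\r" characters remain in the result.
puts capture_description(CRLF_DESC, CRLF_PATTERN).inspect

# Assumption: converting "\r\n" to "\n" is acceptable for this data.
normalized = CRLF_DESC.gsub("\r\n", "\n")
puts capture_description(normalized, CRLF_PATTERN).inspect
```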
Bart Kiers