How can I gsub everything until the next empty line?

Question

Given this string:

bc.  some text
 more text
 even more

^ above here is the empty line

I want it to be:

<pre>
some text
more text
even more
</pre>

^ above here is the empty line

How can I regex for "starting from bc. until the first empty line"?

So far I got this:

# note that for some reason a direct .gsub! behaves
# differently/fails when using the block, so I use .gsub
textile_markup = textile_markup.gsub(/^bc.  .*^$/m) { |s| "<pre>#{s[5..(s.length)]}</pre>" }

Understandably, this matches greedily, up to the very last empty line instead of the first one. How can I make the part before ^$ non-greedy?

Usually `.*?` is the non-greedy version of `.*`. Would that work? – tadman, Mar 19 '13 at 19:04
Do you have only one block, or is this a repeating pattern through the string/file? If it's repeating you need to represent that in your sample data. Also, why does this have to be done using a regular expression? – the Tin Man, Mar 19 '13 at 19:46
Do you know this great site called Rubular? http://rubular.com/r/uloTda090y – phoet, Mar 19 '13 at 19:47
@theTinMan I only have one block. I am also open to more efficient solutions. However, I think the shortest path will be a regex. – user569825, Mar 19 '13 at 20:06
@phoet Thanks for the hint on the site. However, the example matcher fails on my sample code. See here: http://rubular.com/r/Tz5MuKg41z (sorry for the confusion, too: I updated the sample to display the last string **after** the closing tag). – user569825, Mar 19 '13 at 20:10
@user569825 There is no need to gsub the whole thing! Use a match group and then put everything where it belongs. – phoet, Mar 19 '13 at 20:34
@phoet I am having trouble understanding your proposed idea. Could you update the example you posted to reflect it? – user569825, Mar 19 '13 at 21:38
@user569825 Have a look at the docs: http://apidock.com/ruby/String/match – phoet, Mar 20 '13 at 07:12
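
For readers wondering what the match-group suggestion in the comments might look like, here is a minimal sketch. It assumes a single block, as the asker stated; the variable names `block_re`, `body` and `html` are made up for illustration, and the input is the sample from the question (continuation lines carry one leading space).

```ruby
# Sample input from the question.
textile_markup = "bc.  some text\n more text\n even more\n\n^ above here is the empty line\n"

# Capture everything after "bc.  " up to (not including) the first empty line.
block_re = /^bc\.  (.*?)\n\n/m

html = textile_markup
if (m = textile_markup.match(block_re))
  body = m[1].gsub(/^ /, "")  # strip one leading space per line
  # Reassemble: text before the block, the wrapped block, text after it.
  html = "#{m.pre_match}<pre>\n#{body}\n</pre>\n\n#{m.post_match}"
end

puts html
```

`MatchData#pre_match` and `#post_match` return the text around the match, so nothing outside the block is touched.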

score 2 · Accepted Answer · answered Mar 20 '13 at 03:28

str = 
"bc.  some text
more text
even more

^ above here is the empty line

bc.  some text
more text
even more

^ above here is the empty line"

puts str.gsub(/^bc\.  (.*?)\n\n/m, "<pre>\n\\1\n</pre>\n\n")

Output:

<pre>
some text
more text
even more
</pre>

^ above here is the empty line

<pre>
some text
more text
even more
</pre>

^ above here is the empty line

Explanation

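The ? in .*? makes the star operator non-greedy, so the match stops at the first \n\n instead of the last one.

The /m modifier at the end makes the dot match newlines, so the captured group can span several lines.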
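
To see the difference between the two quantifiers side by side, here is a small sketch. The sample text and the names `greedy`, `lazy` and `html` are assumptions for illustration. It also strips the single leading space that the continuation lines have in the question's input, which the replacement-string form above leaves in place.

```ruby
text = "bc.  some text\n more text\n even more\n\nfirst paragraph\n\nsecond paragraph\n"

# Greedy: .* runs on to the LAST "\n\n" in the string.
greedy = text[/^bc\.  .*\n\n/m]

# Non-greedy: .*? stops at the FIRST "\n\n".
lazy = text[/^bc\.  .*?\n\n/m]

# Block form of gsub, so the captured body can be post-processed:
# here one leading space per line is removed before wrapping in <pre>.
html = text.gsub(/^bc\.  (.*?)\n\n/m) do
  "<pre>\n#{$1.gsub(/^ /, '')}\n</pre>\n\n"
end

puts html
```

Note that the dot after bc is escaped here; in the question's pattern the unescaped `.` would match any character.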

That code works great on the documents I have tested so far. Thank you! – user569825, Mar 21 '13 at 00:07

score 1 · Answer · steenslag · Mar 19 '13 at 22:14

It can be done in one go, but it needs some preparation:

txt = <<DOC
bc.  some text
 more text
 even more

bc.  some text
 more text
 even more

DOC

TRANSFORMS = {"bc.  " => "<pre>\n",       # 'bc.  ' should become <pre> followed by a line-end
              /^ /    => "",              # a leading space should be eliminated
             "\n\n"   => "\n<\/pre>\n\n"} # an empty line should be preceded by a closing pre-tag

re = Regexp.union(TRANSFORMS.keys)
puts txt.gsub(re, TRANSFORMS)

Output:

<pre>
some text
more text
even more
</pre>

<pre>
some text
more text
even more
</pre>
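
A side note on how the hash form of `gsub` behaves, as a small sketch (the names `transforms`, `probe` and `result` are made up for illustration). The hash is looked up with the matched text, not with the pattern that matched. The `/^ /` entry is therefore never found as a key: the matched text is `" "`, the lookup returns nil, and nil is substituted as an empty string. The leading space still disappears, but the value stored under the Regexp key plays no part in it.

```ruby
transforms = { "bc.  " => "<pre>\n", /^ / => "", "\n\n" => "\n</pre>\n\n" }

# The matched text " " is not a key; only the Regexp object is.
lookup = transforms[" "]  # nil

# Give the Regexp key a visible value to show that it is ignored.
probe  = { /^ / => "IGNORED" }
result = " more text".gsub(Regexp.union(probe.keys), probe)

puts result.inspect
```

If a non-empty replacement is needed for a pattern like this, the block form of `gsub` is probably the safer choice.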

answered Mar 19 '13 at 22:09 · edited Mar 19 '13 at 22:14 · steenslag


I absolutely like the way you coded that! The substitutions apply in other situations as well, which I want to avoid as I am working on huge documents. Yuriy Golobokov's `gsub` variant works and I think it'd be interesting for everyone to see it using your style, if possible. Would be great if you'd update. – user569825 Mar 21 '13 at 00:06
It will add `"\n<\/pre>\n\n"` for every empty line even if paragraph wasn't started with `bc. ` – Yuri Golobokov Mar 21 '13 at 01:53
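
One possible way to keep the hash style while addressing both comments is to apply the transforms only inside a matched bc. block. This is a sketch under assumptions, not the answerer's code: the names `inner_transforms`, `inner_re` and `block_re` are hypothetical, and it assumes each block is followed by an empty line.

```ruby
txt = "bc.  some text\n more text\n even more\n\nplain paragraph\n\nanother paragraph\n"

inner_transforms = {
  "bc.  " => "<pre>\n",  # opening marker becomes <pre> plus a line-end
  "\n "   => "\n"        # drop the single leading space on continuation lines
}
inner_re = Regexp.union(inner_transforms.keys)

# Match from "bc.  " up to, but not including, the first empty line.
block_re = /^bc\.  .*?(?=\n\n)/m

html = txt.gsub(block_re) do |block|
  # The hash transforms run only on the block, so other empty lines stay as they are.
  block.gsub(inner_re, inner_transforms) + "\n</pre>"
end

puts html
```

The lookahead `(?=\n\n)` leaves the empty line itself in place, so paragraphs that do not start with `bc.` never receive a closing tag. A literal `bc.  ` in the middle of a block line would still be rewritten by the inner hash, which may or may not matter for a given document.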
