
Using Ruby (not Rails), I'm trying to figure out how to replace (not append to) a certain block in a static file with a string. For example, in static_file.html I want to replace everything between the HTML comments "start" and "end":

<p>lorem ipsum blah blah ipsum</p>

<!--start-->
REPLACE MULTI-LINE
CONTENT HERE...
<!--end-->

<p>other stuff still here...</p>

Some of the answers here are helpful for inserting text at a certain spot, but they do not handle replacing the content between two markers.

chronon
  • If you are using a template file to generate HTML content, you might want to look into either ERB or [HAML](http://haml-lang.com/). Personally, I prefer HAML as it's a nice HTML shorthand. Either will be a better solution than doing search/replace if you need to inject differing content into a boilerplate. – the Tin Man Jan 30 '11 at 04:53

3 Answers


As long as your comment blocks are always formatted the same way, <!--start--> and <!--end-->, this will work.

Here's a function to handle it for you. Pass it a file path and the new contents, and it returns the file's text with the block between those HTML comments replaced:

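def replace(file_path, contents)
  # Read the whole file into a string.
  html = File.read(file_path)

  # Non-greedy match (.*?) so each start/end pair is handled separately.
  # The block form keeps both markers and inserts contents literally,
  # without treating backslashes in it as backreferences.
  html.gsub(/(<!--start-->).*?(<!--end-->)/im) { "#{$1}#{contents}#{$2}" }
end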
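
The function above returns the modified string rather than saving it. A minimal sketch of writing the result back to the same file might look like this, assuming the file fits in memory and the markers appear exactly as in the question. The name `replace_in_file!` is hypothetical, not part of the original answer.

```ruby
# Hypothetical helper: replaces the text between <!--start--> and <!--end-->
# in the file at file_path and saves the result in place.
def replace_in_file!(file_path, contents)
  html = File.read(file_path)

  # Non-greedy match so only the first start/end pair is consumed.
  # The block form keeps the markers and inserts contents literally
  # (no backslash substitution such as \1 is applied to it).
  updated = html.sub(/(<!--start-->).*?(<!--end-->)/m) do
    "#{$1}\n#{contents}\n#{$2}"
  end

  File.write(file_path, updated)
  updated
end
```

Calling `replace_in_file!("static_file.html", "new text")` would overwrite the file, so it may be worth keeping a backup copy while experimenting.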
Jordan

the simple answer would be:

str = "FOO\nBAR\nblah \nblah BAZ\nBLOOP"
str.gsub(/BAR.*BAZ/m, "SEE")  # => "FOO\nSEE\nBLOOP"

I'm not sure if that's robust enough for what you are trying to do. The key here is the 'm' at the end of the regexp, which in Ruby makes '.' match newlines as well, so the pattern can span multiple lines. If this is meant to template some values, you may want to look at something like ERB templates instead of this gsub. Also, be careful about what you need to escape in your regular expressions.
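
As a rough illustration of the ERB suggestion, the sketch below assumes you control the static file and can turn the marked block into a template tag. The constant `TEMPLATE`, the method `render_page`, and the variable name `content` are all made up for this example.

```ruby
require 'erb'

# Hypothetical template: the block between the comments becomes an ERB tag.
TEMPLATE = <<~HTML
  <p>lorem ipsum blah blah ipsum</p>

  <!--start-->
  <%= content %>
  <!--end-->

  <p>other stuff still here...</p>
HTML

# Render the template with `content` supplied as a local variable
# (ERB#result_with_hash is available in Ruby 2.5 and later).
def render_page(template, content)
  ERB.new(template).result_with_hash(content: content)
end

puts render_page(TEMPLATE, "hello world!")
```

With this approach the source file is never edited in place; the page is regenerated from the template each time the content changes.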

shawn42
  • The "REPLACE CONTENT..." will be dynamically generated/changing, so I won't know what it is to replace using a regex. – chronon Jan 30 '11 at 04:38
  • You can dynamically create those regular expressions: r = Regexp.new "foo.*bar", Regexp::MULTILINE – shawn42 Jan 30 '11 at 04:48
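
Following the comment about building the pattern at runtime, here is one possible sketch. It matches on the fixed markers rather than on the changing content, so the old content never needs to be known. `between_markers` and `replace_between` are hypothetical names, and `Regexp.escape` is used so that marker characters are treated literally.

```ruby
# Hypothetical helper: builds a pattern that matches two literal marker
# strings and everything between them, including newlines.
def between_markers(start_marker, end_marker)
  Regexp.new(
    "(#{Regexp.escape(start_marker)}).*?(#{Regexp.escape(end_marker)})",
    Regexp::MULTILINE
  )
end

# Replaces whatever sits between the first pair of markers, keeping the
# markers themselves. The block form inserts the replacement literally.
def replace_between(text, start_marker, end_marker, replacement)
  text.sub(between_markers(start_marker, end_marker)) do
    "#{$1}#{replacement}#{$2}"
  end
end

html = "<!--start-->\nREPLACE MULTI-LINE\nCONTENT HERE...\n<!--end-->"
puts replace_between(html, "<!--start-->", "<!--end-->", "\nhello world!\n")
```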

This is a simplified example of how to do it using a parser:

require 'nokogiri'

html = '<p>lorem ipsum blah blah ipsum</p>

<!--start-->
REPLACE MULTI-LINE
CONTENT HERE...
<!--end-->

<p>other stuff still here...</p>'

doc = Nokogiri.HTML(html)
puts doc.to_html

After parsing we get:

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <p>lorem ipsum blah blah ipsum</p>
# >> 
# >> <!--start-->
# >> REPLACE MULTI-LINE
# >> CONTENT HERE...
# >> <!--end-->
# >> 
# >> <p>other stuff still here...</p>
# >> </body></html>

doc.at('//comment()/following-sibling::text()').content = "\nhello world!\n"
puts doc.to_html

After finding the comment, stepping to the following text() node, and replacing its content, we get:

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <p>lorem ipsum blah blah ipsum</p>
# >> 
# >> <!--start-->
# >> hello world!
# >> <!--end-->
# >> 
# >> <p>other stuff still here...</p>
# >> </body></html>

If your HTML is always going to be simple, with no possibility of having strings that break your search patterns, then you can go with search/replace.

If you check around, you'll see that for any non-trivial HTML manipulation you should go with a parser. That's because a parser deals with the actual structure of the document, so if the document changes, there's a better chance of the parser not being confused.

the Tin Man
  • Your solution assumes that all comments will be replaced and doesn't target a particular structure like the question states. It also seems like a bit of overkill to bring in a parser just because the text happens to be HTML. We're not trying to reflow or reformat the entire document here. – Jordan Jan 30 '11 at 04:46
  • No, my solution assumes the *FIRST* comment will be, matching the sample. It's written to be a starting point, not a complete solution. – the Tin Man Jan 30 '11 at 04:48
  • Thanks for the example. Though slightly more complicated than I was hoping for, it's a clear explanation of how to use the parser. – chronon Jan 30 '11 at 05:16
  • for my edification, isn't //comment() the xpath query to find all comments, regardless of their descendance hierarchy, and return them as a nodeset? – Jordan Jan 30 '11 at 08:09
  • No, it only means find a comment. It's up to the method that is passed the XPath to determine whether there is one or many nodes returned. In my code I used `at` which returns only the first matching node encountered. Actually, I told it to return the text() node that is the following sibling of the first encountered comment(). – the Tin Man Jan 30 '11 at 09:35