I am using Ruby 1.8.7. I am not using Rails.

How do I find all the links that are not already inside an anchor tag?

s = %Q{ <a href='www.a.com'><b>www.a.com</b></a> www.b.com <div>www.c.com</div> }

The output for the above string should be:

www.b.com
www.c.com

I know the "b" tag before www.a.com complicates the case, but that's what I have to work with.

Nick Vanderbilt
    Obligatory Cthulhu link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Andrew Grimm May 16 '10 at 23:10

2 Answers

You are going to want to use a real XML parser (Nokogiri will do). Regexes are unsuitable for a task like this, especially in Ruby 1.8.7, where negative lookbehind is not supported.

Ben Hughes

A dirty way to get rid of the anchor tags. It doesn't work the way you want if they're nested. Also, use a real parser ;-)

s.gsub(%r[<a\b.*?</a>]i, "")
=> "  www.b.com <div>www.c.com</div> "
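To get from there to the list the question asks for, one option is to scan what is left for link-like text. This is only a sketch with the standard library: the `www.` pattern is an assumption about what counts as a link, and the same nesting caveat applies.

```ruby
s = %Q{ <a href='www.a.com'><b>www.a.com</b></a> www.b.com <div>www.c.com</div> }

# Remove every <a ...>...</a> span (non-greedy; the m flag lets . match newlines).
without_anchors = s.gsub(%r[<a\b.*?</a>]im, "")

# Collect whatever still looks like a link; stop at whitespace or a tag bracket.
remaining = without_anchors.scan(/\bwww\.[^\s<>]+/)
puts remaining
```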
taw