This script is part of a bigger one. When I run it, "<p></p>" gets printed as well. How can I remove this?

I used this regex: m.gsub!(/(?=\S)(\d|\W)/,"")

But it only removed the characters "<" and "/>".

Here is my script:

require 'open-uri'
require 'rexml/document'
include REXML

doc = REXML::Document.new(open('http://testnavet.skolverket.se/SusaNavExport/EmilObjectExporter?id=184594606&strId=info.uh.gu.GS5&EMILVersion=1.1').read)

doc.elements.each("//*[name()='ct:text'] | /ns:educationInfo/ns:extensionInfo/gu:guInfoExtensions/gu:guSubject/gu:descriptions/gu:description") do |e|
  m = e.text
  puts "Description: " + m
end

1 Answer

Ah, so you want to remove HTML tags. If so, you can do this:

str.gsub(/<.+?>/, "")

Thus, "<div>Hello world!</div>" becomes "Hello world!"
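
To make the greedy versus non-greedy difference concrete, here is a small sketch. The variable names are only for illustration, and the `[^>]+` variant is an alternative I'm adding for comparison, not something from the question. It behaves like `.+?` for simple tags, although unlike `.` it also matches newlines.

```ruby
html = "<div>Hello world!</div>"

# Greedy: ".+" runs on to the LAST ">" in the string,
# so the whole input is treated as one big "tag".
greedy = html.gsub(/<.+>/, "")

# Non-greedy: ".+?" stops at the FIRST ">",
# so "<div>" and "</div>" are matched separately.
lazy = html.gsub(/<.+?>/, "")

# Alternative: "anything that is not >" up to the next ">".
negated = html.gsub(/<[^>]+>/, "")

puts greedy.inspect   # ""
puts lazy.inspect     # "Hello world!"
puts negated.inspect  # "Hello world!"
```

Applied to the string from the question, "<p></p>".gsub(/<.+?>/, "") leaves an empty string, since there is nothing between the two tags.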

Jwosty
  • **No, it doesn’t.** It becomes `""`. – tchrist Mar 03 '12 at 20:43
  • That's because it should be: `/<.+?>/` where you do the non-greedy match: `+?`. Note that this only covers the basic case, and escaped > characters would defeat it. Is that what the OP is looking for? – Mike Ryan Mar 03 '12 at 20:54
  • Just for those who don't know, http://rubular.com/ is a great place for playing around with Ruby regexps – Hugo Mar 03 '12 at 21:10
  • It should be: str.gsub!(/<.+?>/, "")... you forgot the "!" character –  Mar 03 '12 at 21:16
  • @SHUMAcupcake Note that `gsub` does work, it just returns the result, rather than modifying `str` like `gsub!` does. – Andrew Marshall Mar 03 '12 at 21:23
  • @AndrewMarshall Aha, cool. Do you know how I should handle the outputs that don't have anything in them? –  Mar 03 '12 at 21:26
  • That just means there was nothing inside the tag – Jwosty Mar 06 '12 at 00:43
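
For the last two points in the comments (`gsub` versus `gsub!`, and descriptions that end up empty), one possible way to handle it is sketched below. `clean_description` is a hypothetical helper and the sample strings are made up; neither is part of the original script. The sketch relies on REXML's `Element#text` returning nil when an element has no text node.

```ruby
# Hypothetical helper: strips tags and returns nil when nothing printable is left.
def clean_description(text)
  return nil if text.nil?                   # element had no text node at all
  stripped = text.gsub(/<.+?>/, "").strip   # gsub returns a new string
  stripped.empty? ? nil : stripped
end

# Made-up sample values standing in for e.text from the original loop.
samples = ["<p></p>", "<p>Course in Ruby</p>", nil, "   "]
samples.each do |raw|
  desc = clean_description(raw)
  puts "Description: #{desc}" if desc       # skip empty descriptions
end

# gsub leaves the receiver alone; gsub! changes it in place
# (and returns nil when no substitution was made).
s = "<b>bold</b>".dup
copy = s.gsub(/<.+?>/, "")   # s is still "<b>bold</b>", copy is "bold"
s.gsub!(/<.+?>/, "")         # s is now "bold"
```

In the original loop this would amount to calling the helper on `e.text` and only printing when the result is not nil, which also avoids the TypeError that `"Description: " + m` raises when `m` is nil.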