I found a nice example (https://gist.github.com/danneu/3977120) of how to parse XML with Ox and SAX. Now I'm trying to make it work with my own XML files, which use CDATA fields.

This is the example I found:

require "ox"
require "open-uri"

class Handler < Ox::Sax
  USER_ATTR = [:userid, :username, :email]
  ATTR_MAP = { userid: :as_i, username: :as_s, email: :as_s}

  def start_element(name)
    @user = {} if name == :row
    @current_node = name
  end

  def value(value)
    return unless USER_ATTR.include?(@current_node)
    @user[@current_node] = value.send(ATTR_MAP[@current_node])
  end

  def end_element(name)
    return unless name == :row
    puts @user[:userid], @user[:username], @user[:email]
  end
end

handler = Handler.new
File.open("user.xml") do |f|
  Ox.sax_parse(handler, f)
end

Which works with this type of XML file:

<!-- 80mb file, 38,000 row nodes -->
<?xml version="1.0"?>
<data>
  <users>
    <row>
      <userid>1</userid>
      <username>danneu</username>
      <email>danrodneu@gmail.com</email>
      <dozens>etc.</dozens>
      <more>etc.</more>
      <nodes>etc.</nodes>
    </row>
    <row>
      <userid>2</userid>
      ...
    </row>
    ...
  </users>
</data>

Now I want to make it work with this type of XML file:

<!-- 80mb file, 38,000 row nodes -->
<?xml version="1.0"?>
<data>
  <users>
    <row>
      <userid><![CDATA[1]]></userid>
      <username><![CDATA[danneu]]></username>
      <email><![CDATA[danrodneu@gmail.com]]></email>
      <dozens><![CDATA[etc.]]></dozens>
      <more><![CDATA[etc.]]></more>
      <nodes><![CDATA[etc.]]></nodes>
    </row>
    <row>
      <userid><![CDATA[2]]></userid>
      ...
    </row>
    ...
  </users>
</data>

Does anyone know how I can make it work with the CDATA fields?

Thanks in advance.

Reinier

1 Answer

Define the cdata callback instead of the value callback; Ox calls cdata for the content of CDATA sections. Like so:

def cdata(str)
  # str is the CDATA content as a plain String; do something with it here
end

See the Ox::Sax documentation for the other callback methods.

mcginniwa