I found a nice example (https://gist.github.com/danneu/3977120) of how to parse XML with Ox's SAX parser. Now I'm trying to make it work with my own XML files, which use CDATA fields.
This is the example I found:
require "ox"
require "open-uri"

class Handler < Ox::Sax
  USER_ATTR = [:userid, :username, :email]
  ATTR_MAP = { userid: :as_i, username: :as_s, email: :as_s }

  def start_element(name)
    @user = {} if name == :row
    @current_node = name
  end

  def value(value)
    return unless USER_ATTR.include?(@current_node)
    @user[@current_node] = value.send(ATTR_MAP[@current_node])
  end

  def end_element(name)
    return unless name == :row
    puts @user[:userid], @user[:username], @user[:email]
  end
end

handler = Handler.new
File.open("user.xml") do |f|
  Ox.sax_parse(handler, f)
end
It works with this type of XML file:
<!-- 80mb file, 38,000 row nodes -->
<?xml version="1.0"?>
<data>
  <users>
    <row>
      <userid>1</userid>
      <username>danneu</username>
      <email>danrodneu@gmail.com</email>
      <dozens>etc.</dozens>
      <more>etc.</more>
      <nodes>etc.</nodes>
    </row>
    <row>
      <userid>2</userid>
      ...
    </row>
    ...
  </users>
</data>
Now I want to make it work with this type of XML file:
<!-- 80mb file, 38,000 row nodes -->
<?xml version="1.0"?>
<data>
  <users>
    <row>
      <userid><![CDATA[1]]></userid>
      <username><![CDATA[danneu]]></username>
      <email><![CDATA[danrodneu@gmail.com]]></email>
      <dozens><![CDATA[etc.]]></dozens>
      <more><![CDATA[etc.]]></more>
      <nodes><![CDATA[etc.]]></nodes>
    </row>
    <row>
      <userid><![CDATA[2]]></userid>
      ...
    </row>
    ...
  </users>
</data>
Does anyone know how I can make it work with the CDATA fields?
Thanks in advance.