I'm grabbing data from an api that is returning xml like this:

<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>

I'm new to deserialization, but I think the appropriate approach is to parse this XML into a Ruby object that I can then reference like objectFoo.seriess.series.frequency, which would return 'Quarterly'.

From my searches here and on Google, there doesn't seem to be an obvious solution to this in plain Ruby (NOT Rails), which makes me think I'm missing something rather obvious. Any ideas?

Edit: I set up a test case based on Winfield's suggestion.

class Exopenstruct

  require 'ostruct'

  def initialize()  

  hash = {"seriess"=>{"realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "series"=>{"id"=>"GDPC1", "realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "title"=>"Real Gross Domestic Product, 1 Decimal", "observation_start"=>"1947-01-01", "observation_end"=>"2012-10-01", "frequency"=>"Quarterly", "frequency_short"=>"Q", "units"=>"Billions of Chained 2005 Dollars", "units_short"=>"Bil. of Chn. 2005 $", "seasonal_adjustment"=>"Seasonally Adjusted Annual Rate", "seasonal_adjustment_short"=>"SAAR", "last_updated"=>"2013-01-30 07:46:54-06", "popularity"=>"93", "notes"=>"Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States.\n\nFor more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"}}}

  object_instance = OpenStruct.new( hash )

  end
end

In irb I loaded the rb file and instantiated the class. However, when I tried to access an attribute (e.g. instance.seriess) I received: NoMethodError: undefined method `seriess'

Again apologies if I'm missing something obvious.

JohnGalt
  • What do you have so far? – Jan 30 '13 at 17:58
  • Hi Daniel, within Rails I've parsed the XML to a Hash and also to JSON. But in the end I just have a text representation of the data. I can manipulate the hash via someObj['Seriess']['Series']['frequency'] but I want to be able to have an in-memory object that I can address like objectFoo.seriess.series.frequency. – JohnGalt Jan 30 '13 at 18:56

4 Answers

17

You may be better off using standard XML to Hash parsing, such as included with Rails:

object_hash = Hash.from_xml(xml_string)
puts object_hash['seriess']

If you aren't using a Rails stack, you can load just this part of ActiveSupport (require 'active_support/core_ext/hash/conversions') or use a library like Nokogiri to parse the XML yourself.

EDIT: If you're looking for object behavior, using OpenStruct is a great way to wrap the hash for this:

object_instance = OpenStruct.new( Hash.from_xml(xml_string) )
puts object_instance.seriess

NOTE: For deeply nested data, you may need to recursively convert embedded hashes into OpenStruct instances as well; i.e., if seriess above is a hash of values, it will stay a Hash and not become an OpenStruct.
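
The recursive conversion mentioned in the note could look roughly like the sketch below. deep_struct is a hypothetical helper name (not part of OpenStruct or Rails), and the hash is a shortened version of the one in the question, so treat this as an illustration rather than a finished implementation:

```ruby
require 'ostruct'

# Hypothetical helper: wrap nested Hashes (and Hashes inside Arrays)
# in OpenStruct instances so every level responds to method calls.
def deep_struct(value)
  case value
  when Hash
    OpenStruct.new(value.transform_values { |v| deep_struct(v) })
  when Array
    value.map { |v| deep_struct(v) }
  else
    value
  end
end

# Shortened version of the hash from the question (assumption: most keys omitted).
hash = {
  "seriess" => {
    "realtime_start" => "2013-02-01",
    "series" => { "id" => "GDPC1", "frequency" => "Quarterly" }
  }
}

object_foo = deep_struct(hash)
puts object_foo.seriess.series.frequency
```

In the question's test case the OpenStruct is assigned to a local variable inside initialize, so the Exopenstruct instance itself never gains a seriess method; returning or storing the struct, as here, avoids that NoMethodError.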

Sergey Kishenin
Winfield
  • I actually have done that in Rails. I feel as though I'm missing some big-picture context here, so apologies in advance. I can manipulate the hash via someObj['Seriess']['Series']['frequency'] but I want to be able to have an in-memory object that I can address like objectFoo.seriess.series.frequency. – JohnGalt Jan 30 '13 at 18:58
  • Let me edit this answer to show how to use OpenStruct to achieve this. – Winfield Jan 31 '13 at 16:42
  • Added a test case based upon your suggestions above, but ran into an issue explained above. – JohnGalt Feb 01 '13 at 18:41
  • Hmmm. I verified the examples above in a Rails 3.2.11 console under Ruby 1.9.3 – Winfield Feb 01 '13 at 20:44
4

I've just started using Damien Le Berrigaud's fork of HappyMapper and I'm really pleased with it. You define simple Ruby classes and include HappyMapper. When you call parse, it uses Nokogiri to slurp in the XML and you get back a complete tree of bona-fide Ruby objects.

I've used it to parse multi-megabyte XML files and found it to be fast and dependable. Check out the README.

One hint: since XML file encoding strings sometimes lie, you may need to sanitize your XML like this:

def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

before passing it to the #parse method in order to avoid Nokogiri's Input is not proper UTF-8, indicate encoding ! error.
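
One caveat worth checking before relying on this: because the source encoding is given as 'binary', every byte outside ASCII counts as undefined and is dropped, including valid UTF-8 characters. A small sketch comparing it with String#scrub (Ruby 2.1+), which removes only the invalid bytes; the sample string is made up for illustration:

```ruby
# The sanitize method from above, unchanged.
def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

# Build "café" followed by one stray 0xFF byte, labelled as UTF-8.
dirty = ("caf\u00e9".b + "\xFF".b).force_encoding('UTF-8')

stripped = sanitize(dirty)   # drops every non-ASCII byte, including the valid "é"
scrubbed = dirty.scrub('')   # drops only the invalid 0xFF byte

puts stripped
puts scrubbed
```

If your XML contains accented or non-Latin text that you want to keep, scrub may be the safer choice; if you only need ASCII, the original sanitize is fine.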

update

I went ahead and cast the OP's example into HappyMapper:

XML_STRING = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'

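require 'date'
require 'happymapper'           # assumption: provided by the nokogiri-happymapper gem

class Series; end               # forward reference so Seriess can mention Series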

class Seriess
  include HappyMapper
  tag 'seriess'

  attribute :realtime_start, Date
  attribute :realtime_end, Date
  has_many :seriess, Series, :tag => 'series'
end
class Series
  include HappyMapper
  tag 'series'

  attribute 'id', String
  attribute 'realtime_start', Date
  attribute 'realtime_end', Date
  attribute 'title', String
  attribute 'observation_start', Date
  attribute 'observation_end', Date
  attribute 'frequency', String
  attribute 'frequency_short', String
  attribute 'units', String
  attribute 'units_short', String
  attribute 'seasonal_adjustment', String
  attribute 'seasonal_adjustment_short', String
  attribute 'last_updated', DateTime
  attribute 'popularity', Integer
  attribute 'notes', String
end

def test
  Seriess.parse(XML_STRING, :single => true)
end

and here's what you can do with it:

>> a = test
>> a.class
=> Seriess
>> a.seriess.first.frequency
=> "Quarterly"
>> a.seriess.first.observation_start
=> #<Date: 1947-01-01 ((2432187j,0s,0n),+0s,2299161j)>
>> a.seriess.first.popularity
=> 93
fearless_fool
1

Nokogiri solves the parsing. How you handle the data is up to you; here I use OpenStruct as an example:

require 'nokogiri'
require 'ostruct'
require 'open-uri'

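# Nokogiri::XML forces XML parsing; Nokogiri.parse can treat an IO as HTML,
# in which case doc.at('body') matches the HTML <body> wrapper instead.
# URI.open is needed because Kernel#open no longer fetches URLs on Ruby 3.0+.
doc = Nokogiri::XML(URI.open('http://www.w3schools.com/xml/note.xml'))

note = OpenStruct.new

note.to = doc.at('to').text
note.from = doc.at('from').text
note.heading = doc.at('heading').text
note.body = doc.at('body').text

=> #<OpenStruct to="Tove", from="Jani", heading="Reminder", body="Don't forget me this weekend!">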

This is just a teaser; your actual problem may be many times bigger. It should at least give you a starting point to work from.
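
The same idea carries over to the question's XML, where the values live in attributes rather than child elements. The sketch below is only an illustration: it uses REXML instead of Nokogiri purely so that it runs without extra gems, element_to_struct is a hypothetical helper name, and the XML is a shortened copy of the one in the question:

```ruby
require 'rexml/document'
require 'ostruct'

# Shortened copy of the question's XML (assumption: most attributes omitted).
XML_SAMPLE = <<~XML
  <?xml version="1.0" encoding="utf-8" ?>
  <seriess realtime_start="2013-01-28" realtime_end="2013-01-28">
    <series id="GDPC1" frequency="Quarterly" frequency_short="Q" popularity="93"/>
  </seriess>
XML

# Hypothetical helper: attributes become fields, child elements become
# nested structs. Repeated child names would overwrite each other here.
def element_to_struct(element)
  struct = OpenStruct.new
  element.attributes.each { |name, value| struct[name] = value }
  element.elements.each { |child| struct[child.name] = element_to_struct(child) }
  struct
end

doc = REXML::Document.new(XML_SAMPLE)
object_foo = OpenStruct.new(doc.root.name => element_to_struct(doc.root))

puts object_foo.seriess.series.frequency
```

All values stay strings (popularity is "93", not 93); if you need typed attributes or several series elements, a mapper such as HappyMapper is likely a better fit.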


Edit: While browsing Google and Stack Overflow, I ran into a possible hybrid between my answer and @Winfield's, using Rails' Hash.from_xml:

> require 'active_support/core_ext/hash/conversions'
> xml = Nokogiri::XML.parse(URI.open('http://www.w3schools.com/xml/note.xml'))
> Hash.from_xml(xml.to_s)
=> {"note"=>{"to"=>"Tove", "from"=>"Jani", "heading"=>"Reminder", "body"=>"Don't forget me this weekend!"}}

Then you can use this hash to, for example, initialize a new ActiveRecord::Base model instance or whatever else you decide to do with it.

http://nokogiri.org/
http://ruby-doc.org/stdlib-1.9.3/libdoc/ostruct/rdoc/OpenStruct.html
https://stackoverflow.com/a/7488299/1740079

Community
ichigolas
  • Your second example looks roundabout. You parse the xml and then dump it to a string so you can parse it again in a different way? – David Grayson Jan 31 '13 at 16:52
0

If you want to convert the XML to a Hash, I've found the nori gem to be the simplest option.

Example:

require 'nori'

xml = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'

hash = Nori.new.parse(xml)    
hash['seriess']
hash['seriess']['series']
puts hash['seriess']['series']['@frequency']

Note the '@' prefix on the frequency key: nori adds it because frequency is an attribute of 'series', not a child element.

Samuel Garratt