I'm trying to parse XML in Ruby using Nori, which internally uses Nokogiri. Some tags in the XML are repeated; the library parses repeated tags as Arrays and non-repeated tags as single elements (Hashes). For example, this XML:

<nodes>
  <foo>
    <name>a</name>
  </foo>
  <bar>
    <name>b</name>
  </bar>
  <baz>
    <name>c</name>
  </baz>
  <foo>
    <name>d</name>
  </foo>
  <bar>
    <name>e</name>
  </bar>
</nodes>

is parsed as

{nodes: {
  foo: [{name: "a"}, {name: "d"}],
  bar: [{name: "b"}, {name: "e"}],
  baz: {name: "c"}
}}

How do I retain the document order of the elements in the result, as in the output below?

{nodes: [
      {foo: {name: "a"}}, 
      {bar: {name: "b"}},
      {baz: {name: "c"}},
      {foo: {name: "d"}},
      {bar: {name: "e"}},
    ]}

(This may be a library-specific question, but I would like to know whether anyone has faced a similar issue and how to parse it correctly.)

Sathish

1 Answer

Nori can't do this on its own. What you can do is tune the Nori output like this:

input = {nodes: {
  foo: [{name: "a"}, {name: "d"}],
  bar: [{name: "b"}, {name: "e"}],
  baz: {name: "c"}
}}

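# Flatten Nori's grouped hash into an array of single-key hashes.
# Repeated tags (Arrays) yield one entry per item; the order follows
# the hash, not the original XML.
def unfurl(hash)
  out = []
  hash.each_pair do |key, value|
    case value
    when Array
      value.each { |item| out << { key => item } }
    else
      out << { key => value }
    end
  end
  out
end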

output = {:nodes => unfurl(input[:nodes])}

puts output.inspect

This produces the output below, which is grouped by tag name and therefore differs from the XML document order:

{nodes: [
  {foo: {name: "a"}}, 
  {foo: {name: "d"}},
  {bar: {name: "b"}},
  {bar: {name: "e"}},
  {baz: {name: "c"}},
]}
joelparkerhenderson
  • I would still lose the original XML order this way. To retain the order, I would need to unfurl using both the XML and the hash. – Sathish Mar 30 '12 at 17:53
  • Can you say more about the order you want? The script I posted generates the order you requested (which is different from the order of the elements in the XML doc). – joelparkerhenderson Mar 30 '12 at 19:47
  • I made a mistake in the question. The order I expected is [foo1, bar1, baz1, foo2, bar2]. Is there a way to retain that order without referring back to the original XML? – Sathish Mar 31 '12 at 04:09
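
The grouped hash shown in the question no longer records how the foo, bar and baz siblings were interleaved, so the [foo1, bar1, baz1, foo2, bar2] order has to come from the XML itself. One possible approach is to walk the child elements in document order. The sketch below is only an illustration: it uses REXML (bundled with standard Ruby installations) rather than Nori or Nokogiri, and element_to_hash and ordered_nodes are hypothetical helper names, not part of any library. It also assumes that tags below the first level are not repeated.

```ruby
require "rexml/document"

# Hypothetical helper: turns an element into a Hash keyed by child tag name.
# Leaf elements become their text. Assumption: children at this level are
# not repeated (a repeated tag would overwrite the earlier entry).
def element_to_hash(element)
  return element.text unless element.has_elements?

  element.elements.each_with_object({}) do |child, hash|
    hash[child.name.to_sym] = element_to_hash(child)
  end
end

# Hypothetical helper: returns { root_tag => [ {child_tag => ...}, ... ] },
# with one single-key Hash per direct child, in document order.
def ordered_nodes(xml)
  root = REXML::Document.new(xml).root
  children = root.elements.map do |child|
    { child.name.to_sym => element_to_hash(child) }
  end
  { root.name.to_sym => children }
end

xml = <<~XML
  <nodes>
    <foo><name>a</name></foo>
    <bar><name>b</name></bar>
    <baz><name>c</name></baz>
    <foo><name>d</name></foo>
    <bar><name>e</name></bar>
  </nodes>
XML

result = ordered_nodes(xml)
result[:nodes].map { |entry| entry.keys.first }
# => [:foo, :bar, :baz, :foo, :bar]
```

Because each direct child becomes its own single-key hash, repeated tags are never merged and the array order matches the document. The same walk could presumably be written against Nokogiri's element children, but that variant is not shown here.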