I have a project that takes large amounts of XML data and passes it to Nokogiri, eventually adding each element to a hash that is then written out to a YAML file.
This works until the XML data set contains duplicate keys.
Example data:
```xml
<document>
  <form xmlns="">
    <title>
      <main-title>Foo</main-title>
    </title>
    <homes>
      <home>
        <home-name>home 1</home-name>
        <home-price>10</home-price>
      </home>
      <home>
        <home-name>home 2</home-name>
        <home-price>20</home-price>
      </home>
    </homes>
  </form>
</document>
```
Within the `homes` element I can have multiple `home` elements; however, each `home` will always contain different content.
This data should eventually produce a structure like this:
```yaml
title:
  main-title: Foo
homes:
  home:
    home-name: home 1
    home-price: 10
  home:
    home-name: home 2
    home-price: 20
```
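Note that YAML requires the keys of a mapping to be unique, and a Ruby `Hash` cannot hold the same key twice either, so the structure above cannot be produced as a plain mapping. A minimal sketch of the nearest valid shape, assuming the repeated `home` entries are collected into a sequence and that prices stay strings (as element text would):

```ruby
require "yaml"

# YAML mappings (like Ruby Hashes) cannot hold the same key twice,
# so the repeated <home> elements are kept as a sequence under one key.
data = {
  "title" => { "main-title" => "Foo" },
  "homes" => {
    "home" => [
      { "home-name" => "home 1", "home-price" => "10" },
      { "home-name" => "home 2", "home-price" => "20" }
    ]
  }
}

yaml = data.to_yaml
puts yaml

# Loading the document back yields the same structure, with both homes included.
loaded = YAML.safe_load(yaml)
puts loaded["homes"]["home"].map { |h| h["home-name"] }.inspect
```

The trade-off is that `home` now holds an Array rather than a single mapping, so anything reading it has to iterate.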
However, all I ever get is the last `home` element inside `homes`:
```yaml
title:
  main-title: Foo
homes:
  home:
    home-name: home 2
    home-price: 20
```
I believe this is because, when each element is added to the hash, an existing key is simply overwritten, so only the last `home` survives.
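The overwrite described above is standard `Hash` behaviour, and Psych (Ruby's YAML library) applies the same rule when it loads a YAML mapping that repeats a key. A minimal pure-Ruby check, without Nokogiri; the values are made up to mirror the example:

```ruby
require "yaml"

# Assigning to an existing key replaces its value; a Hash never holds
# two entries for keys that are eql? to each other.
homes = {}
homes["home"] = { "home-name" => "home 1", "home-price" => "10" }
homes["home"] = { "home-name" => "home 2", "home-price" => "20" }
puts homes.size                   # => 1
puts homes["home"]["home-name"]   # => home 2

# Psych does the same when reading a mapping with a repeated key:
# the later value replaces the earlier one.
parsed = YAML.safe_load("home: 1\nhome: 2\n")
puts parsed["home"]               # => 2
```

So even a YAML file that does contain duplicate keys will come back as a Hash holding only the last value for each key.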
This is the code used to append elements to the hash:
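```ruby
def map_content(nodes, content_hash)
  nodes.map do |element|
    case element
    when Nokogiri::XML::Element
      # Recurse into the element's children and store the result under its name
      child_content = map_content(element.children, {})
      content_hash[element.name] = child_content unless child_content.empty?
    when Nokogiri::XML::Text
      # A text node ends the recursion: return its content as the value
      return element.content
    end
  end
  content_hash
end
```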
I believe `content_hash[element.name] = child_content` is the culprit. However, this code also creates similar YAML files that contain these kinds of duplicate keys, and I'd like to preserve that functionality. I don't want to simply add a unique key to the data hash, because that would mean modifying many methods and updating how they pull data from the YAML file.
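One common workaround, which is not `compare_by_identity` and does change the shape of the output, is to turn a key's value into an Array the second time that key appears among siblings. The sketch below is an illustration under assumptions: `Node`, `el`, `text` and `store_child` are hypothetical names, and `Node` merely stands in for Nokogiri's element and text nodes so the example runs without the gem.

```ruby
# Hypothetical stand-in for Nokogiri nodes: an element has a name and
# children; a text node has no name, only content.
Node = Struct.new(:name, :children, :content)

def el(name, *children)
  Node.new(name, children, nil)
end

def text(str)
  Node.new(nil, [], str)
end

# Stores value under key. If the key already exists, the current value is
# wrapped in an Array and the new value appended, so repeated sibling
# elements are kept instead of overwritten.
def store_child(hash, key, value)
  if hash.key?(key)
    hash[key] = [hash[key]] unless hash[key].is_a?(Array)
    hash[key] << value
  else
    hash[key] = value
  end
end

# Same recursion as the question's map_content, with the assignment
# replaced by store_child.
def map_content(nodes, content_hash = {})
  nodes.each do |node|
    # Text node: its content becomes the value of the enclosing element.
    return node.content if node.name.nil?

    child_content = map_content(node.children, {})
    store_child(content_hash, node.name, child_content) unless child_content.empty?
  end
  content_hash
end

doc = [
  el("form",
     el("title", el("main-title", text("Foo"))),
     el("homes",
        el("home", el("home-name", text("home 1")), el("home-price", text("10"))),
        el("home", el("home-name", text("home 2")), el("home-price", text("20")))))
]

result = map_content(doc)
p result["form"]["homes"]["home"].map { |h| h["home-name"] }
```

Code that reads `home` would then have to cope with either a single Hash or an Array of Hashes, which is the kind of change to the consuming methods described above.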
I read about `compare_by_identity`, but I'm not sure how I would implement it.
I tried using `compare_by_identity`, but it just results in an empty YAML file. Maybe it's generating the hash but it can't be written to the YAML file?
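For reference, a small pure-Ruby check of what `compare_by_identity` changes: keys match only when they are the very same object (`equal?`), so two separate `"home"` strings (as element names would be, assuming each arrives as its own String object) can coexist, but a lookup with yet another `"home"` string misses both. That would affect any method that pulls data out by key name.

```ruby
# With compare_by_identity, keys are matched by object identity (equal?)
# rather than by value (eql?), so two distinct String objects with the
# same text become two separate entries.
homes = {}.compare_by_identity
homes[String.new("home")] = { "home-name" => "home 1" }
homes[String.new("home")] = { "home-name" => "home 2" }

puts homes.size   # => 2

# A lookup with another "home" string uses a different object, so it misses.
p homes["home"]   # => nil

# Both entries are still reachable by iterating.
homes.each_value { |v| puts v["home-name"] }
```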
```ruby
def map_content(nodes, content_hash)
  # Compare keys by object identity instead of by value
  content_hash = content_hash.compare_by_identity
  nodes.map do |element|
    case element
    when Nokogiri::XML::Element
      child_content = map_content(element.children, {})
      content_hash[element.name] = child_content unless child_content.empty?
    when Nokogiri::XML::Text
      return element.content
    end
  end
  content_hash
end
```