
The following XML is the output from a firewall configuration. I am trying to build a hash of firewall rules, which I will later output to CSV, the console, or whatever else I need:

<table index="44" title=" from PUBLIC to DMZ administrative service rules on Firewall01" ref="FILTER.BLACKLIST">
  <headings>
    <heading>Rule</heading>
    <heading>Action</heading>
    <heading>Source</heading>
    <heading>Destination</heading>
    <heading>Service</heading>
    <heading>Log</heading>
  </headings>
  <tablebody>
    <tablerow>
      <tablecell><item>test_inbound</item></tablecell>
      <tablecell><item>Allow</item></tablecell>
      <tablecell><item gotoref="CONFIG.3.452">[Group] test_b2_group</item></tablecell>
      <tablecell><item>[Host] Any</item></tablecell>
      <tablecell><item>[Host] Any</item></tablecell>
      <tablecell><item>Yes</item></tablecell>
    </tablerow>
    <tablerow>
      <tablecell><item>host02_inbound</item></tablecell>
      <tablecell><item>Allow</item></tablecell>
      <tablecell><item gotoref="CONFIG.3.447">[Group] host02_group</item></tablecell>
      <tablecell><item>[Host] Any</item></tablecell>
      <tablecell><item>[Host] Any</item></tablecell>
      <tablecell><item>Yes</item></tablecell>
    </tablerow>
    <tablerow>
      <tablecell><item>randomhost</item></tablecell>
      <tablecell><item>Allow</item></tablecell>
      **<tablecell><item gotoref="CONFIG.3.383">[Group] Host_group_2</item><item gotoref="CONFIG.3.382">[Group] another_server</item></tablecell>**
      <tablecell><item gotoref="CONFIG.3.510">[Group] crazy_application</item><item gotoref="CONFIG.3.511">[Group] internal_app</item><item gotoref="CONFIG.3.525">[Group] online_application</item></tablecell>
      <tablecell><item gotoref="CONFIG.3.783">[Group] junos-https</item></tablecell>
      <tablecell><item>No</item></tablecell>
    </tablerow>
  </tablebody>
</table>

We have the headers of the columns and three firewall rules.

Here is my code:

#!/usr/bin/env ruby

require 'nokogiri'
require 'csv'

fwpol = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
rule_array = []

fwpol.xpath('./table/tablebody/tablerow').each do |item|
  rules = {}

  rules[:name]   = item.xpath('./tablecell/item')[0].text
  rules[:action] = item.xpath('./tablecell/item')[1].text
  rules[:source] = item.xpath('./tablecell/item')[2].text
   rule_array << rules
end

puts rule_array

The first two hash entries, :name and :action, work perfectly because there is only one value in each of those fields.

When I run the code, it does not print all of the values where a cell contains more than one. The bolded XML line shows what I am referring to. I need to iterate over the values somehow, but so far my attempts have been fruitless.

  • When asking about a problem with your code, you need to show us the minimum code and _minimum_ input data necessary to demonstrate the problem, and what you expect from it. "[mcve]". – the Tin Man Oct 14 '16 at 00:32

2 Answers

2

You can get the text of multiple elements as an Array in the following way:

require 'nokogiri'
require 'csv'

fwpol = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
rule_array = []

fwpol.xpath('./table/tablebody/tablerow').each do |item|
  rules = {}

  rules[:name]   = item.xpath('./tablecell[1]/item').text
  rules[:action] = item.xpath('./tablecell[2]/item').text
  rules[:source] = item.xpath('./tablecell[3]/item').map(&:text)
  rule_array << rules
end

puts rule_array

The output is:

{:name=>"test_inbound", :action=>"Allow", :source=>["[Group] test_b2_group"]}
{:name=>"host02_inbound", :action=>"Allow", :source=>["[Group] host02_group"]}
{:name=>"randomhost", :action=>"Allow", :source=>["[Group] Host_group_2", "[Group] another_server"]}
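
Since the question mentions writing the data to CSV later, here is a minimal sketch of one way to flatten the :source arrays into a single CSV cell. The sample rule_array is copied from the output above, and the "; " separator and the header names are assumptions chosen for illustration, not something the question specifies.

```ruby
require 'csv'

# Sample data copied from the output above; in the real script this would be
# the rule_array built by the xpath loop.
rule_array = [
  { name: 'test_inbound',   action: 'Allow', source: ['[Group] test_b2_group'] },
  { name: 'host02_inbound', action: 'Allow', source: ['[Group] host02_group'] },
  { name: 'randomhost',     action: 'Allow',
    source: ['[Group] Host_group_2', '[Group] another_server'] }
]

# Assumption: multiple values in one cell are joined with "; " so that each
# rule still occupies exactly one CSV row.
csv_text = CSV.generate do |csv|
  csv << %w[Rule Action Source]
  rule_array.each do |rule|
    csv << [rule[:name], rule[:action], rule[:source].join('; ')]
  end
end

puts csv_text
```

Joining keeps one row per rule. If one row per source value is preferred, the inner loop could iterate over rule[:source] instead.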
1

I'd do something like this:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<table index="44" title=" from PUBLIC to DMZ administrative service rules on Firewall01" ref="FILTER.BLACKLIST">
  <tablebody>
    <tablerow>
      <tablecell><item>test_inbound</item></tablecell>
      <tablecell><item>Allow</item></tablecell>
      <tablecell><item gotoref="CONFIG.3.452">[Group] test_b2_group</item></tablecell>
      <tablecell><item>[Host] Any</item></tablecell>
      <tablecell><item>[Host] Any</item></tablecell>
      <tablecell><item>Yes</item></tablecell>
    </tablerow>
    <tablerow>
      <tablecell><item>randomhost</item></tablecell>
      <tablecell><item>Allow</item></tablecell>
      <tablecell><item gotoref="CONFIG.3.383">[Group] Host_group_2</item><item gotoref="CONFIG.3.382">[Group] another_server</item></tablecell>
      <tablecell><item gotoref="CONFIG.3.510">[Group] crazy_application</item><item gotoref="CONFIG.3.511">[Group] internal_app</item><item gotoref="CONFIG.3.525">[Group] online_application</item></tablecell>
      <tablecell><item gotoref="CONFIG.3.783">[Group] junos-https</item></tablecell>
      <tablecell><item>No</item></tablecell>
    </tablerow>
  </tablebody>
</table>
EOT

rule_array = doc.search('tablerow').map{ |row|
  name, action, source = row.search('tablecell')[0, 3].map{ |tc| tc.search('item').map(&:text) }

  {
    name: name,
    action: action,
    source: source
  }
}

When run, this leaves rule_array holding an array of hashes, where the last hash has both item entries in :source:

require 'ap'
ap rule_array

# >> [
# >>   [0] {
# >>     :name   => [
# >>       [0] "test_inbound"
# >>     ],
# >>     :action => [
# >>       [0] "Allow"
# >>     ],
# >>     :source => [
# >>       [0] "[Group] test_b2_group"
# >>     ]
# >>   },
# >>   [1] {
# >>     :name   => [
# >>       [0] "randomhost"
# >>     ],
# >>     :action => [
# >>       [0] "Allow"
# >>     ],
# >>     :source => [
# >>       [0] "[Group] Host_group_2",
# >>       [1] "[Group] another_server"
# >>     ]
# >>   }
# >> ]
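
Note that every value above is an Array, including :name and :action. If plain strings are preferred for the single-valued fields, a small post-processing step could unwrap them. This is only a sketch: the SINGLE_VALUED list is a hypothetical choice, and the sample data mirrors the structure shown above.

```ruby
# Sample of the structure printed above.
rule_array = [
  { name: ['test_inbound'], action: ['Allow'], source: ['[Group] test_b2_group'] },
  { name: ['randomhost'], action: ['Allow'],
    source: ['[Group] Host_group_2', '[Group] another_server'] }
]

# Hypothetical choice: treat :name and :action as single-valued fields and
# leave :source as an Array, even when it holds only one entry.
SINGLE_VALUED = %i[name action].freeze

normalized = rule_array.map do |rule|
  rule.to_h do |key, values|
    [key, SINGLE_VALUED.include?(key) ? values.first : values]
  end
end

p normalized.last
```

Keeping :source as an Array in every row means later code does not have to check whether it received a String or an Array.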

Note: Don't do this:

fwpol = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }

It's simpler to use:

fwpol = Nokogiri::XML(File.read(ARGV[0]))

Instead of doing:

item.xpath('./tablecell/item')[0].text
item.xpath('./tablecell/item')[1].text
item.xpath('./tablecell/item')[2].text

simply find the tablecell tags once, slice out the ones you want with [0, 3], then iterate over that small group. It's faster and reduces code repetition.

See "How to avoid joining all text from Nodes when scraping" also.
