
I have a CSV file in which I want to save all my hash values. I am using Nokogiri's SAX parser to parse an XML document and then save the result to the CSV.

It parses and saves the first XML file, but when it starts parsing the second one it stops, and the error I get is this:

The error: NoMethodError: undefined method `<<' for nil:NilClass

The nil error happens on the line @infodata[:titles] << @content

The SAX parser:

require 'rubygems'
require 'nokogiri'
require 'csv'

class MyDocument < Nokogiri::XML::SAX::Document

  HEADERS = [ :titles, :identifier, :typeOfLevel, :typeOfResponsibleBody, 
              :type, :exact, :degree, :academic, :code, :text ]

  def initialize
     @infodata = {}
     @infodata[:titles] = Array.new([])
  end

  def start_element(name, attrs)
    @attrs = attrs
    @content = ''
  end
  def end_element(name)
    if name == 'title'
      Hash[@attrs]["xml:lang"]
      @infodata[:titles] << @content
      @content = nil
    end
    if name == 'identifier'
       @infodata[:identifier] = @content
       @content = nil
    end
    if name == 'typeOfLevel'
       @infodata[:typeOfLevel] = @content
       @content = nil
    end
    if name == 'typeOfResponsibleBody'
       @infodata[:typeOfResponsibleBody] = @content
       @content = nil
    end
    if name == 'type'
       @infodata[:type] = @content
       @content = nil
    end
    if name == 'exact'     
       @infodata[:exact] = @content
       @content = nil
    end
    if name == 'degree'
       @infodata[:degree] = @content
       @content = nil
    end
    if name == 'academic'
       @infodata[:academic] = @content
       @content = nil
    end
    if name == 'code'
       Hash[@attrs]['source="vhs"']
       @infodata[:code] = @content 
       @content = nil
    end
    if name == 'ct:text'
       @infodata[:beskrivning] = @content
       @content = nil
    end 
  end
  def characters(string)
    @content << string if @content
  end
  def cdata_block(string)
    characters(string)
  end
  def end_document
    File.open("infodata.csv", "ab") do |f|
      csv = CSV.generate_line(HEADERS.map {|h| @infodata[h] })
      csv << "\n"
      f.write(csv)
    end
  end
end

Creating the parser object and parsing every file stored in a folder (47,000 XML files):

parser = Nokogiri::XML::SAX::Parser.new(MyDocument.new)
counter = 0

Dir.glob('/Users/macbookpro/Desktop/sax/info_xml/*.xml') do |item|
  parser.parse(File.open(item, 'rb'))
  counter += 1
  puts "Writing file nr: #{counter}"
end

Three XML files for trying the code: https://gist.github.com/2378898 https://gist.github.com/2378901 https://gist.github.com/2378904

James A Mohler
  • Tip: Instead of `csv = …; csv << "\n"; f.write(csv)` just do `csv = …; f.puts csv` – Phrogz Apr 15 '12 at 00:43
  • Tip: `foo = Array.new([])` is ridiculous; just do `foo = []`. – Phrogz Apr 15 '12 at 00:44
  • Tip: Don't use binary mode for xml or csv – pguardiario Apr 15 '12 at 05:42
  • Nice tips, but do you guys have any idea why I get this error? –  Apr 15 '12 at 10:23
  • @SHUMAcupcake The error told you what was happening: you were trying to call `<<` on `nil`. Based on your comment about the line number, this means that `@infodata[:titles]` is null when you try to append to it. Investigate that. Pare down your code to the essence needed to reproduce the problem (delete code that does not affect it). – Phrogz Apr 15 '12 at 13:19
  • Was that the whole code? Because it seems to be working here. In any case: you are creating one parser and reusing it for every XML file. So that, for example, if the titles aren't defined in an XML file, it ends up inheriting them from the previous XML file -- which I really don't think is the behavior you'd want. Are you perhaps cleaning `@infodata` somewhere else? Doing something like `@infodata = {}` and not setting `:titles`? – Adiel Mittmann Apr 15 '12 at 16:21
  • Yes, it is the whole code. I am cleaning the hash like this: @infodata.clear ... The XML files I have provided all have the title element. Which version of Ruby do you use? –  Apr 15 '12 at 19:45
  • @AdielMittmann You got me thinking. I used the same parser object for every XML file. So I created a new parser object for every new XML file and it worked! Cheers! –  Apr 16 '12 at 12:37
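
The exchange above can be illustrated without Nokogiri. This is only a minimal sketch, assuming the hash is emptied with `@infodata.clear` between files as described in the comments; `RecordState`, `reset` and `add_title` are hypothetical names, not part of the original code. It shows that `clear` removes the `:titles` key, so the next `<<` hits `nil`, and that rebuilding the state for each file (or creating a new handler object per file) avoids it.

```ruby
# Hypothetical stand-in for the handler's per-file state (not the original class).
class RecordState
  attr_reader :infodata

  def initialize
    reset
  end

  # Rebuild the hash so :titles is always an Array. In a SAX handler this could
  # be called at the start of each document (an assumption about where to hook it).
  def reset
    @infodata = { titles: [] }
  end

  def add_title(text)
    @infodata[:titles] << text
  end
end

state = RecordState.new
state.add_title("first file")

# Emptying the hash also removes the :titles key...
state.infodata.clear
missing = state.infodata[:titles] # nil, so `<<` would raise NoMethodError here

# ...whereas resetting restores a fresh, empty Array.
state.reset
state.add_title("second file")
```

Creating a new `MyDocument` (and parser) for every file, as the asker ended up doing, has the same effect as calling `reset`, because `initialize` runs again.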

1 Answer


You are doing this:

csv = CSV.generate_line(HEADERS.map {|h| @infodata[h] })
csv << "\n"

If for some reason CSV.generate_line(HEADERS.map {|h| @infodata[h] }) returns nil, you will be calling the << method on nil, for which it is not defined.

You might want to add some conditions to avoid adding "\n" to csv if it is nil.
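
One way to check this suspicion is to call `CSV.generate_line` directly with `nil` fields. The sketch below uses made-up sample data; as far as I know, with Ruby's standard `csv` library the call returns a String, with `nil` fields written as empty columns and the row separator already appended.

```ruby
require 'csv'

# Sample row with missing values (hypothetical data, not from the XML files).
row = [nil, "abc", nil]

line = CSV.generate_line(row)

# nil fields become empty columns and the line already ends with a newline,
# so appending "\n" again would write a blank line after every record.
puts line.inspect
```

If that holds in your Ruby version, a `nil` check around `csv << "\n"` is not the missing piece, and the extra `"\n"` can simply be dropped.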

Nobita
  • The nil error happens at @infodata[:titles] << @content –  Apr 14 '12 at 22:35
  • @SHUMAcupcake Hey, that would have been nice to know :p You're getting better about including more details in your question, but still not enough. The error _message_ is nice, but the line number is just about as important. – Phrogz Apr 15 '12 at 00:42