I'm not sure whether this is a REXML issue or a Ruby issue, but it happens when I work with REXML.

The program below should access elements of each xml file in the directory.

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

p "Current directory was: " + Dir.pwd

Dir.chdir("/home/askar/xml_files1") {

    p "Now we're in: " + Dir.pwd

    if File.exist?(Dir.pwd)

        xml_files = Dir.glob("ShipmentRequest*.xml")

        Dir.foreach(Dir.pwd) do |file|

            xmlfile = File.new(file)
            xmldoc = Document.new(xmlfile)

        end

    else
        puts "It's empty"
    end

}

When I run:

ruby import_xml.rb

Errors:

"Current directory was: /home/askar/Dropbox/rails_studio/xml_to_mysql"
"Now we're in: /home/askar/xml_files1"
There're 6226 files in the folder...
/home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/source.rb:148:in `read': Is a directory - . (Errno::EISDIR)
    from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/source.rb:148:in `initialize'
    from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/source.rb:14:in `new'
    from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/source.rb:14:in `create_from'
    from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/parsers/baseparser.rb:127:in `stream='
    from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/parsers/baseparser.rb:116:in `initialize'
    from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/parsers/treeparser.rb:9:in `new'
    from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/parsers/treeparser.rb:9:in `initialize'
    from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/document.rb:245:in `new'
    from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/document.rb:245:in `build'
    from /home/askar/.rvm/rubies/ruby-1.9.3-p429/lib/ruby/1.9.1/rexml/document.rb:43:in `initialize'
    from import_xml.rb:20:in `new'
    from import_xml.rb:20:in `block (2 levels) in <main>'
    from import_xml.rb:17:in `foreach'
    from import_xml.rb:17:in `block in <main>'
    from import_xml.rb:8:in `chdir'
    from import_xml.rb:8:in `<main>'

When I comment out:

#xmldoc = Document.new(xmlfile)

the script runs without errors.

Folder /home/askar/xml_files1 contains only 3 xml files.

I'm using Linux Mint Nadia and

ruby -v
ruby 1.9.3p429 (2013-05-15 revision 40747) [x86_64-linux]

You may have noticed that, for some reason, the paths in the error show ruby 1.9.1. Is this an issue?

Askar
  • Why are you using REXML? It's fallen by the wayside as far as XML parsers go. I'd strongly recommend using [Nokogiri](http://nokogiri.org). It's much faster and full-featured. – the Tin Man May 31 '13 at 05:54
  • @the Tin Man, thanks for your precious advice. I'll have to check it out! :) – Askar May 31 '13 at 06:06

3 Answers

I think @halfelf is correct here. The API docs say that Dir.foreach will iterate over every entry in the directory, and in Unix that includes the two directory entries `.` and `..`.

A couple lines before your Dir.foreach call, you use glob to build an array of files called xml_files. What happens if you iterate over that in your loop instead?
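A minimal sketch of the difference, using a throwaway temp directory and made-up file contents (both are assumptions for illustration, not the asker's actual data). The helper name `entries_demo` is hypothetical:

require 'rexml/document'
require 'tmpdir'

# Builds a temp directory with two small XML files (hypothetical sample data),
# then compares what Dir.foreach and Dir.glob return for it.
def entries_demo
  Dir.mktmpdir do |dir|
    %w[ShipmentRequest1.xml ShipmentRequest2.xml].each do |name|
      File.write(File.join(dir, name), "<shipment><id>#{name}</id></shipment>")
    end

    Dir.chdir(dir) do
      # Every entry in the directory, including "." and ".."
      all_entries = Dir.foreach(Dir.pwd).to_a.sort
      # Only the names that match the pattern
      xml_files = Dir.glob("ShipmentRequest*.xml").sort

      # Parsing only the globbed names never tries to read a directory.
      roots = xml_files.map do |file|
        File.open(file) { |f| REXML::Document.new(f).root.name }
      end

      [all_entries, xml_files, roots]
    end
  end
end

all_entries, xml_files, roots = entries_demo
p all_entries
p xml_files
p roots

The first list contains `.` and `..` in addition to the XML files, while the globbed list contains only the matching file names, so looping over `xml_files` avoids handing a directory to `Document.new`.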

dpassage
  • I got the same errors when I substituted "glob" for "foreach". I even created a new folder and put only 3 files in it, so I know there's no directory there. I simplified the code and it worked for me because I commented out xmldoc = Document.new(xmlfile), while xmlfile = File.new(file) does work. Please see the post with the updated (simplified) code. – Askar May 31 '13 at 05:06
  • You're still not fixing the problem, which is this line: `Dir.foreach(Dir.pwd) do |file|`. Try replacing it with `xml_files.each do |file|` – dpassage May 31 '13 at 05:39

Just a guess: Not everything returned by Dir.foreach(Dir.pwd) is a file that can be read. Some of them are directories.
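One way to check this guess is sketched below, assuming a Unix-like system. The helper names `readable_files` and `failing_entries` are hypothetical, and the broad `rescue StandardError` is there only for diagnosis, not as a recommendation for production code:

require 'rexml/document'
require 'tmpdir'

# Returns the names of entries that are regular files,
# skipping ".", ".." and any subdirectories.
def readable_files(dir)
  Dir.foreach(dir).select { |entry| File.file?(File.join(dir, entry)) }.sort
end

# Tries to parse every entry and records the ones that fail,
# so the offending entry shows up by name instead of aborting the loop.
def failing_entries(dir)
  failures = []
  Dir.foreach(dir) do |entry|
    begin
      File.open(File.join(dir, entry)) { |f| REXML::Document.new(f) }
    rescue StandardError => e
      failures << [entry, e.message]
    end
  end
  failures.sort
end

# Hypothetical sample data: one XML file and one subdirectory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.xml"), "<a/>")
  Dir.mkdir(File.join(dir, "sub"))
  p readable_files(dir)
  failing_entries(dir).each { |entry, message| puts "#{entry}: #{message}" }
end

Filtering with `File.file?` keeps the `Dir.foreach` loop but drops everything that is not a regular file. The rescue variant shows which entries cannot be read, which is useful when it is unclear what the directory really contains.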

halfelf
  • I know there're only files. – Askar May 31 '13 at 03:46
  • Are you sure? What about adding a `begin rescue` block around `File.new` to see which file cannot be opened? – halfelf May 31 '13 at 03:50
  • Yes. I've even created an empty directory and checked. I've noticed that the error comes from the line xmldoc = Document.new(xmlfile). When I comment out everything from this line to the end of the block, there are no errors. So the key is: xmldoc = Document.new(xmlfile) – Askar May 31 '13 at 03:59

Using Nokogiri, here's how I'd write this:

#!/usr/bin/ruby -w

require 'nokogiri'

DIRNAME = "/home/askar/xml_files1"

puts "Current directory is: #{ Dir.pwd }"
Dir.chdir(DIRNAME) do

  puts "Now in: #{ DIRNAME }"
  xml_files = Dir.glob("ShipmentRequest*.xml")

  if xml_files.empty?
    puts "#{ DIRNAME } is empty."
  else
    xml_files.each do |file|
      # File.open with a block closes the file handle after parsing
      doc = File.open(file) { |f| Nokogiri::XML(f) }
      # ... do something with the doc ...
    end
  end
end

the Tin Man