Ok, there are a couple of things wrong:
- sec.gov/Archives/edgar/data/1475481/0001475481-09-000001.txt is NOT XML, so Nokogiri will be of no use to you unless you strip off all the garbage from the top of the file, down to where the true XML starts, then trim off the trailing tags to keep the XML correct. So, you need to attack that problem first.
- You don't say what you want from the file. Without that information we can't recommend a real solution. You need to take more time to define the question better.
Here's a quick piece of code to retrieve the page, strip the garbage, and parse the resulting content as XML:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::XML(
open('http://sec.gov/Archives/edgar/data/1475481/0001475481-09-000001.txt').read.gsub(/\A.+<xml>\n/im, '').gsub(/<\/xml>.+/mi, '')
)
puts doc.at('//schemaVersion').text
# >> X0603