
I have an array called people of these objects:

Nokogiri::XML::Text:0x3fe41985e69c "CEO, Company_1"
Nokogiri::XML::Text:0x3fe4194dab74 "COO, Company_2 "
Nokogiri::XML::Text:0x3fe4195eb414 "CFO, Company_3"

I want to split each object's text at the ",", so I tried something like this:

companies = people.each do | company | 
  company.inner_text.match("/, (.*)/")
end

and:

occupations = people.each do | occupation | 
  occupation.inner_text.match("/(.*),/") 
end

match doesn't seem to extract the values I want from the objects. I checked the patterns on rubular.com and they should work, but I get back the same strings I put in (e.g. "CEO, Company_1"), when they should be separated so that occupations = ["CEO", "COO", "CFO"] and companies = ["Company_1", "Company_2", "Company_3"].

How do I split these objects?
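Two details in the attempt above probably explain why the original strings come back. First, each returns the array it was called on and discards the block's results; map is the method that collects them. Second, a String passed to match is converted to a Regexp exactly as written, so the slashes become literal characters that the text would have to contain. This sketch reproduces both, using plain Ruby strings in place of the Nokogiri text nodes (an assumption so it runs without Nokogiri):

```ruby
# Plain strings stand in for the Nokogiri text nodes (assumption for this
# sketch); inner_text should return the same strings.
texts = ["CEO, Company_1", "COO, Company_2", "CFO, Company_3"]

# 1. each returns its receiver, so whatever the block returns is discarded.
result = texts.each { |t| t.match(/, (.*)/) }
p result.equal?(texts)                  # => true

# 2. A String argument to match is converted to a Regexp as-is, so the
#    slashes are literal characters that must appear in the text.
p "CEO, Company_1".match("/, (.*)/")    # => nil
p "CEO, Company_1".match(/, (.*)/)[1]   # => "Company_1"
```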


1 Answer

Why don't you split the text?

require 'nokogiri'

xml = '<x>
<people>CEO, Company_1</people>
<people>COO, Company_2</people>
<people>CFO, Company_3</people>
</x>
'

doc = Nokogiri::XML(xml)
people = doc.search('people')
companies = people.map do |company| 
  company.text.split(',')
end

pp companies
# => [["CEO", " Company_1"], ["COO", " Company_2"], ["CFO", " Company_3"]]

If you want to get rid of the leading spaces before the companies, use:

companies = people.map do |company| 
  company.text.split(/,\s*/)
end
# => [["CEO", "Company_1"], ["COO", "Company_2"], ["CFO", "Company_3"]]

Or:

companies = people.map do |company| 
  company.text.split(',').map(&:lstrip)
end
# => [["CEO", "Company_1"], ["COO", "Company_2"], ["CFO", "Company_3"]]

Or use map{ |s| s.sub(/^\s+/, '') } instead of the lstrip.
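To get the two separate arrays asked for in the question (occupations and companies), the [occupation, company] pairs can be transposed. A minimal sketch, using plain strings in place of each node's text (an assumption so it runs without Nokogiri). It splits at most once, so a company name that itself contains a comma would stay in one piece, and strip also removes the trailing space seen in "COO, Company_2 ":

```ruby
# Plain strings stand in for each node's .text (assumption for this sketch).
texts = ["CEO, Company_1", "COO, Company_2 ", "CFO, Company_3"]

# Split each string once at the first comma and trim surrounding whitespace,
# giving [occupation, company] pairs.
pairs = texts.map { |t| t.split(",", 2).map(&:strip) }

# transpose turns the list of pairs into [occupations, companies].
occupations, companies = pairs.transpose

p occupations  # => ["CEO", "COO", "CFO"]
p companies    # => ["Company_1", "Company_2", "Company_3"]
```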

Also see "How to avoid joining all text from Nodes when scraping".
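If you'd rather stay close to the regex approach from the question, a sketch that should work is to pass real Regexp literals instead of strings containing slashes, and to use map rather than each. It uses String#[] with a capture index, which returns just that group (or nil when there is no match); match(...)[1] would behave the same when the pattern matches. Plain strings again stand in for the node text (assumption):

```ruby
# Plain strings stand in for each node's text (assumption for this sketch).
texts = ["CEO, Company_1", "COO, Company_2 ", "CFO, Company_3"]

# Everything before the first comma is the occupation.
occupations = texts.map { |t| t[/\A([^,]*),/, 1] }

# Everything after the comma, minus surrounding whitespace, is the company.
companies = texts.map { |t| t[/,\s*(.*?)\s*\z/, 1] }

p occupations  # => ["CEO", "COO", "CFO"]
p companies    # => ["Company_1", "Company_2", "Company_3"]
```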
