
I have a page that has a list of people: their name, location, and a link to their profile. From this page, I want to create an array of hashes with each individual's information:

[{:name => 'name', :location => 'location', :link => 'link'}]

I am incorporating this into a scraper class, so what I have so far is:

require 'nokogiri'
require 'open-uri'
require 'pry'

class Scraper
    def scrape_page(url)
        # URI.open is provided by open-uri; a bare Kernel#open no longer fetches URLs in Ruby 3.0+
        doc = Nokogiri::HTML(URI.open(url))
        students = []
    end
end

The plan is to set up the hashes and insert them into the students array. Starting with the names, here's how I go about it:

doc.css('div.name-header h4').children.map.to_a

Which returns:

[#(Text "Bob Charlie"),
 #(Text "Glass Joe"),
 #(Text "Piston Hurricane"),
 #(Text "Big Bear Hugger"),
   ...]

I haven't figured out how to target just the text inside each XML element. Every method I've tried that actually returns the text concatenates all of the names into one long string, with no space between one individual's last name and the next individual's first name:

"Bob CharlieGlass JoePiston HurricaneBig Bear Hugger  ... "

I managed to get around it like this:

doc.css('div.name-header h4').children.to_a.join("\n").split("\n").zip

Is there a way to simply target that text directly in order to pipe each one directly into an array?

  • Have you tried `map(&:text)`? – engineersmnky Feb 12 '20 at 19:13
  • Whoa that did it! Where could I have found that information for future reference? I've already gone over the tutorials and cheat sheet and everything. Thank you, that's a big help. – Kenney G Feb 12 '20 at 21:26
  • [Docs for Nokogiri::XML::Node#content](https://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Node#content-instance_method): `#text` is an alias for `#content`. [Cheat Sheet: Working with a Nokogiri::XML::Node](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet#working-with-a-nokogirixmlnode): see "## Content / Children" at the bottom of that section. – engineersmnky Feb 12 '20 at 21:31
  • When asking, we need to see the minimum code and HTML or XML in the question that demonstrates the problem. See "[MCVE](https://stackoverflow.com/help/minimal-reproducible-example)". – the Tin Man Feb 13 '20 at 02:01

0 Answers