I have a page that has a list of people: their name, location, and a link to their profile. From this page, I want to create an array of hashes with each individual's information:
[{:name => 'name', :location => 'location', :link => 'link'}]
I am incorporating this into a scraper class, so what I have so far is:
require 'nokogiri'
require 'open-uri'
require 'pry'
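class Scraper
  # Takes the page URL as a parameter (a string literal is not a valid parameter name).
  def scrape_page(url)
    # URI.open comes from open-uri; a bare Kernel#open no longer opens URLs on Ruby 3+.
    doc = Nokogiri::HTML(URI.open(url))
    students = []
  end
end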
The plan is to build the hashes and insert them into the students array. Starting with the names, here's how I go about it:
doc.css('div.name-header h4').children.map.to_a
Which returns:
[#(Text "Bob Charlie"),
#(Text "Glass Joe"),
#(Text "Piston Hurricane"),
#(Text "Big Bear Hugger"),
...]
I haven't figured out how to target just the text inside each element. Any method I use that actually returns the text concatenates everything, so the previous individual's last name and the following individual's first name are not separated by a space and the result is one long string:
"Bob CharlieGlass JoePiston HurricaneBig Bear Hugger ... "
I managed to get around it like this:
doc.css('div.name-header h4').children.to_a.join("\n").split("\n").zip
Is there a way to target that text directly, so that each name goes straight into an array?