I need to parse this page https://www.petsonic.com/snacks-huesos-para-perros/ and retrieve information about every item (name, price, image, etc.). The problem is that I don't know how to process an array of URLs. If I were using 'open-uri', I would do something like this:

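require 'nokogiri'
require 'open-uri'

page = "https://www.petsonic.com/snacks-huesos-para-perros/"

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open(page))
links = doc.xpath('//a[@class="product-name"]/@href')

links.each do |url|
  # Each result is an attribute node, so pass its string value to URI.open.
  doc2 = Nokogiri::HTML(URI.open(url.value))
  text = doc2.xpath('//a[@class="product-name"]').text
  puts text
end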

However, I am only allowed to use 'Curb', and that is what confuses me.

tadman
PTaHHHa
  • A) Use `curb` instead of `open-uri`. B) Put these into an array. Hint: Use `map` instead of `each`, that yields what you need. – tadman Aug 14 '19 at 17:56

1 Answer

You can use the curb gem:

gem install curb

Then, in your Ruby script:

require 'curb'
page = "https://www.petsonic.com/snacks-huesos-para-perros/"
str = Curl.get(page).body
links = str.scan(/<a(.*?)<\/a\>/).flatten.select{|l| l[/class\=\"product-name/]}
inner_text_of_links = links.map{|l| l[/(?<=>).*/]}
puts inner_text_of_links

The hard part of this is the regex, so let's break it down. To get the links, we scan the string for <a> tags. Because the pattern contains a capture group, scan returns an array of single-element arrays, and flatten turns that into one flat array of strings.

str.scan(/<a(.*?)<\/a\>/)
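To make the shape of the result concrete, here is a small sketch run against a hypothetical two-link fragment (the `html` string is made up for illustration and stands in for the downloaded page body):

```ruby
# Hypothetical fragment, standing in for the body returned by Curl.get(page).body.
html = '<a class="logo" href="/">Home</a>' \
       '<a class="product-name" href="/p1">Bone A</a>'

# With a capture group, scan returns one single-element array per match.
nested = html.scan(/<a(.*?)<\/a>/)

# flatten removes the inner arrays and leaves a flat array of strings.
flat = nested.flatten

p nested
p flat
```

Each string in `flat` holds the tag's attributes and its inner text, which is why the later steps can filter on the class and then cut out the text.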

Then we select the items which match our pattern. We are looking for the class you specified.

.select{|l| l[/class\=\"product-name/]}

Now, to get the inner text of each tag, we map over the links using a lookbehind regex that keeps everything after the first ">":

inner_text_of_links = links.map{|l| l[/(?<=>).*/]}
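The question also asks how to visit every product URL, not just the listing page. A minimal sketch of that, following the `map` hint from the comment and reusing the regex approach above, might look like this. The helper names `fetch_with_curb`, `product_links` and `scrape_pages` are hypothetical, and pulling the `<title>` from each product page is only an assumption for illustration, since the markup of the product pages is not shown here.

```ruby
# Download a URL with Curb. The require sits inside the method so the
# parsing helpers below can be loaded even without the curb gem installed.
def fetch_with_curb(url)
  require 'curb'
  Curl.get(url).body
end

# Collect the href of every <a> tag whose attributes include class="product-name".
def product_links(html)
  html.scan(/<a(.*?)<\/a>/)
      .flatten
      .select { |a| a[/class="product-name/] }
      .map { |a| a[/href="([^"]*)"/, 1] } # capture group 1 is the URL
      .compact
      .uniq
end

# Fetch every URL and hand each page body to the caller's block.
# map (rather than each) returns the block results as an array.
def scrape_pages(urls, fetcher)
  urls.map { |url| yield fetcher.call(url) }
end

# Usage (needs network access and the curb gem):
#   page   = "https://www.petsonic.com/snacks-huesos-para-perros/"
#   links  = product_links(fetch_with_curb(page))
#   titles = scrape_pages(links, method(:fetch_with_curb)) do |html|
#     html[/<title>(.*?)<\/title>/m, 1] # assumed: the title holds the item name
#   end
#   puts titles
```

Passing the fetcher in as an argument keeps the parsing separate from the download, so the same helpers can be tried against saved HTML before hitting the site. Regex-based parsing like this is fragile against markup changes, so treat it as a starting point only.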
lacostenycoder