How to go through array of URLs using Curb

Question

I need to parse this page https://www.petsonic.com/snacks-huesos-para-perros/ and retrieve information about every item (name, price, image, etc.). The problem is that I don't know how to go through an array of URLs. If I were using 'open-uri', I would do something like this:

require 'nokogiri'
require 'open-uri'

page = "https://www.petsonic.com/snacks-huesos-para-perros/"

# Collect the href attribute of every product link on the listing page
doc = Nokogiri::HTML(URI.open(page))
links = doc.xpath('//a[@class="product-name"]/@href')

# Visit each product page and print the matched text
links.each do |url|
  doc2 = Nokogiri::HTML(URI.open(url.value))
  text = doc2.xpath('//a[@class="product-name"]').text
  puts text
end

However, I am only allowed to use 'Curb', and that is confusing me.

A) Use `curb` instead of `open-uri`. B) Put these into an array. Hint: Use `map` instead of `each`, that yields what you need. — tadman, Aug 14 '19 at 17:56

lacostenycoder · Accepted Answer · 2019-08-14T20:47:04.757


You can use the curb gem

gem install curb

Then, in your Ruby script:

require 'curb'
page = "https://www.petsonic.com/snacks-huesos-para-perros/"
str = Curl.get(page).body
links = str.scan(/<a(.*?)<\/a\>/).flatten.select{|l| l[/class\=\"product-name/]}
inner_text_of_links = links.map{|l| l[/(?<=>).*/]}
puts inner_text_of_links

The hard part was the regex, so let's break it down. To get the links, we scan the string for <a> tags. Because the pattern has a capture group, scan returns one small array per match, so we flatten the result into a single array of strings.

str.scan(/<a(.*?)<\/a\>/)

Then we select the items which match our pattern. We are looking for the class you specified.

.select{|l| l[/class\=\"product-name/]}
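To see what these two steps produce, here is a small sketch run against a made-up HTML string instead of the real page. The sample markup and the variable names `groups` and `fragments` are assumptions for illustration only; the regexes are the ones from the answer.

```ruby
# Made-up sample markup standing in for the downloaded page body.
str = '<a class="product-name" href="/p/1">Bone</a> <a class="logo" href="/">Home</a>'

# With a capture group, scan returns one array per match,
# each holding the text captured between "<a" and "</a>".
groups = str.scan(/<a(.*?)<\/a\>/)

# flatten collapses the nested arrays into a flat array of strings.
fragments = groups.flatten

# Keep only the fragments that carry the product-name class.
links = fragments.select { |l| l[/class\=\"product-name/] }

puts links
```

Each surviving string still contains the attributes and the inner text of the tag, which is why one more step is needed to pull out the text.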

Now, to get the inner text of each tag, we map over the links with a lookbehind regex that matches everything after the first `>`:

inner_text_of_links = links.map{|l| l[/(?<=>).*/]}
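The code above prints the link text but does not yet visit each product URL, which is what the question asks for. A minimal sketch of that step follows, assuming the product links carry absolute URLs in a double-quoted `href` attribute and an exact `class="product-name"`. The helper names `product_links`, `fetch`, and `fetch_all` are hypothetical; only `Curl.get(url).body` is taken from the answer.

```ruby
# Hypothetical helpers; only Curl.get(url).body comes from the answer above.

# Return the href of every <a ... class="product-name" ...> tag in raw HTML.
# Assumes double-quoted attributes, as in the answer's regex.
def product_links(html)
  html.scan(/<a\b[^>]*class="product-name"[^>]*>/)
      .map { |tag| tag[/href="([^"]*)"/, 1] }
      .compact
end

# Download one page with Curb. The require sits inside the method so the
# parsing helper above can be used without the gem being loaded.
def fetch(url)
  require 'curb'
  Curl.get(url).body
end

# Visit every URL and return a hash of url => page body.
# map (rather than each) collects the results, as suggested in the comments.
def fetch_all(urls, fetcher: method(:fetch))
  urls.map { |url| [url, fetcher.call(url)] }.to_h
end
```

With the page body from the answer, `fetch_all(product_links(str))` would return each product page's HTML keyed by its URL, and each body can then be searched the same way as the listing page. The `fetcher:` argument is only there so the loop can be tried without network access.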

edited Aug 14 '19 at 20:47

answered Aug 14 '19 at 20:36

lacostenycoder


very comprehensive answer. Thank you,sir :) – PTaHHHa Aug 14 '19 at 20:43
@PTaHHHa thanks but see updated version. I checked on the link and should be about 26 urls. We don't need `"|'` in the regex so I removed it. – lacostenycoder Aug 14 '19 at 20:45
