Using Nokogiri to scrape element from a web page with ruby

Question

I have a web page that I am loading with the Mechanize Ruby gem. I can see the HTML fragment in the browser's developer tools, but I am not having any luck extracting the element and its associated text.

The HTML fragment on the page, nested in a series of <div> tags, looks something like this:

<button stid="FLIGHTS_DETAILS_AND_FARES" data-test-id="select-link" data-stid="FLIGHTS_DETAILS_AND_FARES-index-1" class="uixtk-card-link" type="button"><span class="is-visually-hidden">Select and show fare information for flight, departing at 6:40 am from Somewhere, arriving at 12:35 pm in Somewhere, Priced at $683 Return per traveller.  6 hours 25 minutes total travel time, One stop, Stopover for 1 hour 40 minutes in Another Place.</span></button>

I'm trying to extract the text content with description, times, length of trip and cost.

The simple Ruby test code is shown below (the actual site details are removed). It simply sets up Mechanize and invokes Pry so that I can test things interactively.

require 'mechanize'
require 'nokogiri'
require 'open-uri'
require 'pry'

class Mecho 

    site_url = 'https://host.com'

    agent = Mechanize.new 

    page = agent.get(site_url)

    binding.pry
end

Given the HTML code fragment above what I've tried is to locate the button via

page.css("button")

Whilst this locates a lot of the buttons on the page, it does not find the fragment I'm after.

I then tried locating the element via its <span class="is-visually-hidden">, which gives me 17 entries, but none of them is the one I'm looking for.

I've checked out the Nokogiri cheat sheet to see if there are other methods I'm missing. No luck so far.

If the fragment is generated by JavaScript, Nokogiri won't find it. You will need to use something else, like PhantomJS (now retired) or a similar setup. Also, if you want more specific advice, you will need to provide an actual page link. - Casper, May 12 '23 at 23:34

score 0 · Answer 1 · answered May 13 '23 at 21:42

Something like

page.css("button.uixtk-card-link")

or

page.css("button span.is-visually-hidden")

should find it.

If the page is dynamic, then make sure that the element exists when Nokogiri is called.

B Seven
