I've written a website scraper for use in a project.
I'm controlling Firefox through Sahi using Mink to visit each site and interact with any elements where necessary. I've managed to get this working perfectly on all sites I've tried except for one...
I'm trying to get the markup from https://www.o2.co.uk/shop/phones/.
I'm using exactly the same code for this page as I have for all the others:
// Configure driver
$this->driver = new \Behat\Mink\Driver\SahiDriver(
    'firefox',
    new \Behat\SahiClient\Client(
        new \Behat\SahiClient\Connection(null, CRAWL_SERVER, 9999)
    )
);
// Init session:
$this->session = new \Behat\Mink\Session($this->driver);
// Start session:
$this->session->start();
// Open the url
$this->session->visit($config['url']);
// Get the markup from the page
$markup = $this->session->getPage()->getContent();
When I run this code against https://www.o2.co.uk/shop/phones/, Mink just seems to hang, waiting for something to happen.
It seems that something on this page is preventing either Sahi or Mink from returning the markup. I've also tried running other functions instead of getContent(), such as $this->session->wait(2000); and attempting to search through getPage() using the find() method.
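For reference, the alternative calls mentioned above look roughly like the sketch below. This is only an illustration: the helper name probePage and the 'body' selector are placeholders rather than my actual code, and it assumes a Mink session that has already been started and has visited the page.

```php
<?php

use Behat\Mink\Session;

/**
 * Hypothetical helper (name and selector are placeholders for illustration).
 * Runs the two alternatives to getContent() described above.
 */
function probePage(Session $session): bool
{
    // Pause for up to 2000 ms before touching the page.
    $session->wait(2000);

    // Search the document through getPage(); find() returns the first
    // matching element, or null when nothing matches.
    $body = $session->getPage()->find('css', 'body');

    return $body !== null;
}
```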
If anyone has any idea why this is happening, I would be very interested in finding out the cause and how I can make this work.
tl;dr Why is Mink/Sahi timing out on this site?