
Here is my code:

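```php
<?php
    $url = "http://www.sportsdirect.com/adidas-goletto-mens-astro-turf-trainers-263244?colcode=26324408";
    libxml_use_internal_errors(true);   // suppress warnings from malformed HTML
    $doc = new DOMDocument();
    $doc->loadHTMLFile($url);

    $xpath = new DOMXPath($doc);

    // item(0) is null when no matching span exists, so check before reading it
    $nodes = $xpath->query('//span[@id="ProductName"]');
    if ($nodes->length > 0) {
        echo $nodes->item(0)->nodeValue;
    }
?>
```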

With this code I have no problems: I can parse the link and get the product name. The problem comes when I try to parse another link, for example: http://www.sportsdirect.com/playboy-100ml-eau-de-toilette--754217?colcode=75421790

I get this error:

Warning: DOMDocument::loadHTMLFile(http://www.sportsdirect.com/playboy-100ml-eau-de-toilette--754217?colcode=75421790): failed to open stream: HTTP request failed! HTTP/1.0 404 Not Found in /public_html/test.php on line 5

I get this error because, when I request this link, the website checks my browser cookies. If I do not have a cookie named ChosenSite set to www, it instantly redirects me to http://bg.sportsdirect.com/playboy-100ml-eau-de-toilette--754217?colcode=75421790, which cannot provide this data, so I never get the product name.

So my question is:

How can I set up cookies, or is there another method I can use to parse this link?

Thanks in advance!

Venelin

1 Answer


Either use cURL and analyze the data afterwards (that is, send the ChosenSite cookie with the cURL request and load the HTML from the cURL response).
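A minimal sketch of the cURL route, assuming (as the question describes) that the ChosenSite=www cookie is what stops the site from redirecting to bg.sportsdirect.com. The helper names fetchHtml() and extractProductName() are illustrative only; the fetched HTML string is then parsed with DOMDocument::loadHTML() instead of loadHTMLFile().

```php
<?php
// Sketch: fetch the page with cURL while sending a cookie, then parse the
// returned HTML string with DOMDocument/DOMXPath.
// Assumption: the ChosenSite cookie is all the site checks; other headers
// (e.g. a User-Agent) might also matter and are not sent here.

function fetchHtml(string $url, string $cookie): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,    // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,    // still follow any remaining redirects
        CURLOPT_COOKIE         => $cookie, // sent as the request's Cookie header
    ]);
    $html   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    if ($status >= 400) {
        throw new RuntimeException("HTTP $status for $url");
    }
    return $html;
}

function extractProductName(string $html): ?string
{
    if ($html === '') {
        return null;                       // loadHTML() rejects empty input
    }
    libxml_use_internal_errors(true);      // ignore warnings from sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $nodes = (new DOMXPath($doc))->query('//span[@id="ProductName"]');
    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}
```

With the product URL from the question, the call would be extractProductName(fetchHtml($url, 'ChosenSite=www')). Splitting fetching from parsing keeps the XPath logic testable on a plain string; whether the cookie alone is enough depends on how the site behaves.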
Or, if you're willing to spend some effort learning another programming language (Python, that is), have a look at Scrapy, which has built-in functionality for many tasks connected with scraping in general.
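Staying with DOMDocument::loadHTMLFile(), another possible approach is to give libxml an HTTP stream context that carries the cookie, using stream_context_create() and libxml_set_streams_context(). This is a sketch under the same assumption that ChosenSite=www is the only cookie the site checks; the function name productNameFromUrl() is hypothetical.

```php
<?php
// Sketch: keep DOMDocument::loadHTMLFile() but send a cookie by attaching
// an HTTP stream context to libxml's next document load.
// Assumption: the ChosenSite cookie is the only thing the site checks.

function productNameFromUrl(string $url, string $cookie = 'ChosenSite=www'): ?string
{
    $context = stream_context_create([
        'http' => [
            'header' => "Cookie: $cookie\r\n",  // extra request header for http:// loads
        ],
    ]);
    libxml_set_streams_context($context);       // used by the next libxml document load
    libxml_use_internal_errors(true);           // suppress warnings from malformed HTML

    $doc = new DOMDocument();
    if (!$doc->loadHTMLFile($url)) {
        return null;                             // request or parse failed
    }
    $nodes = (new DOMXPath($doc))->query('//span[@id="ProductName"]');
    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}
```

For the failing link, that would be echo productNameFromUrl('http://www.sportsdirect.com/playboy-100ml-eau-de-toilette--754217?colcode=75421790');. The http options in the context only affect http:// URLs, so the same function also works on a local HTML file.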

Jan