I need to extract JSON that is embedded in a page, more precisely in the data-react-props attribute of a div:

<div data-react-class="GamePageHeader" data-react-props="{"id":1274,"slug":[...]}

How can I extract the JSON inside data-react-props? I believe I can't do this with HtmlDomParser.


Edit:

Thanks to Prateek's reply I wrote the code:

use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\CssSelector\CssSelectorConverter;

$html = file_get_html('https://www.igdb.com/games/simcity--2');
$crawler = new Crawler($html);
$data = $crawler->filter('div[data-react-class="GamePageHeader"]')->attr('data-react-props');

print $data;

But I always get the error

LOG.error: Expecting a DOMNodeList or DOMNode instance, an array, a string, or null, but got "simple_html_dom".

I have installed the symfony/css-selector and symfony/dom-crawler packages in Laravel 5.8.

Stephan Vierkant
LegoLiam

1 Answer

Yes, you cannot do this using HtmlDomParser. However, it can be done using Symfony's DomCrawler component.

Step 1: Install it with `composer require symfony/dom-crawler`, and install the CSS selector component with `composer require symfony/css-selector`.

Step 2: Fetch the HTML as a string and instantiate the crawler.

use Symfony\Component\DomCrawler\Crawler;
$html = file_get_contents('https://www.igdb.com/games/simcity--2'); // raw HTML string, not a simple_html_dom object
$crawler = new Crawler($html); // plays the role of HtmlDomParser::str_get_html($html)

Step 3: Use filter to select the DOM element you need, and use attr to read the value of an attribute on that element.

$data = $crawler->filter('div[data-react-class="GamePageHeader"]')->attr('data-react-props');
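
At this point $data is still a plain string of JSON, so it has to be decoded before individual fields can be read. The following is a minimal sketch of that last step, assuming both Symfony packages from Step 1 are installed through Composer. The helper name `extractReactProps` and the inline sample HTML are hypothetical and only serve the illustration; the real page's props contain more fields than the `id` shown here.

```php
<?php
// Composer autoloader; inside a Laravel app this is already loaded for you.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper: returns the decoded data-react-props of the first
 * div with the given data-react-class, or null if nothing usable is found.
 */
function extractReactProps(string $html, string $reactClass): ?array
{
    $crawler = new Crawler($html);
    $node = $crawler->filter('div[data-react-class="' . $reactClass . '"]');

    // attr() throws on an empty node list, so check the match count first.
    if ($node->count() === 0) {
        return null;
    }

    // The DOM parser has already turned &quot; back into plain quotes.
    $json = $node->attr('data-react-props');
    if ($json === null) {
        return null;
    }

    // Decode to an associative array; invalid JSON yields null.
    $props = json_decode($json, true);

    return is_array($props) ? $props : null;
}

// Sample markup made up for this sketch (only the id comes from the question).
$sample = '<div data-react-class="GamePageHeader" '
    . 'data-react-props="{&quot;id&quot;:1274}"></div>';

$props = extractReactProps($sample, 'GamePageHeader');
echo $props['id'], PHP_EOL;
```

For the live page, the string returned by file_get_contents in Step 2 would be passed in place of $sample.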
Prateek
  • To scrape a URL, do I have to write it like this? `$html = file_get_html('https://www.igdb.com/games/simcity--2')` Unfortunately I can't find much documentation. PS: I'm using Laravel, in case that is useful. – LegoLiam Aug 17 '19 at 19:54
  • Yes, you do need to use that line. Also make sure you have added `use Symfony\Component\DomCrawler\Crawler;` to the class you are using it in. You will also need to require the CSS selector package from Symfony for the CSS selector in filter to work: `composer require symfony/css-selector`. I've updated the answer to reflect that. Feel free to check out the [documentation](https://symfony.com/doc/current/components/dom_crawler.html#usage). – Prateek Aug 17 '19 at 20:11
  • I updated the question with your help but I still have problems – LegoLiam Aug 17 '19 at 22:33
  • I saw your update. You can fix this by changing how $html is initialized: either pass the parsed document back as a string with `$html = file_get_html('https://www.igdb.com/games/simcity--2')->outertext;`, or, more simply, use file_get_contents instead, like `$html = file_get_contents('https://www.igdb.com/games/simcity--2');`. I also updated my answer to reflect that. – Prateek Aug 18 '19 at 18:57