<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/html/">html</a>, <span>dom crawler</span>, <a href="/en/form/">form</a><span>guzzle</span>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/elequent/">elequent</a>, <span>dom crawler</span>, <span>guzzle</span>, <a href="/en/orm/">orm</a>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <a href="/en/goutte">goutte</a>, <a href="/en/php/">php</a>, <span>dom crawler</span>, <a href="/en/guzzle">guzzle</a>, <a href="/en/web-scrapper">web scrapper</a>
</span>

I want to extract only the link text (the <a> elements) from each span.tl into an array like this:

array (size=3)
  0 => string 'laravel, html, form' (length=19)
  1 => string 'laravel, elequent, orm' (length=22)
  2 => string 'laravel, goutte, php, guzzle, web scrapper' (length=42)
coolsaint

1 Answer

Try this code snippet. It uses DOMDocument and DOMXPath to collect the text of the <a> elements inside each span.tl:

<?php
ini_set('display_errors', 1);

$string=<<<HTML

<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/html/">html</a>, <span>dom crawler</span>, <a href="/en/form/">form</a><span>guzzle</span>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/elequent/">elequent</a>, <span>dom crawler</span>, <span>guzzle</span>, <a href="/en/orm/">orm</a>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <a href="/en/goutte">goutte</a>, <a href="/en/php/">php</a>, <span>dom crawler</span>, <a href="/en/guzzle">guzzle</a>, <a href="/en/web-scrapper">web scrapper</a>
</span>

HTML;

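// Parse the markup into a DOM tree.
$domDocument = new DOMDocument();
$domDocument->loadHTML($string);

$domXPath = new DOMXPath($domDocument);
// Select every <span class="tl"> container.
$results = $domXPath->query('//span[@class="tl"]');
$data = array();
foreach ($results as $result)
{
    $tempArray = array();
    // Query relative to the current container: only its <a> descendants,
    // so the text of the nested <span> tags is skipped.
    $aNodes = $domXPath->query(".//a", $result);
    foreach ($aNodes as $aNode)
    {
        if ($aNode instanceof DOMElement)
        {
            $tempArray[] = $aNode->nodeValue;
        }
    }
    // One comma-separated string per container.
    $data[] = implode(", ", $tempArray);
}
print_r($data);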
Sahil Gulati
  • I was expecting something like this for my Symfony DomCrawler script: $links['tag'] = $crawler->filter('span.tl >:not(span)')->each(function ($node) { return $node->text(); }); – coolsaint May 29 '17 at 15:43
  • @coolsaint What is wrong with my post? Can you tell me what is missing? – Sahil Gulati May 29 '17 at 15:45
  • I believe it solves the problem, but in my case I am using the Crawler object from Symfony DomCrawler. – coolsaint May 29 '17 at 15:48
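
For the Symfony DomCrawler case raised in the comments, a minimal sketch might look like the following. It assumes the symfony/dom-crawler and symfony/css-selector Composer packages are installed and autoloaded from vendor/. The $html string is a stand-in for the page Goutte would fetch, and $tags is an illustrative variable name. The idea is to nest two each() calls: the outer one walks every span.tl, the inner one collects the text of the <a> elements inside it.

```php
<?php
// Assumption: symfony/dom-crawler and symfony/css-selector are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Stand-in for the fetched page; this is the markup from the question.
$html = <<<HTML
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/html/">html</a>, <span>dom crawler</span>, <a href="/en/form/">form</a><span>guzzle</span>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/elequent/">elequent</a>, <span>dom crawler</span>, <span>guzzle</span>, <a href="/en/orm/">orm</a>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <a href="/en/goutte">goutte</a>, <a href="/en/php/">php</a>, <span>dom crawler</span>, <a href="/en/guzzle">guzzle</a>, <a href="/en/web-scrapper">web scrapper</a>
</span>
HTML;

$crawler = new Crawler($html);

// Outer each(): one array entry per span.tl container.
$tags = $crawler->filter('span.tl')->each(function (Crawler $span) {
    // Inner each(): text of every <a> inside this container; <span> children are ignored.
    $links = $span->filter('a')->each(function (Crawler $a) {
        return trim($a->text());
    });

    // Join the link texts into one comma-separated string.
    return implode(', ', $links);
});

print_r($tags);
```

With Goutte, the Crawler returned by the client's request() call could be used in place of new Crawler($html). The selector from the comment above, span.tl >:not(span), yields one flat entry per link rather than one string per container, which is why the sketch groups by span.tl first.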