I used headless mode to extract a webpage, and here is the relevant inner HTML from the output.

<div class="product__aside">
\t\t\t\t<div class="slider-pdp">
\t\t\t\t\t<div class="slider__clip">
\t\t\t\t\t\t<div class="slides slick-initialized slick-slider slick-dotted" role="toolbar">
<div aria-live="polite" class="slick-list draggable" style="padding: 0px 24.47%;"><div class="slick-track" role="listbox" style="opacity: 1; width: 6010px; transform: translate3d(-1202px, 0px, 0px);"><div class="slide slick-slide slick-cloned" data-slick-index="-2" aria-hidden="true" tabindex="-1" style="width: 601px;"> 
\t<div class="slide__image">
\t\t<img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_600-1812358633.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_1365--489680014.jpg"> 
\t</div>
</div><div class="slide slick-slide slick-cloned" data-slick-index="-1" aria-hidden="true" tabindex="-1" style="width: 601px;">
\t<div class="slide__image">
\t\t<img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_600-251567441.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_1365--146353341.jpg"> 
\t</div>
</div><div class="slide slick-slide slick-current slick-active slick-center" data-slick-index="0" aria-hidden="false" tabindex="-1" role="option" aria-describedby="slick-slide00" style="width: 601px;"> 
\t<div class="slide__image">
\t\t<img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_600--951538759.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_1365--973725436.jpg"> 
\t</div>
</div><div class="slide slick-slide" data-slick-index="1" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide01" style="width: 601px;"> 
\t<div class="slide__image">
\t\t<img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_600--1234110023.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_1365-140785407.jpg"> 
\t</div>
</div><div class="slide slick-slide" data-slick-index="2" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide02" style="width: 601px;"> 
\t<div class="slide__image">
\t\t<img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_02--IMG_600--150275930.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_02--IMG_1365-1432102351.jpg"> 
\t</div>
</div><div class="slide slick-slide" data-slick-index="3" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide03" style="width: 601px;"> 
\t<div class="slide__image">
\t\t<img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_03--IMG_600--102741357.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_03--IMG_1365-1955701010.jpg"> 
\t</div>
</div><div class="slide slick-slide" data-slick-index="4" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide04" style="width: 601px;"> 
\t<div class="slide__image">
\t\t<img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_600-1812358633.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_1365--489680014.jpg"> 
\t</div>
</div><div class="slide slick-slide" data-slick-index="5" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide05" style="width: 601px;"> 
\t<div class="slide__image">
\t\t<img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_600-251567441.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_1365--146353341.jpg"> 
\t</div>
</div><div class="slide slick-slide slick-cloned slick-center" data-slick-index="6" aria-hidden="true" tabindex="-1" style="width: 601px;">
\t<div class="slide__image">
\t\t<img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_600--951538759.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_1365--973725436.jpg"> 
\t</div>
</div><div class="slide slick-slide slick-cloned" data-slick-index="7" aria-hidden="true" tabindex="-1" style="width: 601px;">
\t<div class="slide__image">
\t\t<img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_600--1234110023.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_1365-140785407.jpg"> 
\t</div>
</div></div></div>

From this I need to get the src values that contain the string "PRODUCT_LEAD". To do so I wrote the following code. If I dd($imgs) it reports a length of 10, but the for loop does not return the src values. $pageBody is the inner HTML of the web page.

$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;

ini_set('user_agent', 'My-Application/2.5');
libxml_use_internal_errors(true);
$doc->loadHTML($pageBody);
$xpath = new \DOMXPath($doc);
$imgs  = $xpath->query('//*[@class="slide__image"]');
foreach($imgs as $img)
{
    $imgurl = $img->getAttribute('src');
}
dd($imgurl); // This returns nothing
Devin Y

3 Answers

Try $imgs = $xpath->query('//*[@class="slide__image"]/img/@src[contains(., "PRODUCT_LEAD")]');

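The part in square brackets is the "predicate" that determines which nodes to select; here it filters the src attribute nodes. The . refers to the current node, so contains(., "PRODUCT_LEAD") tests the value of each src attribute.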
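As a minimal sketch of how that predicate could be used end to end, the snippet below runs the query against a small two-slide sample. The sample markup, the example.com URLs, and the $leadUrls variable are assumptions for illustration, not part of the original page. Because the query returns attribute nodes, the URL is read from nodeValue rather than getAttribute().

```php
<?php
// Hypothetical two-slide sample standing in for $pageBody.
$pageBody = <<<'HTML'
<div class="slide__image"><img src="https://example.com/P1-PRODUCT_LEAD--IMG_600.jpg" alt=""></div>
<div class="slide__image"><img src="https://example.com/P1-PRODUCT_01--IMG_600.jpg" alt=""></div>
HTML;

libxml_use_internal_errors(true);   // keep libxml warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($pageBody);
$xpath = new DOMXPath($doc);

// Select only the src attribute nodes whose value contains "PRODUCT_LEAD".
$nodes = $xpath->query('//*[@class="slide__image"]/img/@src[contains(., "PRODUCT_LEAD")]');

$leadUrls = [];
foreach ($nodes as $attr) {
    $leadUrls[] = $attr->nodeValue;   // each match is a DOMAttr, so the URL is its value
}

// The slick carousel clones slides, so the same URL can appear more than once.
$leadUrls = array_values(array_unique($leadUrls));
```

On the real page the lead image appears twice (once in a cloned slide), which is why the sketch de-duplicates the result.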

Forensic_07

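Try this code. Note that $imgs must hold the <img> elements rather than the wrapping <div class="slide__image"> elements, because only the <img> carries a src attribute:

$imgs = $xpath->query('//*[@class="slide__image"]/img');
$imgurl = [];

// Collect every src instead of overwriting a single variable on each pass.
for ($x = 0; $x < $imgs->length; $x++) {
    $imgurl[] = $imgs->item($x)->getAttribute('src');
}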
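A small sketch of why the loop in the question came back empty, assuming a one-slide sample in place of the real $pageBody (the markup, URL, and variable names $fromDiv and $fromImg are hypothetical). DOMElement::getAttribute() returns an empty string when the element has no such attribute, so calling it on the div gives nothing while calling it on the img gives the URL.

```php
<?php
// Hypothetical one-slide sample standing in for $pageBody.
$pageBody = '<div class="slide__image"><img src="https://example.com/lead.jpg" alt=""></div>';

libxml_use_internal_errors(true);   // keep libxml warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($pageBody);
$xpath = new DOMXPath($doc);

// The question's query matches the wrapper div; adding /img matches the image itself.
$div = $xpath->query('//*[@class="slide__image"]')->item(0);
$img = $xpath->query('//*[@class="slide__image"]/img')->item(0);

$fromDiv = $div->getAttribute('src');  // the div has no src attribute
$fromImg = $img->getAttribute('src');  // the img does
```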
Robert

$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;

ini_set('user_agent', 'My-Application/2.5');
libxml_use_internal_errors(true);
$doc->loadHTML($pageBody);
$xpath = new \DOMXPath($doc);
// Select the src attribute nodes directly rather than the wrapping divs.
$imgs = $xpath->query('//*[@class="slide__image"]/img/@src');
$leadImage = null;
foreach ($imgs as $img) {
    // str_contains() requires PHP 8.0 or later.
    if (str_contains($img->nodeValue, 'PRODUCT_LEAD')) {
        $leadImage = $img->nodeValue;
    }
}

Instead of getAttribute(), I modified the code like this, and it works fine. But I would like to know whether I can get this URL straight from query(), with something like //img[@src(contains())].
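One possible way to get the URL in a single call, sketched under assumptions: the sample markup and example.com URLs below are hypothetical stand-ins for $pageBody. The XPath function contains() takes the attribute as its first argument, and DOMXPath::evaluate() with string() returns the value of the first match as a plain PHP string (an empty string when nothing matches).

```php
<?php
// Hypothetical sample standing in for $pageBody.
$pageBody = '<div class="slide__image"><img src="https://example.com/a-PRODUCT_01.jpg"></div>'
          . '<div class="slide__image"><img src="https://example.com/a-PRODUCT_LEAD.jpg"></div>';

libxml_use_internal_errors(true);   // keep libxml warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($pageBody);
$xpath = new DOMXPath($doc);

// contains(@src, ...) filters the img elements; string() turns the first
// matching src attribute into its text value.
$leadImage = $xpath->evaluate('string(//img[contains(@src, "PRODUCT_LEAD")]/@src)');
```

This avoids the loop entirely, at the cost of returning only the first matching URL; query() with the same predicate would be the choice when all matches are needed.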

Devin Y