
I am trying to scrape information from a couple of sites (mega.nz, openload.co). The content is loaded dynamically, so the code I am currently using doesn't work:

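    <?php

    require 'simple_html_dom.php';

    // Fetch the raw HTML. cURL does not execute JavaScript, so any
    // content the page builds client-side is missing from $response.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "https://openload.co/f/41I9Ak_QBxw/DPLA.mp4");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    $response = curl_exec($ch);
    curl_close($ch);

    echo $response;

    // Parse the response and print every <img id="imagedisplay"> element.
    $html = new simple_html_dom();
    $html->load($response);

    foreach ($html->find('img[id=imagedisplay]') as $element) {
        echo $element;
    }

    ?>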

When I use it on openload (as in the example above), it redirects me to "https://oload.download/scraping/", where "/scraping" is the folder that contains my script.

Is there any JavaScript/jQuery framework (or PHP library) that I can use to scrape the content on the fly?

1 Answer


It's not suitable for a large amount of scraping, but in the past when I've needed to grab some basic data from a dynamic web page I've found that Selenium works pretty well.

Depending on your stack of choice, I'd recommend looking into headless browsers. This way you can render a page in the background and parse the resulting HTML.
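As a rough illustration of that render-then-parse flow, here is a minimal sketch in Python using Selenium with headless Chrome. It assumes the `selenium` package and a matching Chrome/chromedriver are installed. The element id `imagedisplay` is taken from the question's code, and the helper names `render_page` and `find_image_src` are made up for this example. Whether a given site actually exposes the element this way is not something the sketch can guarantee.

```python
from html.parser import HTMLParser


class _ImageFinder(HTMLParser):
    """Records the src attribute of the first <img> with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            self.src is None
            and tag == "img"
            and attributes.get("id") == self.element_id
        ):
            self.src = attributes.get("src")


def find_image_src(html, element_id="imagedisplay"):
    """Return the src of <img id=element_id> in html, or None if absent."""
    finder = _ImageFinder(element_id)
    finder.feed(html)
    return finder.src


def render_page(url, wait_for_id="imagedisplay", timeout=10):
    """Load url in headless Chrome and return the HTML after scripts have run."""
    # Imported lazily so find_image_src works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the target element exists in the DOM; this raises
        # TimeoutException if it never appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, wait_for_id))
        )
        return driver.page_source
    finally:
        # Always shut the browser down, even if the wait fails.
        driver.quit()
```

Calling `find_image_src(render_page(url))` with the URL from the question would load the page in the background, wait for the element, and then parse the rendered HTML. The parsing step is kept separate so it can be tried on a saved HTML string without starting a browser.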

Mike B
  • Do you have an example of how to fetch the page? I've never worked with Selenium before. For instance, how would I obtain the size of this file? https://mega.nz/#!bPgFDbAa!ND3F-bwTE6sVW1kERnVpOkIAacp4-yxAggLUbkbdnDY – Brayan Alejandro Monjaraz Rios Sep 01 '18 at 18:32
  • I don't have a code example, but the gist is that Selenium gives you access to the DOM of a given page. If you can find the element you want to scrape, you can write a script to read its value. If you're struggling with the coding aspect, I recommend Selenium IDE for Firefox/Chrome. Since you're using PHP, this might also help: https://stackoverflow.com/questions/16231984/how-to-export-selenium-output-to-php – Mike B Sep 03 '18 at 15:35