I am developing a start page where users can add links to the page by using a form. They can enter a name, URL, and description, and upload an image.

I want to automate the image upload so that the image is captured automatically: my script should take a screenshot of the website that the user entered as the URL. I know I can take screenshots of HTML elements by using html2canvas.


Approach 1

My first approach was to load the external website into an iframe, but this does not work because some pages restrict framing. For example, even the iframe tutorial on w3schools.com does not work, and I get: Refused to display 'https://www.w3schools.com/' in a frame because it set 'X-Frame-Options' to 'sameorigin'.

HTML

<div id="capture" style="padding: 10px; color: black;">
    <iframe src="https://www.w3schools.com"></iframe>
</div>
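
The error above is triggered by response headers that the target site sends. As a rough sketch, a check like the following could decide up front whether a site refuses to be framed. The function name `blocksFraming` and the plain header object are assumptions made for illustration and are not part of any library; a `frame-ancestors` directive is treated conservatively as blocking, even though it can also allow specific origins.

```javascript
// Hypothetical helper: decide from response headers whether a page refuses to be framed.
// `headers` is assumed to be a plain object mapping header names to values.
function blocksFraming(headers) {
    // Header names are case-insensitive, so normalize names and values first.
    const normalized = {};
    for (const [name, value] of Object.entries(headers)) {
        normalized[name.toLowerCase()] = String(value).toLowerCase();
    }

    // X-Frame-Options: DENY or SAMEORIGIN blocks framing from another origin.
    const xfo = normalized["x-frame-options"];
    if (xfo && (xfo.includes("deny") || xfo.includes("sameorigin"))) {
        return true;
    }

    // A Content-Security-Policy with frame-ancestors also restricts framing;
    // this sketch conservatively treats any such directive as blocking.
    const csp = normalized["content-security-policy"];
    if (csp && csp.includes("frame-ancestors")) {
        return true;
    }
    return false;
}

console.log(blocksFraming({ "X-Frame-Options": "sameorigin" }));
console.log(blocksFraming({ "Content-Type": "text/html" }));
```

The headers themselves would have to be fetched on the server, since the browser does not expose a cross-origin iframe's response headers to the embedding page.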

Approach 2

My next approach was to make a call to my web server, which loads the target website and returns the HTML to the client. This works, but the target site is not rendered properly, e.g. images are not loading (see screenshot below).

[Screenshot: Google]

HTML

<div id="capture" style="padding: 10px; color: black;"></div>

JS

var testURL = "http://www.google.de";

$.ajax({
    url: "http://server/ajax.php",
    method: "POST",
    data: { url: testURL},
    success: function(response) {
        $("#capture").html(response);
        console.log(response);

        html2canvas(document.querySelector("#capture")).then(
            canvas => {
                document.body.appendChild(canvas);
            }
        );
    }
});

PHP

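// Validate the submitted URL; filter_input() returns null or false if it is missing or invalid
$url = filter_input(INPUT_POST, "url", FILTER_VALIDATE_URL);
if (!$url) {
    http_response_code(400);
    die("Missing or invalid url");
}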

$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
//curl_setopt(... other options you want...)

$html = curl_exec($c);

if (curl_error($c))
    die(curl_error($c));

// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);

curl_close($c);
echo $html;

Is it possible to achieve this?


Update

I managed to load some of the images by changing my AJAX callback, but they are still not rendered by html2canvas.

var testURL = "http://www.google.de";

$.ajax({
    url: "http://server/ajax.php",
    method: "POST",
    data: { url: testURL},
    success: function(response) {
        response = response.replace(/href="\//g, 'href="' + testURL + "/");
        response = response.replace(/src="\//g, 'src="' + testURL + "/");
        response = response.replace(/content="\//g, 'content="' + testURL + "/");

        $("#capture").html(response);
        console.log(response);

        html2canvas(document.querySelector("#capture")).then(
            canvas => {
                document.body.appendChild(canvas);
            }
        );
    }
});
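
The three replacements above only catch root-relative paths that start with "/". As a hedged alternative sketch, the standard `URL` constructor can resolve root-relative, path-relative, and protocol-relative values against the page URL. The helper name `absolutizeUrls` is hypothetical, and matching HTML with a regular expression is only an approximation that assumes double-quoted attribute values.

```javascript
// Hypothetical helper: rewrite href/src/content attribute values against a base URL.
function absolutizeUrls(html, baseUrl) {
    // Match href="...", src="..." and content="..." preceded by whitespace.
    return html.replace(/(\s)(href|src|content)="([^"]*)"/g, (match, space, attr, value) => {
        // Skip empty values, and only touch content="..." when it is a root-relative path,
        // because meta content often holds non-URL text such as "width=device-width".
        if (value === "" || (attr === "content" && !value.startsWith("/"))) {
            return match;
        }
        try {
            // new URL(value, base) resolves "/x", "x" and "//host/x";
            // values that are already absolute keep their own origin.
            return `${space}${attr}="${new URL(value, baseUrl).href}"`;
        } catch (err) {
            // Leave anything that cannot be parsed as a URL untouched.
            return match;
        }
    });
}

console.log(absolutizeUrls('<img src="/logo.png">', "http://www.google.de"));
```

In the callback above, this would replace the three `response.replace(...)` lines with a single `response = absolutizeUrls(response, testURL);`.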

Result

[Screenshot of the page rendered in #capture]

Result Canvas

[Screenshot of the canvas produced by html2canvas]

Black

3 Answers


I love PHP, but for screenshots I found that PhantomJS provides the best results.

Example file screenshot.js

var page = require('webpage').create();
page.open('https://stackoverflow.com/', function() {
  page.render('out.png');
  phantom.exit();
});

Then from the shell:

phantomjs screenshot.js 

Or from php:

exec("phantomjs screenshot.js &");

The idea is to generate this JS file from PHP for each URL.

The result is a file called out.png in the same folder, containing a full-height screenshot of the page.
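
Generating the script per URL can be sketched roughly as follows. The sketch is written in JavaScript rather than PHP to stay consistent with the script above, and `buildPhantomScript` is a hypothetical name; in PHP, `json_encode` could play the role that `JSON.stringify` plays here.

```javascript
// Hypothetical generator: produce the source of a PhantomJS script for one URL.
function buildPhantomScript(url, outFile) {
    // JSON.stringify emits quoted, escaped string literals, so quotes in the
    // input cannot terminate the string early in the generated code.
    return [
        "var page = require('webpage').create();",
        "page.open(" + JSON.stringify(url) + ", function() {",
        "  page.render(" + JSON.stringify(outFile) + ");",
        "  phantom.exit();",
        "});",
        ""
    ].join("\n");
}

// Print the generated script; it could be written to screenshot.js instead.
console.log(buildPhantomScript("https://stackoverflow.com/", "out.png"));
```

Embedding the URL through an escaping function matters here because the URL comes from user input and ends up inside executable code.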

Example output

We can also take good captures with Firefox from the command line. This requires an X server, though.

firefox -screenshot test.png  http://www.google.de --window-size=1280,1000

Example output

NVRM

Not in pure PHP. Nowadays most sites generate content dynamically with JavaScript, which can only be rendered by a browser. The good news is that there is PhantomJS, a browser without a UI. It can do the job for you; there is even a working example in its tutorials, which I successfully implemented a few years ago with little knowledge of JavaScript. There is also an alternative library called Nightmare (nightmarejs). I know it only from a friend's opinion that it is simpler than PhantomJS, but I can't guarantee that it won't be a nightmare; I haven't used it myself.

bigwolk
  • Please note that link-only answers are strongly discouraged. If you cannot provide relevant examples of how to implement recommended solutions in your answer, then this should really be a comment. – Patrick Q May 07 '18 at 14:26
  • Do take into account that phantomjs is no longer in active development because (among other reasons) Headless Chrome is now a thing (https://developers.google.com/web/updates/2017/04/headless-chrome) – Sergiu Paraschiv May 07 '18 at 14:27
  • @PatrickQ - it's not a link-only answer. I gave two possible technologies to achieve the goal. The question wasn't about a full implementation, but about a way to achieve it. Why do so many users not understand that for some questions code isn't necessary? People are asking for a way to do something, not for someone to do it for them. – bigwolk May 07 '18 at 15:49
  • @SergiuParaschiv I didn't know about headless Chrome, so I didn't mention it :D +1 for you. I think it could be an answer of its own, not only a comment under mine ;) – bigwolk May 07 '18 at 15:52

It is possible, but if you want a screenshot you need something like a browser that renders the page for you. The iframe approach goes in that direction, but an iframe is the page itself, not an image. If you want a .jpg, .png or similar file, the best way in my opinion is wkhtmltoimage (https://wkhtmltopdf.org/). The idea is that you install the Qt WebKit rendering engine on your server, just as you would install a browser, and it renders the page and saves the final result to a file. When a user submits a URL, you pass it as an argument to wkhtmltoimage and get an image of that URL. Basic usage could look something like this:

wkhtmltoimage http://www.example1.com /var/www/pages/example1.jpg

You run that command in bash; from PHP it could be:

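<?php
// Hard-coded URL for illustration; pass user input through escapeshellarg() before building the command
exec('wkhtmltoimage http://www.example1.com /var/www/pages/example1.jpg');
?>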

Keep in mind that wkhtmltoimage processes CSS, JavaScript, everything, just like a browser.
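
Because the URL comes from a user, it should not be pasted into a shell command unchecked. The following is a minimal sketch of the same call from Node.js, assuming wkhtmltoimage is installed and on the PATH; the helper names `buildWkhtmltoimageArgs` and `captureScreenshot` are hypothetical.

```javascript
const { execFile } = require("child_process");

// Hypothetical helper: validate a user-submitted URL and build the argument list.
function buildWkhtmltoimageArgs(userUrl, outPath) {
    const parsed = new URL(userUrl); // throws a TypeError on malformed input
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
        throw new Error("Only http and https URLs are accepted");
    }
    // Same argument order as on the command line: source URL, then output file.
    return [parsed.href, outPath];
}

// Hypothetical wrapper: run wkhtmltoimage (assumed to be installed and on the PATH).
function captureScreenshot(userUrl, outPath, callback) {
    // execFile passes the arguments directly to the program without a shell,
    // so shell metacharacters in the URL are not interpreted.
    execFile("wkhtmltoimage", buildWkhtmltoimageArgs(userUrl, outPath), callback);
}

console.log(buildWkhtmltoimageArgs("http://www.example1.com", "/var/www/pages/example1.jpg"));
```

Restricting the scheme to http and https keeps inputs such as file: URLs away from the renderer; a real deployment would likely also want to reject internal hostnames.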

Emeeus