13

I'm trying to generate a PDF using WKHTMLTOPDF from a page that requires me to log in first. There's some material on this on the internet already, but I can't seem to get mine working. I'm in Terminal - nothing fancy.

I've tried (among a whole lot of other stuff):

/usr/bin/wkhtmltopdf --post username=myusername --post password=mypassword "URL to Generate" test.pdf

/usr/bin/wkhtmltopdf --username myusername --password mypassword "URL to Generate" test.pdf

/usr/bin/wkhtmltopdf --cookie-jar my.jar --post username=myusername --post password=mypassword "URL to Generate Cookie For"

username and password are both the id and the name of the input fields on the form. I am getting the my.jar file to show up, but nothing is written to it.

Specific questions:

  1. Should I be specifying the login page and/or form action anywhere?
  2. The --cookie-jar parameter has been mentioned in various places (both as being needed and as not). If it is necessary, how does it work? I've created the my.jar file, but how do I use it again? Referencing:

http://code.google.com/p/wkhtmltopdf/issues/detail?id=356


EDIT:

Surely someone has done this successfully? A good way to showcase an example might be to get it working on some popular website that requires login credentials, to eliminate a potential variable.

Chords

4 Answers

11

Every login form will be different for every site. What you're going to want to do is determine everything you need to pass to that login form's target by reading the HTML on the page (which you're probably aware of). It may take an additional hidden field on top of the username/password fields, typically a token used to prevent cross-site request forgery.
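If you'd rather not read the page source by hand, something like the following can pull out the form and input tags. This is only a rough sketch: `list_form_inputs` is a made-up helper name, and grep is not a real HTML parser, so treat the output as a starting point.

```shell
# list_form_inputs (hypothetical helper): print every <form ...> and
# <input ...> tag found in the HTML read from stdin.
list_form_inputs() {
  # tr flattens newlines so tags split across several lines still match
  tr '\n' ' ' | grep -o -i -E '<(form|input)[^>]*>'
}
```

Usage would be along the lines of `curl -s https://www.example.com/login/ | list_form_inputs` (the URL is a placeholder). The `<form>` line shows the ACTION to post to, and the `<input type="hidden">` lines show the extra fields you may have to send.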

The cookie jar parameter names a file in which wkhtmltopdf stores the cookies it gets back from the webserver. You need to specify it in the first request to the login form, and again in subsequent requests so they keep using the cookie/session information the webserver gave you after logging in.

So to sum it up:

  1. Look and see if there are any additional parameters on the page required.
  2. Make sure the URL you are submitting to is the same as the ACTION attribute of the form element on that page.
  3. Use the --cookie-jar parameter in both the login request and the second content request.
  4. The syntax for the --post parameter is --post username user_name_value --post password password_value
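
Put together, the steps above might look roughly like this. It is a sketch under assumptions: `login_then_render` is a hypothetical wrapper, the field names `username` and `password` are taken from the question, and `login-result.pdf` is just a throwaway output for the login request.

```shell
# login_then_render (hypothetical helper) wraps the two wkhtmltopdf calls.
#   $1 login form ACTION URL   $2 username   $3 password
#   $4 protected page URL      $5 output PDF
# Assumes the form fields are named "username" and "password"; add one more
# --post pair for every hidden field the form requires.
login_then_render() {
  jar="my.jar"
  # Step 1: POST the credentials; cookies sent back are stored in $jar.
  wkhtmltopdf --cookie-jar "$jar" \
    --post username "$2" --post password "$3" \
    "$1" login-result.pdf || return 1
  # Step 2: request the protected page, reusing the same jar.
  wkhtmltopdf --cookie-jar "$jar" "$4" "$5"
}
```

For example: `login_then_render https://www.example.com/login myusername mypassword https://www.example.com/report test.pdf` (placeholder URLs).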
hsanders
  • Thanks, hsanders. Even though I wound up taking another route your answer looks solid. Thanks for taking the time to reply! – Chords May 03 '12 at 13:19
  • @Chords No problem. I've used wkhtmltopdf a couple of times before. I think for a more complicated case, like the one you described it's a bit of a pain to use... I'm not sure how it would deal with the redirects you mentioned in your followup, never had to deal with that. – hsanders May 04 '12 at 18:52
9

I think the form I'm trying to log in to is too complex. It's secure, sets three cookies, redirects twice, and posts a number of other variables outside of the username and password, one of which requires a cookie value (I even tried concatenating the value into the post variable, but no luck). This is probably a pretty rare issue - by no means the fault of WKHTMLTOPDF.

I wound up using CURL to log in and write the page to a local file, then ran WKHTMLTOPDF against that file. Definitely a solid workaround for anyone else having a similar issue.


Edit: CURL, if interested:

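// Assumes $loginUrl and $postFields are already defined for your site.
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0); // Change to 1 to see the response headers when debugging
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // Turns off certificate verification; avoid outside of testing
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Follow the redirects after login
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt'); // Cookies received are written here
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt'); // Stored cookies are sent back from here
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the body instead of printing it
$html = curl_exec($ch); // Page body, ready to be written to a local file
curl_close($ch);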
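
For anyone staying in Terminal, the same idea can probably be done with the curl command line instead of PHP. A minimal sketch, assuming a plain form POST is enough for your site: `fetch_then_render` and `page.html` are names I made up, and the POST body is whatever your form needs.

```shell
# fetch_then_render (hypothetical helper): log in with curl, save the page,
# then render the saved copy with wkhtmltopdf.
#   $1 login URL   $2 POST body, e.g. "username=me&password=secret"
#   $3 protected page URL   $4 output PDF
fetch_then_render() {
  jar="cookie.txt"
  # -c stores received cookies in the jar, -L follows redirects, -d POSTs the body
  curl -s -L -c "$jar" -d "$2" -o /dev/null "$1" || return 1
  # -b sends the stored cookies back when requesting the protected page
  curl -s -L -b "$jar" -o page.html "$3" || return 1
  # Render the local copy; relative links to CSS or images may not resolve
  wkhtmltopdf page.html "$4"
}
```

Rendering from a saved file sidesteps wkhtmltopdf's own login handling, at the cost that assets referenced with relative URLs may be missing from the PDF.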
Chords
  • Would have been much more helpful to outline this cURL syntax which did the trick for you. – Ifedi Okonkwo Sep 19 '16 at 17:24
  • Hi Ifedi, not sure my specific implementation actually will be helpful for your use case (it's the post string that's specific to my needs, and implemented via PHP) but I added it, so hopefully it helps. – Chords Sep 20 '16 at 15:22
4

You might be interested in trying to render to PDF with phantomjs.

phantomjs rasterize.js http://blah.com/ webgl.pdf

rasterize.js is one of the example scripts that ship with PhantomJS. Basically, you write some JavaScript to log in on the login page, then you do the PDF creation.

However, the output is not the same as wkhtmltopdf. You could just save the HTML to a file, and then render with wkhtmltopdf if the phantomjs PDF output is too awful.

kanzure
0

I just got it working in the Terminal: logging in to a WordPress website and, once logged in, rendering a PDF from a page. You need to find ALL the input fields on the login page, including the hidden ones. You can find them in Firefox: right-click the field and choose Inspect.

Assume our login page is https://www.mywebsite.com/login/. There were 2 visible input fields here:

<input type="text" id="user_login" name="log" value="">
<input type="password" id="user_pass" name="pwd" value="">

Then look for the submit button:

<input type="submit" id="wp-submit" name="wp-submit" class="button-primary mepr-share-button" value="Log In">

Underneath there were 3 more HIDDEN fields

<input type="hidden" name="mepr_process_login_form" value="true">
<input type="hidden" name="mepr_is_login_page" value="true">
<input type="hidden" name="redirect_to" value="https://www.mywebsite.com/members/">

So now we can POST these values; there is no need for redirect_to:

wkhtmltoimage --cookie-jar my.jar --post log insertLoginHere --post pwd insertPasswordHere --post mepr_process_login_form true --post mepr_is_login_page true --disable-javascript https://www.mywebsite.com/login/ dummy.jpg

The dummy image shows me I am logged in, and my login cookies are written to my.jar. So now I can happily render the page(s) to PDFs while logged in:

wkhtmltopdf --cookie-jar my.jar --disable-javascript --print-media-type https://www.mywebsite.com/mymembercontenturl/ members.pdf

Fantastic!
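
Since an empty my.jar was the symptom in the question, it may be worth guarding the second step so it only runs when the jar actually contains something. A small sketch, with `render_with_jar` being a hypothetical helper name:

```shell
# render_with_jar (hypothetical helper): refuse to render when the cookie
# jar is missing or empty.
#   $1 cookie jar   $2 protected page URL   $3 output PDF
render_with_jar() {
  # -s is true only if the file exists and is larger than zero bytes
  if [ ! -s "$1" ]; then
    echo "cookie jar '$1' is empty or missing; the login step probably failed" >&2
    return 1
  fi
  wkhtmltopdf --cookie-jar "$1" --disable-javascript --print-media-type "$2" "$3"
}
```

A non-empty jar does not prove the login worked (some sites set cookies for anonymous visitors too), so checking the dummy image is still the more reliable test.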

FFish