314

I am trying to use Wget to download a page, but I cannot get past the login screen.

How do I send the username/password as POST data to the login page and then download the actual page as an authenticated user?

Peter Mortensen
Señor Reginold Francis

11 Answers

386

Based on the manual page:

# Log in to the server.  This only needs to be done once.
wget --save-cookies cookies.txt \
     --keep-session-cookies \
     --post-data 'user=foo&password=bar' \
     --delete-after \
     http://server.com/auth.php

# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
     http://server.com/interesting/article.php
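
If the second command still returns the login page, it may be worth checking that the login step actually stored something. Here is a minimal sketch, assuming the Netscape-style cookie file that wget writes (comment lines start with #). `has_cookies` is a hypothetical helper name, not a wget feature:

```shell
#!/bin/sh
# has_cookies FILE: succeed if FILE holds at least one cookie entry,
# i.e. a line that is neither blank nor a '#' comment.
has_cookies() {
    [ -f "$1" ] && grep -q -v -e '^[[:space:]]*$' -e '^#' "$1"
}

# Example use after the login step:
# has_cookies cookies.txt || echo 'login stored no cookies' >&2
```

A file containing only comments usually points at the login request itself, for example wrong form field names or a missing header.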

Make sure the --post-data parameter is properly percent-encoded (especially ampersands!) or the request will probably fail. Also make sure that user and password are the correct keys; you can find out the correct keys by sleuthing the HTML of the login page (look into your browser’s “inspect element” feature and find the name attribute on the username and password fields).
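
For values that contain characters such as & or =, the encoding can be done in the shell before building the --post-data string. This is only a sketch: `urlencode` is a hypothetical helper written for this example (wget does not provide it), and it encodes everything except the RFC 3986 unreserved characters:

```shell
#!/bin/sh
# urlencode STRING: percent-encode STRING byte by byte.
urlencode() {
    # od prints each input byte as two lowercase hex digits; tr puts one per line.
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -n "$hex" ] || continue
        case $hex in
            # Unreserved characters (- . _ ~ 0-9 A-Z a-z) are passed through.
            2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
                printf '%b' "\\0$(printf '%o' "0x$hex")" ;;
            # Every other byte becomes %XX with uppercase hex digits.
            *)
                printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
        esac
    done
}

# Placeholder credentials; user and password are the field names from the answer.
user=$(urlencode 'foo')
password=$(urlencode 'b&r=baz')
printf '%s\n' "user=$user&password=$password"
# prints: user=foo&password=b%26r%3Dbaz
```

The printed string is what would be passed to --post-data. Only the values are encoded; the & between fields and the = after each key stay literal.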

9999years
jarnoan
  • add --keep-session-cookies to the first command, or the second? – Felipe Alvarez Nov 09 '11 at 02:56
  • You don't need `-p` (`--page-requisites`) for this. – ændrük Jan 06 '12 at 17:24
  • It's also worth adding `--delete-after` to the first retrieval so you don't end up saving the result page from logging in. – Jim Hunziker Jan 02 '13 at 15:41
  • I am getting error `WGET64: missing URL` I put whole wget command in one line and removed `\\` – Mowgli Mar 28 '13 at 01:23
  • I don't know why this doesn't work for me here is my question on SOF. http://goo.gl/ySzst – Mowgli Apr 02 '13 at 15:45
  • --keep-session-cookies is needed for the first command only. It tells the first command to include session cookies when saving cookies to the file. The second command simply reads all cookies from the provided file. – wadim May 11 '14 at 17:09
  • adding --keep-session-cookies to the first command worked for me – gaoithe Nov 24 '14 at 12:28
  • This works properly with Confluence, but the field names have to be replaced with os_username and os_password. – Znik Aug 21 '15 at 14:07
  • I also had to pass a `Referer` header in the authentication request: `--header 'Referer: https://wordpress.example.com/wp-login.php?loggedout=true'`. I also had to check the parameter names which, for me, were `log` and `pass`. – starfry Nov 01 '16 at 15:10
  • I updated the answer according to comments and edit proposals: Added `--keep-session-cookies` and `--delete-after` to first command and removed `-p` from the second one. I didn't try out the new version, so we'll see how it goes... – jarnoan Nov 03 '16 at 06:38
  • Wouldn't there need to be a wget for logout also to clean things up? I have tried the above code, but the cookies.txt file is always empty (except for some comments). Is there another way to get the cookies ? – Jim Merkel May 15 '23 at 19:15
117

You can log in via Firefox, and copy the needed headers afterwards:

screenshot

Use "Copy as cURL" in the Network tab of Firefox's browser developer tools and replace curl's flag -H with wget's --header (and also --data with --post-data if needed).
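
The flag replacement can be roughly automated with sed. This is a sketch only: `curl_to_wget` is a hypothetical helper name, it also maps --data-raw (an assumption about what the browser may emit), and it is a plain text substitution, so a header value that itself contains " -H " would be mangled. Any other curl flags are left untouched and need checking by hand:

```shell
#!/bin/sh
# curl_to_wget: read a copied curl command on stdin, print a wget command.
curl_to_wget() {
    sed -e 's/^curl /wget /' \
        -e 's/ -H / --header /g' \
        -e 's/ --data-raw / --post-data /g' \
        -e 's/ --data / --post-data /g'
}

# Example with a placeholder URL and cookie:
# printf '%s\n' "curl 'http://example.com/page' -H 'Cookie: a=b'" | curl_to_wget
```

Review the result before running it, since it contains your session cookies.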

barlop
user
  • Awesome! Also pointed me to the option of using curl instead of wget, since it can do the same thing and I don't even need to change the parameters. – Jan Apr 08 '19 at 06:35
  • This worked for me, whereas `wget` with the correct cookie did not; I suspect the web service checks for multiple different GET headers, even seemingly unimportant ones like "User-Agent" or "Cache-Control." – Arthur Apr 20 '20 at 19:16
  • @Arthur for me this solution was the only one that worked. I tried to remove as much header data from the URL as possible and ended up with essentially the cookie data. So I suspect `wget` supplied the data in a wrong way. – Florian Blume May 19 '20 at 09:51
  • How can you just say "via browser", is that chrome or firefox? – barlop Oct 15 '22 at 07:33
  • This can also be done in Opera. In that case, two different options show up for me, "Copy as cURL (cmd)" and "Copy as cURL (bash)". In my case, after I chose the "Copy as cURL (cmd)" option, I needed to do the following changes as well: - replacing certain special characters in the parameter values (colons, ":") with their percent encodings (%3A for colons.) - remove the characters "^" peppered throughout the copied command. – J. D. May 16 '23 at 19:31
77

I directly gave cookies of an existing connection to wget with --no-cookies and the Cookie HTTP request header. In my case it was a Moodle university login where logging in looks more complex (using multiple requests with a login ticket). I added --post-data because it was a POST request.

For example, to get the list of all Moodle users:

wget --no-cookies --header "Cookie: <name>=<value>" --post-data 'tab=search&name=+&personsubmit=Rechercher&keywords=&keywordsoption=allmine' https://moodle.unistra.fr/message/index.php
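
When the session needs more than one cookie, they all go into a single Cookie header, separated by "; ". A small sketch for assembling that header from name=value pairs follows; `cookie_header` is a hypothetical helper name, and the cookie names and URL in the usage comment are placeholders:

```shell
#!/bin/sh
# cookie_header NAME=VALUE...: print one "Cookie:" header line
# with the given pairs joined by "; ".
cookie_header() {
    header='Cookie: '
    sep=''
    for pair in "$@"; do
        header="$header$sep$pair"
        sep='; '
    done
    printf '%s\n' "$header"
}

# Example use:
# wget --no-cookies --header "$(cookie_header 'session=abc' 'token=xyz')" https://example.com/private/page
```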
galoget
baptx
  • Awesome tip. This is useful when you can access the cookie from your own machine and then use that from another headless machine from the command line. :) – Tuxdude Jul 27 '16 at 18:29
  • You can set multiple cookies at the same time also, --header "Cookie: access_token=IKVYJ;XSRF-TOKEN=5e10521d" – Phil C May 25 '18 at 13:28
32

I had the same problem. My solution was to log in via Chrome and save the cookie data to a text file. This is easily done with this Chrome extension: Chrome cookie.txt export extension.

When you export the cookie data, the extension also shows an example of how to use it with wget, as a command line that you can simply copy and paste.

GoTrained
  • unfortunately not applicable in automated scripting – Znik Aug 21 '15 at 13:49
  • The question doesn't specify automated scripting. This solution allows 99% of the work to be automated. – Will Sheppard Feb 06 '19 at 14:11
  • Unfortunately, Google must be too smart for this trick. I still get a login page. – Josiah Yoder Aug 22 '19 at 14:14
  • Of course, Google uses secret reCAPTCHAs... as I've seen so many places, using standard programmatic APIs is the most practical option in this case. – Josiah Yoder Aug 22 '19 at 14:38
  • The link you posted is unfortunately down. This one works: https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid To use with wget: wget --load-cookies /path/to/cookies.txt – Andreas Schwarz Jun 21 '22 at 14:15
10

I wanted a one-liner that didn't download any files; here is an example of piping the cookie output into the next request. I only tested the following on Gentoo, but it should work in most *nix environments:

wget -q -O /dev/null --save-cookies /dev/stdout --post-data 'u=user&p=pass' 'http://example.com/login' | wget -q -O - --load-cookies /dev/stdin 'http://example.com/private/page'

(This is one line, though it likely wraps in your browser.)

If you want the output saved to a file, change -O - to -O /some/file/name.ext
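
Where /dev/stdout and /dev/stdin are not available, the same two steps can go through a temporary cookie file instead. This is a sketch under a few assumptions: `fetch_with_login` is a hypothetical function name, mktemp is available, and --keep-session-cookies is added (the one-liner above omits it) so that session cookies are also written to the file:

```shell
#!/bin/sh
# fetch_with_login LOGIN_URL POST_DATA PAGE_URL
# Log in, then print the protected page on stdout. Nothing is left on disk.
fetch_with_login() {
    jar=$(mktemp) || return 1
    # Step 1: post the credentials and keep only the cookies.
    # Step 2: request the page with those cookies, if step 1 succeeded.
    wget -q -O /dev/null --save-cookies "$jar" --keep-session-cookies \
         --post-data "$2" "$1" &&
        wget -q -O - --load-cookies "$jar" "$3"
    status=$?
    rm -f "$jar"
    return $status
}

# Example use with the placeholder URLs and field names from the answer:
# fetch_with_login 'http://example.com/login' 'u=user&p=pass' 'http://example.com/private/page'
```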

galoget
Caleb Gray
9

You don't need cURL to send POSTed form data. --post-data 'key1=value1&key2=value2' works just fine. Note: you can also give wget the name of a file that contains the POST data, using --post-file.
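
Putting the POST body in a file keeps the credentials off the wget command line; wget reads it with --post-file. This is a minimal sketch, where `write_post_file` is a hypothetical helper and the file name and URL in the usage comment are placeholders:

```shell
#!/bin/sh
# write_post_file FILE BODY: store an already percent-encoded POST body in FILE.
# The umask makes a newly created FILE readable by its owner only;
# it does not change the permissions of a file that already exists.
write_post_file() {
    ( umask 077 && printf '%s' "$2" > "$1" )
}

# Example use:
# write_post_file post-data.txt 'key1=value1&key2=value2'
# wget --post-file post-data.txt http://example.com/login
```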

Peter Mortensen
J. Piel
8

If they're using basic authentication:

wget http://username:password@www.domain.com/page.html

If they're using POSTed form data, wget's --post-data option can send it; a tool like cURL is an alternative.

Community
ceejayoz
7

A solution which uses lynx and wget.

Note: Lynx has to have been compiled with the --enable-persistent-cookies flag for this to work

When you want to use wget to download a file from a site that requires a login, all you need is a cookie file. To generate the cookie file, I chose lynx, a text-mode web browser. First, lynx needs a configuration file that tells it to save cookies. Create a file named lynx.cfg and put these settings in it:

SET_COOKIES:TRUE
ACCEPT_ALL_COOKIES:TRUE
PERSISTENT_COOKIES:TRUE
COOKIE_FILE:cookie.file

Then start lynx with this command:

lynx -cfg=lynx.cfg http://the.site.com/login

Enter your username and password, and select 'preserve me on this pc' or a similar option. If the login succeeds, you will see a beautiful text version of the site's web page. Then log out. In the current directory, you will now find a cookie file named cookie.file. This is what wget needs.

wget can then download the file from the site with this command:

wget --load-cookies ./cookie.file http://the.site.com/download/we-can-make-this-world-better.tar.gz
alls0rts
PokerFace
7

An example of using wget on a server to download a large file whose link you can obtain in your browser.

This example uses Google Chrome.

Log in where you need to and start the download. Then open the Downloads page and copy the link.

Then open DevTools on a page where you are logged in, go to the Console, and get your cookies by entering document.cookie

Now go to the server and download your file: wget --header "Cookie: <YOUR_COOKIE_OUTPUT_FROM_CONSOLE>" <YOUR_DOWNLOAD_LINK>

Alex Ivasyuv
  • This answer does not seem to scale well to Google -- where there are two pages of cookies! – Josiah Yoder Aug 22 '19 at 14:09
  • Of course, Google uses secret reCAPTCHAs... as I've seen so many places, using standard programmatic APIs is the most practical option in this case. – Josiah Yoder Aug 22 '19 at 14:38
1

I use this Chrome extension. It'll give you the wget command for any download link you open.

Vahid
0

You can install this plugin in Firefox: https://addons.mozilla.org/en-US/firefox/addon/cliget/?src=cb-dl-toprated Start downloading what you want and click on the plugin. It gives you the whole command, for either wget or curl, to download the file on the server. Very easy!

ady