57

When I run this command:

wget --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0"  http://yahoo.com

...I get this result (with nothing else in the file):

<!-- hw147.fp.gq1.yahoo.com uncompressed/chunked Wed Jun 19 03:42:44 UTC 2013 -->

But when I run wget http://yahoo.com with no --user-agent option, I get the full page.

The user agent is the same header that my current browser sends. Why does this happen? Is there a way to make sure the user agent doesn't get blocked when using wget?

Joe Mornin
  • 8,766
  • 18
  • 57
  • 82

3 Answers3

89

It seems Yahoo server does some heuristic based on User-Agent in a case Accept header is set to */*.

Accept: text/html

did the trick for me.

e.g.

wget  --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0"  http://yahoo.com

Note: if you don't declare Accept header then wget automatically adds Accept:*/* which means give me anything you have.

Filip
  • 3,002
  • 1
  • 26
  • 37
  • It isn't just Yahoo that filter out requests like this - Something to always be aware of! – user3791372 Jan 15 '17 at 11:42
  • Edit your `~/.bash_profile` by adding `alias mwget='wget --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0"'` so you don't have to deal with the params each time. – Petr Nagy Feb 19 '20 at 20:31
36

I created a ~/.wgetrc file with the following content (obtained from askapache.com but with a newer user agent, because otherwise it didn’t work always):

header = Accept-Language: en-us,en;q=0.5
header = Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
header = Connection: keep-alive
user_agent = Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0
referer = /
robots = off

Now I’m able to download from most (all?) file-sharing (streaming video) sites.

erik
  • 2,278
  • 1
  • 23
  • 30
1

You need to set both the user-agent and the referer:

 wget  --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" --referer  connect.wso2.com http://dist.wso2.org/products/carbon/4.2.0/wso2carbon-4.2.0.zip
Jasper
  • 11,590
  • 6
  • 38
  • 55
bags307
  • 11
  • 1