
I'm trying to download some URLs using wget. Most files download without problems, except for this link, Offensive-Security-ICQ, and any other page on www.offensive-security.com.

I tried on both Linux and Windows, with many attempts and a lot of searching, but in vain.

I use this command:

wget https://www.offensive-security.com/pwbonline/icq.html

The resulting file shows only unreadable symbols when it is opened as ANSI-decoded text.

How can I solve this problem?

Dr. MAF
  • What makes you think the download failed? This might just be the content of the file you downloaded... Oh, and please avoid posting useless screenshots; they give no additional information and cost a lot more (disk space, network...) – kebs Dec 10 '16 at 18:35
  • Could you edit the question and paste the exact command line you used to get that file? It could help. – kebs Dec 10 '16 at 18:37
  • I edited the question. – Dr. MAF Dec 10 '16 at 18:45

1 Answer


For some reason, the server does not return the plain HTML page but a gzip-compressed version of it. The file you get is identified as gzip compressed data:

$ file icq.html
icq.html: gzip compressed data, from Unix

So you can simply decompress it with gzip and you get the correct HTML page.
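A minimal sketch of that decompression step. To keep it runnable offline, the first command creates a stand-in for the downloaded file (an assumption for illustration; in practice icq.html is what wget saved), and icq.decoded.html is a hypothetical output name:

```shell
# Stand-in for the downloaded file: gzip-compressed HTML saved under an .html name.
printf '<html><body>hello</body></html>\n' | gzip -c > icq.html

# -d decompresses, -c writes to stdout. Reading from stdin avoids gzip's
# complaint about the missing .gz suffix and leaves the original untouched.
gzip -dc < icq.html > icq.decoded.html

# Show the recovered HTML.
cat icq.decoded.html
```

Renaming the file to icq.html.gz and running gunzip on it should work as well.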

Why the server does that: I'm not sure, but it's probably a default setting that has been left as is, so that pages download faster.

How to download the HTML content directly: probably by sending a common user agent and header, so that the server thinks a regular web browser is making the request instead of a download tool.

This can be done with wget using a couple of options; for example, this should work:

wget --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" https://www.offensive-security.com/pwbonline/icq.html
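If the server still answers with compressed data, a fallback is to decompress only when needed. The following is a sketch, not a tested fix for this particular site; print_html is a hypothetical helper name, and it relies on gzip -t exiting with a non-zero status for input that is not gzip data:

```shell
# Hypothetical helper: print a file's contents, decompressing first
# if (and only if) the file is gzip data.
print_html() {
    # gzip -t only tests the input; it fails for non-gzip data.
    if gzip -t < "$1" 2>/dev/null; then
        gzip -dc < "$1"
    else
        cat "$1"
    fi
}

# Usage (needs network access; URL taken from the question):
#   wget -O icq.html https://www.offensive-security.com/pwbonline/icq.html
#   print_html icq.html > icq.decoded.html
```

This way the same command works whether the server sends plain or gzip-compressed HTML.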
kebs