
I'm trying to download an XML file from a remote URL, without success. I can see its content in the web browser, but I can't download it from the command line (I can save it manually via "Save As" from the web browser). I'm using wget:

wget -q -O test.xml https://example.com/test

I also tried using cURL, without success.
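
The cURL equivalent would be along these lines (a sketch; the exact command used isn't quoted in the question):

curl -o test.xml https://example.com/test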

Any idea?

Blackcoat77

2 Answers


Remove -q and you'll see:

--2017-04-20 14:25:53--  https://example.com/test
Resolving example.com... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2017-04-20 14:25:53 ERROR 404: Not Found.

The URL is a 404 error page. Consequently, test.xml is empty.

Then if you look at the manual:

   --content-on-error
       If this is set to on, wget will not skip the content when the
       server responds with a http status code that indicates error.

So:

wget -q --content-on-error -O test.xml https://example.com/test

… successfully downloads that resource.
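
For comparison, cURL does this by default: it writes whatever body the server sends, error page or not, and only discards it when you pass -f/--fail. A sketch against the same URL:

curl -sS -o test.xml https://example.com/test          # saves the 404 page, like --content-on-error
curl -sS --fail -o test.xml https://example.com/test   # exits with code 22 instead of saving the error page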

The downloaded file isn't valid XML, though. The HTML5 doctype breaks it.
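
One way to confirm that is to run the file through a validating parser such as xmllint from libxml2 (assuming it is installed); it prints nothing on success and reports well-formedness errors otherwise:

xmllint --noout test.xml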

Quentin
  • I'm getting: "failed: Connection timed out. Retrying". Very strange. – Blackcoat77 Apr 20 '17 at 14:03
  • @Blackcoat77 — That suggests it is a network problem between you and example.com. If it works in a browser: probably one related to proxy server configuration. – Quentin Apr 20 '17 at 14:04
  • If I type wget -d https://example.com/test to view the default HTTP request header, I'm getting: "Certificates loaded: 174". If I replace the above-mentioned URL with Google, I get a proper HTTP request header. – Blackcoat77 Apr 20 '17 at 14:06
  • Yes, I was assuming that the problem could be related to the proxy config. – Blackcoat77 Apr 20 '17 at 14:07
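
If the timeout really is proxy-related, note that wget honors the standard proxy environment variables. A minimal sketch, with a placeholder proxy host and port:

export https_proxy=http://proxy.example.com:8080
wget --content-on-error -O test.xml https://example.com/test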

Try setting a header (note that the output filename must directly follow -O):

wget -q --header="Accept: text/xml,*/*" -O test.xml https://example.com/test
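
If wget still doesn't cooperate, the same header can be sent with cURL; a sketch:

curl -sS -H "Accept: text/xml,*/*" -o test.xml https://example.com/test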
Fangxing