I'm trying to retrieve a website via curl/wget, but instead of the real content that I see in the browser, I get ESI tags.

The URL is http://www.patagonia.com/home/?setCountryCode=US&setLocaleCode=en_US&setLocaleCodeSelect=en

<html xmlns="http://www.w3.org/1999/xhtml" class="no-js" lang="en"><head/><body onload="submitWait();true;"><esiU00003Aremove>

</esiU00003Aremove>



<esiU00003Acomment text=" ------------- begin html ---------- ">  

<esiU00003Acomment text=" --- CUSTOMIZE HEAD HERE --- ">

  <meta charset="utf-8"/>   <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>

    <title>Hang Tight! Routing to checkout...</title> ......

I already tried it via Postman, sending only the Accept and Connection headers, and there I get the normal HTML. I'm not quite sure what is going on. Does anybody have an idea which header to send, or what else to do, so that wget/curl gets the page correctly?

dsky
  • What happens if you [get Postman to generate the CURL request for you](http://blog.getpostman.com/2015/08/31/writing-front-end-api-code-with-postman/)? Does it then show ESI tags? – ʰᵈˑ Feb 01 '17 at 10:44
  • @ʰᵈˑ Interesting idea. I tried that, but it does not work either. It generated `curl -X GET -H "Cache-Control: no-cache" -H "Postman-Token: 3f7093a7-c0f0-4675-edf9-12e0659d17c8" "http://www.patagonia.com/home/?setCountryCode=US&setLocaleCode=en_US&setLocaleCodeSelect=en"` – dsky Feb 01 '17 at 14:13

1 Answer

Some websites block curl's default User-Agent. Try sending a browser-like one:

curl -v -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0' 'http://www.patagonia.com/home/?setCountryCode=US&setLocaleCode=en_US&setLocaleCodeSelect=en'
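
To check whether a different User-Agent actually changes the response, it may help to test the body for ESI markup instead of reading it by eye. The following is only a minimal sketch: the helper names `fetch_as_browser` and `has_esi_tags` are hypothetical, the `--max-time 30` timeout is an assumption added so that a stalled request does not hang forever, and the pattern assumes ESI tags show up either as `<esi:` or in the escaped `<esiU00003A` form seen in the question.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the original answer.

# Browser-like User-Agent string taken from the command above.
UA='Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'

# Fetch a URL with the browser-like User-Agent.
# --max-time 30 is an assumed limit so curl gives up instead of waiting forever.
fetch_as_browser() {
    curl -sS --max-time 30 -H "User-Agent: $UA" "$1"
}

# Read a response body on stdin; exit 0 if it contains ESI markup, 1 otherwise.
# Matches both "<esi:" and the escaped "<esiU00003A" variant (case-insensitive).
has_esi_tags() {
    grep -Eqi '<esi(:|U00003A)'
}
```

A possible use would be `fetch_as_browser 'http://www.patagonia.com/home/?setCountryCode=US&setLocaleCode=en_US&setLocaleCodeSelect=en' | has_esi_tags && echo 'still ESI'`, repeated with other header combinations to see which one, if any, makes the tags disappear.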
ArtOfCode
hanshenrik
  • That leads to the same result, I also played with different user agents. – dsky Feb 01 '17 at 14:10
  • @dsky Really? When I send the default curl user agent, I get blocked, but when I send that User-Agent, I get the expected HTML. What is the output of `curl -v 'http://www.patagonia.com/home/?setCountryCode=US&setLocaleCode=en_US&setLocaleCodeSelect=en'`? – hanshenrik Feb 01 '17 at 18:26
  • It returns `curl -v 'http://www.patagonia.com/home/?setCountryCode=US&setLocaleCode=en_US&setLocaleCodeSelect=en' * Hostname was NOT found in DNS cache * Trying 23.38.49.57... * Connected to www.patagonia.com (23.38.49.57) port 80 (#0) > GET /home/?setCountryCode=US&setLocaleCode=en_US&setLocaleCodeSelect=en HTTP/1.1 > User-Agent: curl/7.38.0 > Host: www.patagonia.com > Accept: */* >` and never finishes – dsky Feb 02 '17 at 09:32
  • @dsky same as me then. from my IP, this is all fixed by adding the parameter `-H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'` – hanshenrik Feb 02 '17 at 14:50
  • @hanshenrik weird, for me this doesn't change anything. My server is outside of the US, maybe they have a location check on their side, where do you curl from? – dsky Feb 02 '17 at 15:04
  • @dsky It's the same result from Paris, France, and Tjome, Norway. – hanshenrik Feb 03 '17 at 00:16
  • @hanshenrik and by same result you refer to valid HTML? not tags? you actually get HTML without the text "CUSTOMIZE BODY HERE"? – dsky Feb 05 '17 at 16:47