
This has already been discussed in this question (ghost.py and proxy), but the answer is not clear to me.

I have ghost.py installed and I am behind a proxy (host `http://XXXXXXXX`, port `7676`).

```python
from ghost import Ghost

url     = "https://www.google.co.uk"
host    = 'http://XXXXXXXX'
port    = 7676
ghost   = Ghost(wait_timeout=20)
ghost.set_proxy(type_='http', host=host, port=port)
ghost.open(url)
html = ghost.content
```

But I get an empty html string. I also tried `ghost.set_proxy(type_='https', host=host, port=port)`, but the html string is still empty.

At home, with no proxy (and therefore without the `ghost.set_proxy(type_='http', host=host, port=port)` line), it works.

I am on 64-bit Windows, with the proxy already configured in Advanced Settings. Do you have any idea what I am missing?

Colonel Beauvel

1 Answer


Most HTTP libraries use the proxy specified in the corresponding environment variable, so running

```
export https_proxy="https://myproxy.com:7676"
```

should help (on Unix-like systems, at least).
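On Windows, or from inside a script, there is no `export`. A rough in-process equivalent is to put the variables into `os.environ` before the library that should use them is created. This is a sketch under assumptions: `myproxy.example.com` is a placeholder, the proxy URL uses `http://` (the common case for a plain HTTP proxy that also tunnels HTTPS), and whether Qt/ghost.py actually reads these variables is not something this answer has verified. Python's own `urllib.request.getproxies()` does read them, which makes it a convenient check.

```python
import os
import urllib.request

# Placeholder proxy address (hypothetical); substitute your own host and port.
PROXY_URL = "http://myproxy.example.com:7676"


def set_proxy_env(proxy_url):
    """Set the proxy environment variables for this process only.

    Roughly mirrors `export http_proxy=... https_proxy=...` on Unix; it affects
    the current Python process and child processes it starts, not the system.
    """
    os.environ["http_proxy"] = proxy_url
    os.environ["https_proxy"] = proxy_url


set_proxy_env(PROXY_URL)

# getproxies() consults the *_proxy environment variables first and only falls
# back to the registry (Windows) or system configuration (macOS) when none are set.
print(urllib.request.getproxies())
```

Both schemes are set here on purpose, since a page loaded over HTTPS may still request some resources over plain HTTP.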

If you use something like wget to access the URL through the proxy and still get an empty reply, make sure your proxy is configured correctly, that you're using the right credentials, and that the remote server doesn't behave strangely toward clients behind a proxy (which I wouldn't expect from Google ;).

EDIT: I just realized that you're only setting an HTTPS proxy; some elements on the page you're requesting might be fetched via plain HTTP, and without a proxy for those, this might fail.
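The point in the EDIT can be checked mechanically: given a mapping from URL scheme to proxy (the same shape `urllib.request.getproxies()` returns), list the schemes used by a set of URLs that have no proxy configured. The function name `missing_proxy_schemes` and the example resource URL are illustrative assumptions, not part of ghost.py or any other library.

```python
from urllib.parse import urlsplit


def missing_proxy_schemes(proxies, urls):
    """Return the URL schemes used by `urls` that have no proxy in `proxies`.

    `proxies` maps a lowercase scheme ("http", "https") to a proxy URL,
    the same shape returned by urllib.request.getproxies().
    """
    used = {urlsplit(u).scheme.lower() for u in urls}
    return sorted(s for s in used if not proxies.get(s))


# Only an HTTPS proxy configured, as in the export example above.
proxies = {"https": "http://myproxy.example.com:7676"}

# A page fetched over HTTPS may still pull some resources over plain HTTP.
resources = [
    "https://www.google.co.uk/",
    "http://example.com/script.js",  # hypothetical plain-HTTP resource
]

print(missing_proxy_schemes(proxies, resources))  # ['http']
```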

Marcus Müller
  • But `export` is not a Python command, is it? I am on Windows and have already configured the proxy in Advanced Settings, but it seems Qt is not detecting it automatically. – Colonel Beauvel Feb 06 '15 at 11:34
  • That's why I recommended checking the proxy with something "simpler" than ghost.py. Windows always lacks the most basic tools that everyone else's OSes come with, so something based on `urllib2` in Python would probably be a good test. – Marcus Müller Feb 06 '15 at 11:36
  • Nope, `urllib2` does not handle JavaScript and dynamic scraping. The proxy works with `urllib2` using `proxy_support=urllib2.ProxyHandler(proxies); opener=urllib2.build_opener(proxy_support); urllib2.install_opener(opener)`. But that's not the point. I am talking about `ghost.py` on Windows (I should have mentioned that). – Colonel Beauvel Feb 06 '15 at 11:43
  • Hm, the point is that I was assuming your empty reply stems from something that goes wrong long before the first line of HTML, JavaScript or anything else is received, since misconfigured proxies are the number one cause of problems when one of my customers has trouble with our automated downloader. – Marcus Müller Feb 06 '15 at 11:46
  • I'm sure it comes from the proxy; I said so clearly in the question. – Colonel Beauvel Feb 06 '15 at 11:51
  • "Empty html string" doesn't really indicate where the error happens. – Marcus Müller Feb 06 '15 at 11:54
  • Without the proxy, html contains something; with the proxy, html is empty. So something is wrong with my proxy setting, but no error is raised. – Colonel Beauvel Feb 06 '15 at 12:09
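The `urllib2` check from the comments translates to Python 3's `urllib.request` roughly as below. It involves no JavaScript, so it only tests the proxy itself. This is a sketch assuming a plain HTTP proxy at a placeholder address; `make_proxy_opener` and `check_proxy` are hypothetical helper names. It reports status and body length separately, so a proxy or connection failure (which raises) can be told apart from a response that arrives but is empty.

```python
import urllib.error
import urllib.request

# Placeholder proxy address (hypothetical); replace with the real host:port.
PROXY_URL = "http://myproxy.example.com:7676"


def make_proxy_opener(proxy_url):
    """Build an opener that routes both http:// and https:// through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def check_proxy(url, proxy_url, timeout=20):
    """Fetch `url` via the proxy and return (status, number of body bytes).

    Proxy or network failures raise urllib.error.URLError, and non-2xx HTTP
    statuses raise its subclass HTTPError, so an empty body and a failed
    request are reported differently.
    """
    opener = make_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        body = response.read()
        return response.status, len(body)
```

Usage: call `check_proxy("https://www.google.co.uk", PROXY_URL)` inside a `try`/`except urllib.error.URLError`. A raised error points at the proxy configuration; a status of 200 with a non-zero length suggests the proxy works and the problem lies in how ghost.py/Qt is configured.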