
Short intro: I'm the lead developer on a web crawler project. It's a fairly mature project, and on a daily basis we execute around 3,000–70,000 individual crawlers. We have a mid-size server farm, with each server running between 100 and 400 crawlers at a time.

The issue: We are seeing intermittent failures when accessing HTTPS/TLS sites, but only when the running server is Windows Server 2008; we have no issues on our Windows Server 2003 installations. Crawlers will be running, and suddenly none of them can perform web requests against HTTPS sites any more. They simply wait for their allotted time period and then time out. They all fail in unison, and new crawlers started while the issue is present also fail.

The solution: Opening an Internet Explorer instance on the affected server and visiting any HTTPS/TLS site clears the issue up. Suddenly all the crawlers stop getting timeouts and simply work as they are supposed to. Sometimes more than a week will pass without a server experiencing this problem.

The question: Does anyone have a clue what is going on here? Our current workaround is to launch Internet Explorer daily on all Windows Server 2008 servers and point it at an HTTPS site, in the hope of catching this before it becomes too much of an issue. That is very unsatisfying and really won't scale properly.
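For now we at least want to *detect* the stuck state instead of discovering it through failed crawls. A minimal sketch of the detection logic (in Python for brevity, although the crawlers themselves are .NET; the probe URL and timeout are placeholders): a probe that distinguishes "HTTPS request timed out" (the symptom described above) from other failures.

```python
import socket
import urllib.error
import urllib.request

def default_probe(url, timeout):
    """Perform one HTTPS request; raises on any failure."""
    urllib.request.urlopen(url, timeout=timeout).close()

def https_stuck(url="https://example.com/", timeout=10, probe=default_probe):
    """Return True when the probe times out -- the signature of the stuck
    state described above -- and False when it succeeds or fails for an
    unrelated reason (DNS error, HTTP error, ...)."""
    try:
        probe(url, timeout)
        return False
    except socket.timeout:
        return True
    except urllib.error.URLError as exc:
        # Timeouts may also surface wrapped inside a URLError.
        return isinstance(exc.reason, socket.timeout)
```

The probe is injectable so the monitoring logic can be exercised without network access; in production it would run on each server against a known-good HTTPS endpoint and raise an alert (or trigger the IE workaround) when it returns True.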

Grubsnik
  • Anything in the logs? What are you using to access HTTPS? WinHTTP? or? – Paul Zahra Mar 17 '15 at 09:22
  • Nothing, they just run for 4 mins (our default timeout) and then return without any response – Grubsnik Mar 17 '15 at 09:24
  • 4 mins = timeout for what? connection attempt? Your IIS logs contain nothing? Event logs nothing? nothing? :p – Paul Zahra Mar 17 '15 at 09:28
  • The servers we are crawling are not under our control. So I don't have any access to those logs. – Grubsnik Mar 17 '15 at 10:39
  • Have you tried manually connecting to a failed https url? If you aren't logging things I suggest you do with something like log4net, you may also be throwing silent exceptions so I suggest you use something like ELMAH. – Paul Zahra Mar 17 '15 at 14:38
  • We have extensive logging in the crawler itself; it gets a timeout exception. If we open up an IE instance and go to the site that is having trouble (or any site using TLS as its HTTPS protocol), the page will load without any trouble, and suddenly all our crawlers become unstuck again. Opening the site in a non-IE browser loads the page correctly, but leaves the crawlers stuck. – Grubsnik Mar 18 '15 at 07:27
  • When the server is handshaking is it actually using TLS or downgrading to SSL 3.0 ? or is it in fact resuming TLS sessions? http://en.wikipedia.org/wiki/Transport_Layer_Security#TLS_handshake – Paul Zahra Mar 18 '15 at 09:57
  • .NET automatically falls back to SSL 3.0 in case of a TLS failure. In case that also fails, we retry starting at SSL 3.0, so it can fall back to SSL 2.0 (there are web servers out there still running that!). To "clear" this issue, we have to go to a website using TLS; if we point an IE instance at a web server that only offers SSL 3.0, the problem persists. – Grubsnik Mar 18 '15 at 10:56
  • This will hopefully provide an answer for you - look at the accepted answer... http://stackoverflow.com/questions/5653868/what-makes-this-https-webrequest-time-out-even-though-it-works-in-the-browser not a solution if servers have disabled SSL3 and only use TLS ... in that case maybe upgrade to TLS if SSL3 isn't available. – Paul Zahra Mar 18 '15 at 12:58
  • I did have a look at that earlier. It does not match my issue. Namely, the request will work just fine for a given time period, then suddenly fail. And once it fails, it fails for all crawlers performing HTTPS requests. Once we clear this error, it clears for all crawlers, even though we never actually visited any of the websites that were having problems. – Grubsnik Mar 19 '15 at 09:24
  • Did you ever find a final solution? Any comments or troubleshooting notes? – Kiquenet Apr 06 '18 at 04:39
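The fallback chain described in the comments (TLS, then SSL 3.0, then SSL 2.0) amounts to retrying the request with each protocol as the starting point. A language-neutral sketch of that retry logic (Python here for brevity; `fetch` is a placeholder for the crawler's actual .NET request routine, and the protocol names are labels, not real API values):

```python
# Order described in the comments: TLS first, then the legacy fallbacks.
PROTOCOL_CHAIN = ["TLS", "SSL3", "SSL2"]

def fetch_with_fallback(fetch, url, protocols=PROTOCOL_CHAIN):
    """Try fetch(url, protocol) with each protocol in order and return the
    first successful response; re-raise the last error if all fail.
    `fetch` stands in for the crawler's real request routine."""
    last_error = None
    for protocol in protocols:
        try:
            return fetch(url, protocol)
        except Exception as exc:  # in practice: the handshake/request error
            last_error = exc
    raise last_error
```

Keeping the chain explicit like this also makes it easy to log *which* protocol finally succeeded, which would help narrow down whether the stuck state is specific to TLS handshakes.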

0 Answers