I have made a URL scanner that relies on cookielib and urllib2 to scan webpages. I have noticed that every time I reach 100 connections, the program just stops with no error. I am assuming the error is because I've hit 100 connections. I have tried several times on different domains, and each time the program will eventually stop investigating links and halt once it hits 100 outgoing connections. How do you get around this error?

My setup code is as follows:

domain = "http://dotwhat.net"
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
html = opener.open(domain).read()
soup = BeautifulSoup(html)

I open up a new connection on line 4 of the code in a loop. That is, the `html = opener.open(domain).read()` call runs once per iteration.
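The cause of the stall is never identified in this thread, so the following is only a sketch of how such a loop could be made to fail loudly instead of silently. It assumes the loop calls `opener.open(...)` once per link, gives each request a timeout, closes each response explicitly, and records any exception. The helper name `fetch_all` and the 10-second timeout are hypothetical and not part of the original scanner. The import fallback is there only so the same code also runs on Python 3, where the two modules were renamed.

```python
import contextlib
import socket

try:
    # Python 2 module names, as used in the question.
    import cookielib
    import urllib2
except ImportError:
    # Python 3 renamed both modules; alias them to the old names.
    import http.cookiejar as cookielib
    import urllib.request as urllib2


def fetch_all(opener, urls, timeout=10):
    """Fetch each URL in turn (hypothetical helper, not from the question).

    Returns two dicts: url -> response body, and url -> exception raised.
    """
    pages = {}
    failures = {}
    for url in urls:
        try:
            # closing() releases the response even if read() raises,
            # so finished requests do not keep their sockets open.
            with contextlib.closing(opener.open(url, timeout=timeout)) as response:
                pages[url] = response.read()
        except (urllib2.URLError, socket.error) as exc:
            # Record the failure instead of letting the scan end quietly.
            failures[url] = exc
    return pages, failures
```

With the `opener` from the setup code, `pages, failures = fetch_all(opener, links)` would return the fetched bodies alongside any errors, where `links` stands for whatever list of URLs the scanner has collected. An empty `failures` dict together with a stalled scan would point away from the network and toward the scanner's own logic.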

nobody
  • "I am assuming..." "How do you get around this error?" Step 1 is to stop assuming. You'll need to measure something like CPU use, memory use, IO use or something else to determine what is the **real** bottleneck. You must avoid assuming and start measuring. Specifically, you need to show real code. "I open up a new connection on line 4 of the code in a loop." doesn't make very much sense at all. – S.Lott Jul 06 '11 at 02:31
  • There is no CPU bottleneck, or memory bottleneck. Also on Windows 7 there is no connection limit. I've attempted clearing all connections to `cj`. I've even attempted deletion of connections to see if the error is fixed. Once the execution hits a problem it just stops and all connections end; open connections will go from 120 back to 20. – nobody Jul 06 '11 at 02:50
  • haha; boy do I feel stupid... Suffice it to say the error was on my part ;p – nobody Jul 06 '11 at 02:53
  • "hits a problem"? "stop investigating links"? These words don't make sense. You're hung waiting for socket I/O? Your ISP cuts you off for too many open connections? You've run out of open ports in your OS? It's hard to know what limit you've run afoul of without more concrete descriptions of what your application is doing (or not doing) and what (if anything) it's waiting for. These aren't Python limitations; they could be OS, firewall, router, ISP or other limitations. – S.Lott Jul 06 '11 at 02:53
  • "Suffice it to say the error was on my part". False. That does not suffice. Please **answer** the question if you know what the problem is. Or close it. There's nothing worse than an unanswered question. – S.Lott Jul 06 '11 at 02:54
  • Unregistered accounts cannot answer their own posts and cannot vote to close. – nobody Jul 06 '11 at 02:57
  • Then either (1) register and answer or (2) ask someone to close. "Suffice it to say the error was on my part" is rude. Other people read these questions and may have the same problem. – S.Lott Jul 06 '11 at 03:01

0 Answers