
I have some code that uses mechanize and BeautifulSoup to scrape some data. The code works fine on a test machine, but the production machine is blocking the connection. The error I get is:

urlopen error [Errno 10053] An established connection was aborted by the software in your host machine

I have read through similar posts and cannot find this exact error. The site I am trying to scrape uses HTTPS, but the same error has also occurred with an HTTP site. I am using Python 2.6 and mechanize 0.2.4.

Is this due to the proxy or, as the error says, something on my local machine? I've configured mechanize to use the system's proxy:

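import mechanize
import BeautifulSoup  # BeautifulSoup 3, the version that runs on Python 2.6

url = 'https://example.com/'  # placeholder: the page being scraped
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)')]
br.set_proxies(None)  # None = use the system proxy settings; {} would disable proxies entirely
page = br.open(url)
html = page.read()
soup = BeautifulSoup.BeautifulSoup(html)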
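Whether the proxy is involved can be checked independently of the scraper. In urllib2-style proxy handlers, which mechanize's `set_proxies` follows, passing `None` means "read the system settings", while an explicit empty dict means "use no proxy at all". A minimal sketch of that distinction, written against Python 3's standard `urllib.request.ProxyHandler` rather than mechanize (the helper name `proxy_handler_settings` is hypothetical):

```python
import urllib.request


def proxy_handler_settings(proxies=None):
    """Return the proxy mapping a urllib ProxyHandler ends up using.

    proxies=None -> system settings, read via urllib.request.getproxies()
                    (environment variables; registry on Windows).
    proxies={}   -> an explicit empty mapping: no proxy is used at all.
    """
    handler = urllib.request.ProxyHandler(proxies)
    return dict(handler.proxies)


if __name__ == "__main__":
    print("System proxies:", proxy_handler_settings(None))
    print("Explicit {}:   ", proxy_handler_settings({}))
```

Running this on both machines and comparing the "System proxies" line is a quick way to rule the proxy in or out before looking elsewhere.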

Again, this all works on my test machine, but the production machine gives that Error 10053.
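Error 10053 is the Winsock code `WSAECONNABORTED`: the connection was torn down by software on the local host, rather than refused or reset by the remote server. That is consistent with the answer below (local security software cutting the socket). A small sketch that unwraps a `URLError` and labels the underlying socket error; it uses Python 3's `urllib.error`, and the function name and labels are illustrative assumptions:

```python
import errno
import urllib.error

# Winsock codes from winsock2.h, hard-coded so the sketch also runs off Windows.
WSAECONNABORTED = 10053  # connection aborted by software on the local host
WSAECONNRESET = 10054    # connection reset by the remote peer
WSAECONNREFUSED = 10061  # remote host actively refused the connection


def classify_connection_error(exc):
    """Label where a failed request was most likely cut off.

    Accepts a URLError (whose .reason usually wraps the socket error)
    or a bare OSError.
    """
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    code = getattr(reason, "errno", None)  # None if reason is just a string
    if code in (WSAECONNABORTED, errno.ECONNABORTED):
        return "aborted by local software"
    if code in (WSAECONNRESET, errno.ECONNRESET):
        return "reset by remote peer"
    if code in (WSAECONNREFUSED, errno.ECONNREFUSED):
        return "refused by remote host"
    return "other"
```

The Python 2 `urllib2.URLError` in the question's environment exposes the same `.reason` attribute, so the idea carries over.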

serk
  • "The issue here was a host based IDS was preventing the connection out. Problem solved." Could you please explain in detail how the problem was solved and what changes you needed to make? I'm facing a similar problem and I'm not sure how to fix it. Many thanks – daliab Jun 29 '11 at 13:58
  • I added my Python script to the HIDS exception list, which was the list of files allowed to connect out to the internet. Once the script was on the list, it had network connectivity and I had no further problems. The test machine did not have a HIDS client installed, which is why it could connect out. FYI, both machines had firewalls, but only the production machine had the HIDS. – serk Jun 29 '11 at 14:09
  • Hi, thanks for the answer, and sorry for my ignorance: what does HIDS stand for? I don't think I have any such client installed on my system, but where can I check to be sure nothing similar is installed? My network security is administered by my company's network security team. Do I need their help to get my script onto the allowed-access list? – daliab Jun 29 '11 at 14:27
  • HIDS stands for Host-based Intrusion Detection System. If the network security team has hidden the HIDS from you, you might not know where to find it, and even if you do find it, you will not be able to disable it. You can ask your security team to add an exception for your script. Another sneaky way around the HIDS is to build your script into an exe (using py2exe) and rename the executable to something already on the HIDS exception list. Your browser is a good candidate: if Firefox is allowed internet access, rename your exe to firefox.exe. – serk Jun 29 '11 at 17:22
  • This may not work if the HIDS is smart and recognizes that a program is being run from an unknown location. For example, you rename your program to firefox.exe and run it from the desktop, but Firefox should actually be run from C:\Programs\Firefox\. That may raise a few eyebrows as to why the program is in an unexpected path. – serk Jun 29 '11 at 17:24
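Because a HIDS exception list is kept per program, it helps to know exactly which executable opens the socket. For a `.py` script that process is normally the Python interpreter (`sys.executable`), so depending on how a given HIDS product matches programs, the interpreter may be what needs the exception; that is an assumption to confirm with the security team. A minimal sketch of a one-shot connectivity check (the function names `outbound_check` and `interpreter_path` are hypothetical) that reports success or the failing error number:

```python
import socket
import sys


def outbound_check(host, port, timeout=5.0):
    """Try one outbound TCP connection from this Python process.

    Returns (True, "connected") on success, or (False, detail) where
    detail is the OS error number, or the exception text when there is
    none (for example a timeout).
    """
    try:
        # create_connection resolves the host name and connects over TCP.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        return False, exc.errno if exc.errno is not None else str(exc)


def interpreter_path():
    """The executable that actually opens sockets when a .py script runs."""
    return sys.executable
```

Calling `print(interpreter_path(), outbound_check("example.com", 443))` on each machine (example.com is a placeholder host) shows whether the interpreter can get out at all; a failure on production but not on test, against the same host and port, points at something on the production host rather than the remote site.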

1 Answer

The issue was that a host-based intrusion detection system (HIDS) was blocking the outbound connection. Problem solved.

serk