I'm trying to check a few URLs to see whether they come back as OK before I manipulate them further. I have a list of URLs in self.myList, and I run each one through httplib's HTTPConnection to get the response. However, I get a load of errors from httplib in cmd.

The code works: I've tested it with the snippet below, and it correctly comes back and sets the value in a wx.TextCtrl:

# for line in self.myList:
conn = httplib.HTTPConnection("www.google.com")
conn.request("HEAD", "/")
r1 = conn.getresponse()
r1 = r1.status, r1.reason
self.urlFld.SetValue(str(r1))

It just doesn't seem to work when I pass it more than 1 URL from myList.

for line in self.myList:
    conn = httplib.HTTPConnection(line)
    conn.request("HEAD", "/")
    r1 = conn.getresponse()
    r1 = r1.status, r1.reason
    self.urlFld.SetValue(line + "\t\t" + str(r1))

The errors I get on cmd are

Traceback (most recent call last):
File "gui_texteditor_men.py", line 96, in checkBtnClick
conn.request("HEAD", "/")
File "C:\Python27\lib\httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 992, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 954, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 814, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 776, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 757, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno 11004] getaddrinfo failed

Edit: updated code using urlparse (I have imported urlparse).

for line in self.myList:
    url = urlparse.urlparse(line)
    conn = httplib.HTTPConnection(url.hostname)
    conn.request("HEAD", url.path)
    r1 = conn.getresponse()
    r1 = r1.status, r1.reason
    self.urlFld.AppendText(url.hostname + "\t\t" + str(r1))

with this traceback:

C:\Python27\Coding>python gui_texteditor_men.py
Traceback (most recent call last):
File "gui_texteditor_men.py", line 97, in checkBtnClick
conn = httplib.HTTPConnection(url.hostname)
File "C:\Python27\lib\httplib.py", line 693, in __init__
self._set_hostport(host, port)
File "C:\Python27\lib\httplib.py", line 712, in _set_hostport
i = host.rfind(':')
AttributeError: 'NoneType' object has no attribute 'rfind'

The .txt file now contains www.google.com and www.bing.com when it throws this error.

Edit 2, @Aya:

It looks like it failed due to the "\n" between the two URLs. I thought I had coded it to remove the "\n" with .strip(), but it seems it didn't have any effect.

Failed on u'http://www.google.com\nhttp://www.bing.com'
Traceback (most recent call last):
File "gui_texteditor_men.py", line 99, in checkBtnClick
conn.request("HEAD", url.path)
File "C:\Python27\lib\httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 992, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 954, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 814, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 776, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 757, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno 11004] getaddrinfo failed

I took another look at my .strip() call where I open the file:

if dlg.ShowModal() == wx.ID_OK:
    directory, filename = dlg.GetDirectory(), dlg.GetFilename()
    self.filePath = '/'.join((directory, filename))
    self.fileTxt.SetValue(self.filePath)
    self.urlFld.LoadFile(self.filePath)
    self.myList = self.urlFld.GetValue().strip()

and now the traceback starts with "Failed on u'h'".

Thanks

jerrythebum
  • Which errors do you get? – ThiefMaster Apr 16 '13 at 16:10
  • Sounds like one of the URLs contains an invalid hostname. – Aya Apr 16 '13 at 16:12
  • it does, it's like www.blahlsghsh.com, but surely it should attempt it and then come back as 404 not found? EDIT: just took out the fake hostname and tried again, same errors with just google.com and bing.com, so it's not that affecting it – jerrythebum Apr 16 '13 at 16:13
  • You'll only get a 404 if the hostname is valid, but the path isn't. If the hostname doesn't exist, you'll get a DNS lookup fail. – Aya Apr 16 '13 at 16:14
  • changed it to a valid hostname and "myfakepage.html" so should return 404, still get all errors. – jerrythebum Apr 16 '13 at 16:16
  • Are you sure it's the same error this time? I'm almost certain it won't be `getaddrinfo failed`, but something else. Edit: never mind - I know what the problem is. – Aya Apr 16 '13 at 16:17

1 Answer

If self.myList contains a list of URLs, you can't use them directly in the HTTPConnection constructor like you do here...

for line in self.myList:
    conn = httplib.HTTPConnection(line)
    conn.request("HEAD", "/")

The HTTPConnection constructor should only be passed the hostname part of the URL, and the request method should be given the path part. You'll need to parse the URL with something like...

import urlparse

for line in self.myList:
    url = urlparse.urlparse(line)
    conn = httplib.HTTPConnection(url.hostname)
    conn.request("HEAD", url.path)
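
One thing to watch for: urlparse only fills in hostname when the string carries a scheme such as http://. For a bare www.google.com the whole string ends up in path and hostname is None, which would produce the 'NoneType' object has no attribute 'rfind' error. Here is a minimal sketch of one way to guard against that, assuming a missing scheme should be read as plain HTTP. split_url is a hypothetical helper name, and the try/except import is only there so the snippet also runs on Python 3:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2, as used in the question


def split_url(line):
    """Return (hostname, path) for a line that may lack a scheme."""
    line = line.strip()
    if "://" not in line:
        # Assumption: a bare "www.example.com" is meant as plain HTTP.
        line = "http://" + line
    parts = urlparse(line)
    # Use "/" when the URL has no path, so the request target is explicit.
    return parts.hostname, parts.path or "/"
```

The two values can then be passed to HTTPConnection and request respectively, as in the loop above.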

Update

Can you change the code to...

for line in self.myList:
    try:
        url = urlparse.urlparse(line)
        conn = httplib.HTTPConnection(url.hostname)
        conn.request("HEAD", url.path)
        r1 = conn.getresponse()
        r1 = r1.status, r1.reason
        self.urlFld.AppendText(url.hostname + "\t\t" + str(r1))
    except:
        print 'Failed on %r' % line
        raise

...and include the full output of running it?
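
Once the cause is found, and if the goal is a status per URL rather than a crash on the first bad one, something along these lines might be a starting point. It's a sketch that I haven't run against your GUI: check_url and its connection_cls parameter are made-up names (the latter only exists so a fake connection can be swapped in for testing), and the 5 second timeout is an arbitrary choice.

```python
import socket

try:
    from http.client import HTTPConnection, HTTPException  # Python 3
    from urllib.parse import urlparse
except ImportError:
    from httplib import HTTPConnection, HTTPException      # Python 2
    from urlparse import urlparse


def check_url(line, connection_cls=HTTPConnection, timeout=5):
    """Return (status, reason) for a HEAD request, or (None, error text)."""
    parts = urlparse(line.strip())
    if not parts.hostname:
        # Happens for strings without a scheme, e.g. "www.google.com".
        return None, "no hostname in %r" % line
    conn = connection_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.reason
    except (socket.error, HTTPException) as exc:
        # socket.gaierror (a failed DNS lookup) is a subclass of socket.error.
        return None, str(exc)
    finally:
        conn.close()
```

In the loop you would then call check_url(line) and append the returned pair to the text control, so a host that doesn't resolve shows up as an error message next to its URL instead of a traceback.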

Update #2

I'm not quite sure what self.fileTxt and self.urlFld are supposed to do, but if you're just reading lines from self.filePath, you only need...

if dlg.ShowModal() == wx.ID_OK:
    directory, filename = dlg.GetDirectory(), dlg.GetFilename()
    self.filePath = '/'.join((directory, filename))
    with open(self.filePath, 'r') as f:
        # Strip the trailing newline from each line and skip blank lines.
        self.myList = [line.strip() for line in f if line.strip()]
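
As for the "Failed on u'h'" message: .strip() only trims whitespace at the two ends and returns a single string, so the "\n" between the URLs stays put, and looping over a string gives you one character at a time. If you do want to build the list from the text control's contents rather than from the file, a sketch like this should work (parse_url_lines is a hypothetical name, and the sample text is made up to mirror your two-line file):

```python
def parse_url_lines(text):
    """Split multi-line text into a list of stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


text = u"http://www.google.com\nhttp://www.bing.com\n"

# What the question's code did: one string, iterated character by character.
as_string = text.strip()
first_item = next(iter(as_string))

# One list entry per URL instead.
urls = parse_url_lines(text)
```

Here first_item is the single character "h", while urls holds the two complete URLs.
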
Aya
  • ok thanks. didn't know i couldnt pass urls to httplib. i tried as you suggested, `for line in self.myList: url = urlparse.urlparse(line) conn = httplib.HTTPConnection(url.hostname) conn.request("HEAD", url.path) r1 = conn.getresponse() r1 = r1.status, r1.reason self.urlFld.AppendText(url.hostname + "\t\t" + str(r1))` and get a cmd error `Traceback (most recent call last): AttributeError: 'NoneType' object has no attribute 'rfind'` – jerrythebum Apr 16 '13 at 16:43
  • @directpixel can you update the original question and append the new code and the new traceback in full, please? – Aya Apr 16 '13 at 16:48
  • @directpixel which URL is it failing on this time? – Aya Apr 16 '13 at 16:54
  • fails on www.google.com and www.bing.com in a .txt file which makes up myList – jerrythebum Apr 16 '13 at 16:58
  • @directpixel Well, those aren't valid URLs - change them to `http://www.google.com` and `http://www.bing.com` respectively. – Aya Apr 16 '13 at 16:59
  • i tried that before i wrote you a comment as i thought you may mention that :). the traceback error i get then is the same as it was at the very beginning. `for res in getaddrinfo(host, port, 0, SOCK_STREAM): socket.gaierror: [Errno 11004] getaddrinfo failed` etc – jerrythebum Apr 16 '13 at 17:01
  • @directpixel The bug will be in the code which creates `self.myList`, so you'll have to include that. – Aya Apr 16 '13 at 17:30
  • updated the original post to include the creation of self.myList – jerrythebum Apr 16 '13 at 17:34
  • those are other bits, should have deleted them to avoid confusion. and that works brilliantly now, thanks a lot for the consistent help; really helped a lot – jerrythebum Apr 16 '13 at 17:48