
After logging in on a website, I want to collect its links. I do this with the following loop (using the mechanize and urlparse libraries):

import mechanize
import urlparse

br = mechanize.Browser()
mylinks = []

# ... logging in on website ...

for link in br.links():
    url = urlparse.urljoin(link.base_url, link.url)

    hostname = urlparse.urlparse(url).hostname
    path = urlparse.urlparse(url).path

    #print hostname #by printing this I found it to be the source of the None value

    mylinks.append("http://" + hostname + path)

and I get this error message:

    mylinks.append("http://" + hostname + path)
TypeError: cannot concatenate 'str' and 'NoneType' objects

I am not sure how to fix this, or even whether it can be fixed at all. Is there a way to force the loop to append anyway, even if that produces a non-working, odd-looking result for the None value?

Alternatively, what I'm really after is the last part of each link. For example, the HTML for one of the links looks like this (what I am after is the word "lexik"):

<td class="center">
    <a href="http://UnimportantPartOfLink/lexik">lexik</a>
</td>

So an alternative route would be for mechanize to collect this value directly, bypassing the full links and the trouble with the None value.
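
A minimal sketch of that alternative route, pulling out only the last path segment of a URL. The helper name last_segment is hypothetical and not part of mechanize or urlparse; the import fallback is only there so the snippet runs on both Python 2 (as in the question) and Python 3. mechanize link objects also appear to expose the anchor text as link.text, which may be the more direct route, but that is not exercised here.

```python
try:
    from urlparse import urlparse  # Python 2, as used in the question
except ImportError:
    from urllib.parse import urlparse  # Python 3 equivalent


def last_segment(url):
    """Return the final non-empty path segment of url, or '' if there is none.

    Hypothetical helper for illustration only.
    """
    path = urlparse(url).path
    # Drop empty pieces produced by leading, trailing, or doubled slashes.
    segments = [piece for piece in path.split("/") if piece]
    return segments[-1] if segments else ""


print(last_segment("http://UnimportantPartOfLink/lexik"))  # lexik
```

Because this only reads the path, it never touches hostname, so the None value does not come into play.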

Gabe
user3053161

2 Answers


Another good way, without any try/except block:

Replace hostname = urlparse.urlparse(url).hostname with

hostname = urlparse.urlparse(url).hostname or ''

and similarly path = urlparse.urlparse(url).path with

path = urlparse.urlparse(url).path or ''

Hope this helps!
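
Put together, the two replacements might look like the sketch below. It assumes the links are available as (base_url, href) pairs, which stand in for the link.base_url and link.url values from br.links(); the function name build_links and the sample URLs are made up for illustration.

```python
try:
    from urlparse import urljoin, urlparse  # Python 2, as used in the question
except ImportError:
    from urllib.parse import urljoin, urlparse  # Python 3 equivalent


def build_links(pairs):
    """Build 'http://host/path' strings, substituting '' for a missing hostname.

    pairs is an iterable of (base_url, href) tuples (an assumed stand-in for
    the link objects that br.links() yields).
    """
    mylinks = []
    for base_url, href in pairs:
        parts = urlparse(urljoin(base_url, href))
        hostname = parts.hostname or ""  # None for hostless links such as mailto:
        path = parts.path or ""          # kept to mirror the answer above
        mylinks.append("http://" + hostname + path)
    return mylinks


sample = [
    ("http://example.com/index.html", "/lexik"),
    ("http://example.com/index.html", "mailto:someone@example.com"),
]
print(build_links(sample))
```

The hostless entry is still appended, just with an empty host part, which matches the "append even if the result is odd" behaviour asked for in the question.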

Arovit
  • Thanks for the suggestion, but that still gave the "TypeError: cannot concatenate 'str' and 'NoneType' objects" message. – user3053161 Dec 01 '13 at 18:54
  • That works, neat solution. How does it work? Is it that if the first value is None, it gives the empty string instead? – user3053161 Dec 03 '13 at 19:35
  • Yes. The expression a or b evaluates to b whenever a is falsy, so if the first value is None, the second one is assigned. – aIKid Dec 03 '13 at 22:26
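
To make the behaviour discussed in these comments concrete, here is a small sketch of how or picks its result. The function name default_to_empty is hypothetical; note that any falsy value, not only None, is replaced by the fallback.

```python
def default_to_empty(value):
    """Return value if it is truthy, otherwise the empty string."""
    return value or ""


print(repr(default_to_empty(None)))           # ''
print(repr(default_to_empty("example.com")))  # 'example.com'
print(repr(default_to_empty(0)))              # '' because 0 is falsy too
```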

Why not use a try/except block?

try:
    mylinks.append("http://" + hostname + path)
except TypeError:
    continue

If a TypeError is raised, this skips the append and moves on to the next iteration of the loop.

Hope this helps!
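
For context, the try/except might sit inside the loop roughly as follows. This is a sketch under the assumption that links arrive as (base_url, href) pairs standing in for link.base_url and link.url; the function name build_links_skipping and the sample URLs are made up for illustration.

```python
try:
    from urlparse import urljoin, urlparse  # Python 2, as used in the question
except ImportError:
    from urllib.parse import urljoin, urlparse  # Python 3 equivalent


def build_links_skipping(pairs):
    """Build 'http://host/path' strings, skipping links that have no hostname."""
    mylinks = []
    for base_url, href in pairs:
        parts = urlparse(urljoin(base_url, href))
        try:
            # Raises TypeError when parts.hostname is None.
            mylinks.append("http://" + parts.hostname + parts.path)
        except TypeError:
            continue  # skip this link and move on to the next one
    return mylinks


sample = [
    ("http://example.com/index.html", "/lexik"),
    ("http://example.com/index.html", "mailto:someone@example.com"),
]
print(build_links_skipping(sample))  # ['http://example.com/lexik']
```

An explicit check such as if parts.hostname is None: continue would have the same effect and avoids catching unrelated TypeErrors.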

aIKid