I'm making an app that parses HTML and extracts the images from it. Parsing is easy with Beautiful Soup, and downloading the HTML and the images with urllib2 works too.
However, I have a problem using urlparse to turn relative URLs into absolute ones. The problem is best explained with an example:
>>> import urlparse
>>> urlparse.urljoin("http://www.example.com/", "../test.png")
'http://www.example.com/../test.png'
As you can see, urlparse.urljoin doesn't remove the ../ segment. This causes a problem when I try to download the image:
HTTPError: HTTP Error 400: Bad Request
Is there a way to fix this problem in urllib?
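One possible workaround, sketched here under my own assumptions and not taken from urlparse itself, is to let urljoin build the URL and then clean up the path with the "remove dot segments" procedure from RFC 3986 (section 5.2.4). The helper names `remove_dot_segments` and `urljoin_normalized` are hypothetical, chosen for this example:

```python
# Sketch: post-process urljoin() output by removing "." and ".." path
# segments as described in RFC 3986, section 5.2.4.
try:
    from urllib.parse import urljoin, urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urljoin, urlsplit, urlunsplit  # Python 2


def remove_dot_segments(path):
    """Hypothetical helper: resolve '.' and '..' segments in a URL path."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]  # "/./x" -> "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]  # "/../x" -> "/x", and drop the last output segment
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:end])
                path = path[end:]
    return "".join(output)


def urljoin_normalized(base, ref):
    """Hypothetical helper: urljoin() followed by dot-segment removal."""
    parts = urlsplit(urljoin(base, ref))
    path = remove_dot_segments(parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

With this, `urljoin_normalized("http://www.example.com/", "../test.png")` should give a URL without the `../`. I chose the RFC procedure over `posixpath.normpath` because normpath also collapses `//` and drops trailing slashes, which can change a URL's meaning. As far as I know, Python 3.5+ `urllib.parse.urljoin` already follows RFC 3986 here, so the extra step should be harmless there.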