9

I'd like to add the 'http' scheme name in front of a given url string if it's missing. Otherwise, leave the url alone so I thought urlparse was the right way to do this. But whenever there's no scheme and I use get url, I get /// instead of '//' between the scheme and domain.

>>> t = urlparse.urlparse('www.example.com', 'http')
>>> t.geturl()
'http:///www.example.com' # three ///

How do I convert this url so it actually looks like:

'http://www.example.com' # two //
Dan Holman
  • 825
  • 1
  • 10
  • 20

2 Answers2

6

Short answer (but it's a bit tautological):

>>> urlparse.urlparse("http://www.example.com").geturl()
'http://www.example.com'

In your example code, the hostname is parsed as a path not a network location:

>>> urlparse.urlparse("www.example.com/go")
ParseResult(scheme='', netloc='', path='www.example.com/go', params='', \
    query='', fragment='')

>>> urlparse.urlparse("http://www.example.com/go")
ParseResult(scheme='http', netloc='www.example.com', path='/go', params='', \
    query='', fragment='')
miku
  • 181,842
  • 47
  • 306
  • 310
  • 3
    I see. I was under the impression that url parse would smartly determine the lack of a scheme and reconstruct it better. Fixed it by simply checking if the url string starts with 'http://' and appending it accordingly. – Dan Holman Sep 02 '11 at 21:58
  • @Dan Holman I expected that too, but if you think about it, you can't really expect that. Because "images/tick.png" refers to a relative path, not a full URL. How can urlparse distinguish between that and "www.example.com"? Just because it *looks* like a domain name doesn't mean it's not a valid path. – mgiuca Sep 03 '11 at 04:06
2

If you want to use urlparse as you were intending to, the closest "correct" equivalent is to use "//www.example.com" as the urlstring. Such a urlstring is unambiguously an absolute path without a scheme, so you could then supply "http" as the default scheme. I suppose you could do this by detecting whether your URL includes the string "//" and if not, prepending "//" on the front.

mgiuca
  • 20,958
  • 7
  • 54
  • 70