Questions tagged [urlparse]

urlparse is used for parsing a URL into components like (addressing scheme, network location, path etc.)

urlparse is module in Python2.7 and renamed to urllib.parse in Python 3

Links:

urlparse

urllib.parse

196 questions
15
votes
1 answer

Python urlparse.parse_qs unicode url

urlparse.parse_qs is usefull for parsing url parameters, and it works fine with simple ASCII url, represented by str. So i can parse a query and then construct the same path using urllib.urlencode from parsed data: >>> import urlparse >>> import…
stalk
  • 11,934
  • 4
  • 36
  • 58
14
votes
4 answers

How can I check whether a URL is valid using `urlparse`?

I want to check whether a URL is valid, before I open it to read data. I was using the function urlparse from the urlparse package: if not bool(urlparse.urlparse(url).netloc): # do something like: open and read using urllin2 However, I noticed…
Ziva
  • 3,181
  • 15
  • 48
  • 80
14
votes
6 answers

Find http:// and or www. and strip from domain. leaving domain.com

I'm quite new to python. I'm trying to parse a file of URLs to leave only the domain name. some of the urls in my log file begin with http:// and some begin with www.Some begin with both. This is the part of my code which strips the http:// part.…
Paul Tricklebank
  • 229
  • 2
  • 3
  • 11
14
votes
2 answers

Python: How to check if a string is a valid IRI?

Is there a standard function to check an IRI, to check an URL apparently I can use: parts = urlparse.urlsplit(url) if not parts.scheme or not parts.netloc: '''apparently not an url''' I tried the above with an URL containing Unicode…
Eduard Florinescu
  • 16,747
  • 28
  • 113
  • 179
13
votes
3 answers

How to construct relative url, given two absolute urls in Python

Is there a builtin function to get url like this: ../images.html given a base url like this: http://www.example.com/faq/index.html and a target url such as http://www.example.com/images.html I checked urlparse module. What I want is counterpart of…
yasar
  • 13,158
  • 28
  • 95
  • 160
13
votes
2 answers

change urlparse.path of a url

Here is the python code: url = http://www.phonebook.com.pk/dynamic/search.aspx path = urlparse(url) print (path) >>>ParseResult(scheme='http', netloc='www.phonebook.com.pk', path='/dynamic/search.aspx', params='',…
Mansoor Akram
  • 1,997
  • 4
  • 24
  • 40
13
votes
4 answers

how to parse mysql database name from database_url

DATABASE_URL- MYSQL://username:password@host:port/database_name Error: database_name has no attributes. if 'DATABASE_URL' in os.environ: url = urlparse(os.getenv['DATABASE_URL']) g['db'] =…
guri
  • 1,521
  • 2
  • 14
  • 20
13
votes
6 answers

Parse custom URIs with urlparse (Python)

My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module refuses to parse unknown URL schemes like it parses http. If I do not adjust urlparse's uses_* lists I get this: >>>…
u0b34a0f6ae
  • 48,117
  • 14
  • 92
  • 101
11
votes
2 answers

Modify URL components in Python 2

Is there a cleaner way to modify some parts of a URL in Python 2? For example http://foo/bar -> http://foo/yah At present, I'm doing this: import urlparse url = 'http://foo/bar' # Modify path component of URL from 'bar' to 'yah' # Use nasty…
Gareth Stockwell
  • 3,112
  • 18
  • 23
9
votes
2 answers

urlparse.urlparse returning 3 '/' instead of 2 after scheme

I'd like to add the 'http' scheme name in front of a given url string if it's missing. Otherwise, leave the url alone so I thought urlparse was the right way to do this. But whenever there's no scheme and I use get url, I get /// instead of '//'…
Dan Holman
  • 825
  • 1
  • 10
  • 20
8
votes
9 answers

URL parsing in Python - normalizing double-slash in paths

I am working on an app which needs to parse URLs (mostly HTTP URLs) in HTML pages - I have no control over the input and some of it is, as expected, a bit messy. One problem I'm encountering frequently is that urlparse is very strict (and possibly…
shevron
  • 3,463
  • 2
  • 23
  • 35
7
votes
4 answers

urlparse fails with simple url

this simple code makes urlparse get crazy and it does not get the hostname properly but sets it up to None: from urllib.parse import urlparse parsed = urlparse("google.com/foo?bar=8") print(parsed.hostname) Am I missing something?
user1618465
  • 1,813
  • 2
  • 32
  • 58
7
votes
2 answers

Combining a url with urlunparse

I'm writing something to 'clean' a URL. In this case all I'm trying to do is return a faked scheme as urlopen won't work without one. However, if I test this with www.python.org It'll return http:///www.python.org. Does anyone know why the extra /,…
Ben
  • 51,770
  • 36
  • 127
  • 149
7
votes
0 answers

Repair URL using Python

There is a big file. Each line of this file is URL typed in by human so there can be different problems like missing http missing www etc. Is there a Python module which can repair those urls? I've tried url_fix from werkzeug.urls but it is not…
Milano
  • 18,048
  • 37
  • 153
  • 353
7
votes
1 answer

How can I add http:// to a URL if no protocol is defined in JavaScript?

My question is the same as this one, but the correct answers are for PHP, not JavaScript. How to add http:// if it doesn't exist in the URL How can I add http:// to the URL if there isn't a http:// or https:// or…
Nearpoint
  • 7,202
  • 13
  • 46
  • 74
1
2
3
13 14