Questions tagged [urlparse]

urlparse parses a URL into its components (addressing scheme, network location, path, etc.).

urlparse is a module in Python 2.7; it was renamed to urllib.parse in Python 3.

Links:

urlparse

urllib.parse
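
As a rough illustration of the description above, the following sketch splits a URL into its components and rebuilds it. The URL is a made-up example (not taken from any question on this page), and the try/except import is just one common way to cope with the Python 2.7 to Python 3 rename.

```python
# Import from the Python 3 location first, falling back to the Python 2.7 module.
try:
    from urllib.parse import urlparse, urlunparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, urlunparse, parse_qs  # Python 2.7

# Hypothetical URL, chosen only for illustration.
url = "https://www.example.com/templates/story.php?storyId=4231170&x=1#top"

# urlparse returns a 6-field named tuple:
# (scheme, netloc, path, params, query, fragment)
parts = urlparse(url)
print(parts.scheme)    # addressing scheme
print(parts.netloc)    # network location
print(parts.path)      # hierarchical path
print(parts.query)     # raw query string
print(parts.fragment)  # fragment identifier

# parse_qs turns the raw query string into a dict mapping names to lists of values.
query = parse_qs(parts.query)
print(query["storyId"])

# urlunparse is the inverse operation; for this URL it rebuilds the original string.
rebuilt = urlunparse(parts)
print(rebuilt == url)
```

Each value in the dict returned by parse_qs is a list because a query string may repeat the same parameter name.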

196 questions
3
votes
1 answer

Fuzzy URL matching in Python

I'd like to find a tool that does a good job of fuzzy matching URLs that are the same except for extra parameters. For instance, for my use case, these two URLs are the same: atest = (http://www.npr.org/templates/story/story.php?storyId=4231170',…
Chris J. Vargo
3
votes
1 answer

urlparse() query string missing

I have two systems. The first one works as intended: >>> urlparse.urlparse('foo://bar/?blu=1') ParseResult(scheme='foo', netloc='bar', path='/', params='', query='blu=1', fragment='') # sys.version_info(major=2, minor=7, micro=12, releaselevel='final',…
guettli
3
votes
1 answer

Parsing a URL in Python with a changing part in it

I'm parsing a URL in Python. Below you can find a sample URL and the code. What I want to do is split the (74743) out of the URL and write a for loop that takes it from a parts list. Tried to use urlparse but couldn't complete it to the…
T.M
3
votes
2 answers

urljoin when an absolute path does not have a leading slash

Some websites like http://www.gilacountyaz.gov/government/assessor/index.php have a bunch of internal links that should be absolute paths, but do not have the leading slash. When parsing them with urlparse.urljoin the result is the following: >>>…
Mikk
3
votes
1 answer

Why does urlparse.urlunparse work inconsistently?

When netloc is empty urlparse.urlunparse is inconsistent: >>> urlparse.urlunparse(('http','','test_path', None, None, None)) 'http:///test_path' >>> urlparse.urlunparse(('ftp','','test_path', None, None, None)) 'ftp:///test_path' >>>…
running.t
2
votes
2 answers

urlparse doesn't return params for custom schema

I am trying to use the urlparse Python library to parse some custom URIs. I noticed that for some well-known schemes params are parsed correctly: >>> from urllib.parse import urlparse >>>…
Konrad Sikorski
2
votes
5 answers

Splitting a url into a list in python

I am currently working on a project that involves splitting a url. I have used the urlparse module to break up the url, so now I am working with just the path segment. The problem is that when I try to split() the string based on the delimiter "/"…
chindes
2
votes
1 answer

Django url path converter not working in production

I'm using a path converter in my Django app like so: # urls.py from . import views from django.urls import path urlpatterns = [ path('articles/', views.ArticleView), ] # views.py @login_required def ArticleView(request,…
Charles
2
votes
1 answer

Python - getting image name and extension from a URL that does not end with a filename extension

Basically, my goal is to fetch the filename, extension and the content of an image by its URL. And my function should work for both of these URLs: easy…
Edgar Navasardyan
2
votes
2 answers

Python - Parsing a string for URLs and extracting them

I know that with urllib you can parse a string and check if it's a valid URL. But how would one go about checking if a sentence contains a URL within it, and then extracting that URL? I've seen some huge regular expressions out there, but I would…
Cooper
2
votes
4 answers

Python urlparse: small issue

I'm making an app that parses html and gets images from it. Parsing is easy using Beautiful Soup and downloading of the html and the images works too with urllib2. I do have a problem with urlparse to make absolute paths out of relative ones. The…
Mew
2
votes
3 answers

Parsing a URL for a crawler

I am writing a small crawler that crawls some 5 to 10 sites. While getting the links, I am getting some URLs like this: ../tets/index.html. If it is /test/index.html, we can join it with the base URL to get http://www.example.com/test/index.html. What can I do for…
raj
2
votes
2 answers

How do I get urljoin to work as expected in Python?

Let's say I have the following URLs: url = https://www.example.com/thing1/thing2/thing3 next_thing = thing4 and I want the following URL: https://www.example.com/thing1/thing2/thing3/thing4 When I try >>> urlparse.urljoin(url,next_thing) I get…
codycrossley
2
votes
1 answer

Python 3 : Why would you use urlparse/urlsplit

I'm not exactly sure what these modules are used for. I get that they split the respective url into its components, but why would that be useful, or what is an example of when to use urlparse?
Aran Freel
2
votes
0 answers

What is the params part of the tuple returned from Python urlparse?

I am doing some validation on URLs and I cannot find a good example to express the params part of the returned tuple from urlparse(). From https://docs.python.org/2/library/urlparse.html : >>> from urlparse import urlparse >>> o =…
Marc