URL parsing is the process of taking a URL and producing a representation of its constituent parts: scheme, host, port, path, query, and fragment. The URL Standard defines an algorithm for URL parsing.
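As a rough illustration of that decomposition, the sketch below uses Python's standard `urllib.parse.urlsplit`. Note that `urllib.parse` follows the RFC 3986 family of rules rather than the URL Standard's algorithm, so results can differ on edge cases. The helper name `url_components` is made up for this example.

```python
from urllib.parse import urlsplit


def url_components(url: str) -> dict:
    """Split a URL into the parts named in the tag description."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,   # lowercased host, without userinfo or port
        "port": parts.port,       # int, or None when no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
```

For instance, `url_components("https://example.com:8443/a/b?x=1#top")` yields the scheme `https`, host `example.com`, port `8443`, path `/a/b`, query `x=1`, and fragment `top`.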
Questions tagged [url-parsing]
166 questions
14 votes, 4 answers
How can I check whether a URL is valid using `urlparse`?
I want to check whether a URL is valid, before I open it to read data.
I was using the function urlparse from the urlparse module:
if not bool(urlparse.urlparse(url).netloc):
    # do something like: open and read using urllib2
However, I noticed…
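One possible shape for such a check is sketched below, using Python 3's `urllib.parse` (the successor to the `urlparse` module). It only tests that the string has an http(s) scheme and a non-empty network location; it does not prove that the URL can be opened. The function name `looks_like_http_url` is hypothetical.

```python
from urllib.parse import urlsplit


def looks_like_http_url(url: str) -> bool:
    """Return True if url has an http(s) scheme and a non-empty netloc."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs,
        # for example an unclosed IPv6 bracket.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

A string without a scheme, such as `example.com/page`, is rejected here because `urlsplit` treats it as a bare path.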

Ziva
13 votes, 4 answers
Break a URL into its components
I'm using JavaScript and would like to take a URL string that I have and break it down into its components, such as the host, path, and query arguments.
I need to do this in order to get to one of the query arguments, which is itself a URL and is…

Chris Dutrow
10 votes, 7 answers
Get second level domain name from URL
Is there a way to get the second-level domain name from a URL?
For example, "https://images.google.com/blah" => "google"
I found this:
var domain = new URL(pageUrl).hostname;
but it gives me "images.google.com" instead of just google.
Unit tests I have…

sublime
9 votes, 4 answers
PHP - remove http/www from message (except for the host domain) to disable clickable links
I have a simple message board, say mywebsite.com, that allows users to post messages. Currently the board makes all links clickable, i.e., when someone posts something that starts with:
http://, https://, www., http://www.,…

NonCoder
9 votes, 3 answers
How can multiple trailing slashes be removed from a URL in Ruby?
What I'm trying to achieve here is, let's say we have two example URLs:
url1 = "http://emy.dod.com/kaskaa/dkaiad/amaa//////////"
url2 = "http://www.example.com/"
How can I extract the stripped-down URLs?
url1 =…

splintercell
9 votes, 4 answers
How to identify the top-level domain of a URL object using Java?
Given this:
URL u = new URL("someURL");
How do I identify the top-level domain of the URL?

trinity
8 votes, 2 answers
Using urltools::url_parse with UTF-8 domains
The function url_parse is very fast and works fine most of the time. But recently, domain names may contain UTF-8 characters, for example
url <- "www.cordes-tiefkühlprodukte.de"
Now if I apply url_parse on this url, I get a special character "< fc…

Karsten W.
7 votes, 3 answers
Redact and remove password from URL
I have a URL like this:
https://user:password@example.com/path?key=value#hash
The result should be:
https://user:???@example.com/path?key=value#hash
I could use a regex, but instead I would like to parse the URL into a high-level data structure, then…
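A minimal sketch of the parse-then-rebuild idea, assuming Python and its standard `urllib.parse` module (the question excerpt does not confirm the language). The function name `redact_password` and the `mask` parameter are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_password(url: str, mask: str = "???") -> str:
    """Replace the password in a URL's userinfo with a mask, if present."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to redact

    # netloc looks like "user:password@host:port"; split at the last "@"
    # so that only the userinfo section is rewritten.
    userinfo, _, hostport = parts.netloc.rpartition("@")
    username, _, _ = userinfo.partition(":")
    netloc = f"{username}:{mask}@{hostport}"

    # SplitResult is a named tuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(netloc=netloc))
```

Rebuilding only the `netloc` leaves the path, query, and fragment exactly as parsed, which avoids the over-matching that a regex across the whole string can cause.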

guettli
6 votes, 1 answer
First argument for Url.Parser.custom in Elm
The docs for Url.Parser.custom give an example:
int : Parser (Int -> a) a
int =
    custom "NUMBER" String.toInt
But they don't indicate what "NUMBER" is used for.
I checked the source and it seems to be captured as tipe, but never used:
custom : String…

davetapley
6 votes, 2 answers
Parse a git URL like 'ssh://git@gitlab.org.net:3333/org/repo.git'?
How could I easily extract the hostname from a git URL like ssh://git@gitlab.org.net:3333/org/repo.git?
u = urlparse(s)
gives me
ParseResult(scheme='ssh', netloc='git@gitlab.org.net:3333', path='/org/repo.git', params='', query='', fragment='')
which…
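For reference, the parse result in Python 3's `urllib.parse` also exposes derived attributes that separate the pieces of `netloc`. The sketch below relies on those; the helper name `git_url_host` is hypothetical, and scp-style remotes such as `git@host:org/repo.git` are not URLs and are not handled.

```python
from urllib.parse import urlsplit


def git_url_host(url: str) -> tuple:
    """Return (username, hostname, port) for a URL-style git remote."""
    parts = urlsplit(url)
    # hostname strips the "git@" userinfo and the ":3333" port from netloc;
    # port is an int, or None when the URL does not specify one.
    return parts.username, parts.hostname, parts.port
```

With the URL from the question, this returns `("git", "gitlab.org.net", 3333)`.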

d33tah
5 votes, 2 answers
window.location.hash issue in Firefox
Consider the following code:
hashString = window.location.hash.substring(1);
alert('Hash String = '+hashString);
When run with the following hash:
#car=Town%20%26%20Country
the result in Chrome and Safari will be:
car=Town%20%26%20Country
but…
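The sketch below, written in Python rather than the question's JavaScript, illustrates why it matters whether a fragment has already been percent-decoded before it is split into key/value pairs. It makes no claim about what any particular browser returns; `parse_fragment` is a hypothetical helper.

```python
from urllib.parse import parse_qs, unquote


def parse_fragment(fragment: str) -> dict:
    """Parse a still-encoded 'key=value' fragment into a dict of lists."""
    raw = fragment[1:] if fragment.startswith("#") else fragment
    # parse_qs splits on "&" and "=" first, then percent-decodes each piece,
    # so an encoded "%26" inside a value survives as a literal "&".
    return parse_qs(raw)


raw_hash = "#car=Town%20%26%20Country"

# Splitting the encoded string keeps the value intact.
good = parse_fragment(raw_hash)

# Decoding first turns "%26" into "&", which then acts as a separator
# and truncates the value.
bad = parse_qs(unquote(raw_hash[1:]))
```

Here `good` maps `car` to the single value `Town & Country`, while `bad` loses part of the value because the decoded ampersand is treated as a delimiter.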

Yarin
5 votes, 0 answers
Why is it that "using anything but a utf-8 decoder...might be insecure" in a URL percent decoding algorithm?
I am implementing a URL parser and have a question about the W3C URL spec (at http://www.w3.org/TR/2014/WD-url-1-20141209/). In section "2. Percent-encoded bytes" it has the following algorithm (emphasis added):
To percent decode a byte sequence…

Chad
5 votes, 1 answer
Java URL Class getPath(), getQuery() and getFile() inconsistent with RFC3986 URI Syntax
I am writing a utility class that semi-wraps Java's URL class, and I have written a bunch of test cases to verify the methods I have wrapped with a customized implementation. I don't understand the output of some of Java's getters for certain URL…

Selena
4 votes, 1 answer
Regex ignore URL already in HTML tags
I'm having a little problem with my regex.
I've made a custom BBCode for my website; however, I also want URLs to be parsed.
I'm using preg_replace and this is the pattern used to identify URLs:
/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])/is
Which…

Moe
4 votes, 2 answers
Split path name to get routing parameter
I am using MVC and jQuery in my application. I have a routing URL like this:
ID/Controller/Action
I want to get the URL and split it to get the ID in jQuery.

Eslam Soliman