Questions tagged [url-parsing]

URL parsing is the process of taking a URL and producing a representation of its constituent parts: scheme, host, port, path, query, and fragment. The URL Standard defines an algorithm for URL parsing.

See https://url.spec.whatwg.org/#url-parsing
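
As a rough illustration of the parts listed above, here is a minimal sketch using `urlsplit` from Python's standard `urllib.parse` module. Note the assumptions: `urllib.parse` follows the RFC-style generic URI syntax rather than the URL Standard's algorithm, so results can differ from a browser's parser on unusual inputs. The helper name `split_url` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return the main components of a URL as a dict (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,   # lowercased host without userinfo or port
        "port": parts.port,       # int, or None when no port is given
        "path": parts.path,
        "query": parts.query,     # raw query string, without the leading "?"
        "fragment": parts.fragment,
    }


# Hypothetical example URL, chosen only to show each component.
components = split_url("https://example.com:8443/docs/index.html?lang=en#intro")
print(components)
```

The query is returned as a raw string; `urllib.parse.parse_qs` can split it further into key/value pairs if needed.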

166 questions
14
votes
4 answers

How can I check whether a URL is valid using `urlparse`?

I want to check whether a URL is valid before I open it to read data. I was using the function urlparse from the urlparse package: if not bool(urlparse.urlparse(url).netloc): # do something like: open and read using urllib2 However, I noticed…
Ziva
  • 3,181
  • 15
  • 48
  • 80
13
votes
4 answers

Break a URL into its components

I'm using JavaScript and would like to take a URL string that I have and break it down into its components, such as the host, path, and query arguments. I need to do this in order to get to one of the query arguments, which is itself a URL and is…
Chris Dutrow
  • 48,402
  • 65
  • 188
  • 258
10
votes
7 answers

Get second level domain name from URL

Is there a way to get the top-level domain name from a URL, e.g. "https://images.google.com/blah" => "google"? I found this: var domain = new URL(pageUrl).hostname; but it gives me "images.google.com" instead of just "google". Unit tests I have…
sublime
  • 4,013
  • 9
  • 53
  • 92
9
votes
4 answers

PHP - remove http/www from message (except for the host domain) to disable clickable links

I have a simple message board, let's say mywebsite.com, that allows users to post their messages. Currently the board makes all links clickable, i.e. when someone posts something that starts with: http://, https://, www., http://www.,…
NonCoder
  • 235
  • 4
  • 10
9
votes
3 answers

How can multiple trailing slashes be removed from a URL in Ruby?

Let's say we have two example URLs: url1 = "http://emy.dod.com/kaskaa/dkaiad/amaa//////////" url2 = "http://www.example.com/" How can I extract the stripped-down URLs? url1 =…
splintercell
  • 575
  • 1
  • 7
  • 22
9
votes
4 answers

How to identify the top-level domain of a URL object using Java?

Given this: URL u = new URL("someURL"); how do I identify the top-level domain of the URL?
trinity
  • 10,394
  • 15
  • 49
  • 67
8
votes
2 answers

Using urltools::url_parse with UTF-8 domains

The function url_parse is very fast and works fine most of the time. But recently, domain names may contain UTF-8 characters, for example url <- "www.cordes-tiefkühlprodukte.de" Now if I apply url_parse on this url, I get a special character "< fc…
Karsten W.
  • 17,826
  • 11
  • 69
  • 103
7
votes
3 answers

Redact and remove password from URL

I have a URL like this: https://user:password@example.com/path?key=value#hash The result should be: https://user:???@example.com/path?key=value#hash I could use a regex, but instead I would like to parse the URL into a high-level data structure, then…
guettli
  • 25,042
  • 81
  • 346
  • 663
6
votes
1 answer

First argument for Url.Parser.custom in Elm

The docs for Url.Parser.custom give an example: int : Parser (Int -> a) a int = custom "NUMBER" String.toInt but they don't indicate what "NUMBER" is used for. I checked the source and it seems to be captured as tipe, but never used: custom : String…
davetapley
  • 17,000
  • 12
  • 60
  • 86
6
votes
2 answers

Parse a git URL like 'ssh://git@gitlab.org.net:3333/org/repo.git'?

How could I easily extract the hostname from a git URL like ssh://git@gitlab.org.net:3333/org/repo.git? u = urlparse(s) gives me ParseResult(scheme='ssh', netloc='git@gitlab.org.net:3333', path='/org/repo.git', params='', query='', fragment='') which…
d33tah
  • 10,999
  • 13
  • 68
  • 158
5
votes
2 answers

window.location.hash issue in Firefox

Consider the following code: hashString = window.location.hash.substring(1); alert('Hash String = '+hashString); When run with the following hash: #car=Town%20%26%20Country the result in Chrome and Safari will be: car=Town%20%26%20Country but…
Yarin
  • 173,523
  • 149
  • 402
  • 512
5
votes
0 answers

Why is it that "using anything but a utf-8 decoder...might be insecure" in a URL percent decoding algorithm?

I am implementing a URL parser and have a question about the W3C URL spec (at http://www.w3.org/TR/2014/WD-url-1-20141209/). In section "2. Percent-encoded bytes" it has the following algorithm (emphasis added): To percent decode a byte sequence…
Chad
  • 1,750
  • 2
  • 16
  • 32
5
votes
1 answer

Java URL Class getPath(), getQuery() and getFile() inconsistent with RFC3986 URI Syntax

I am writing a utility class that semi-wraps Java's URL class, and I have written a bunch of test cases to verify the methods I have wrapped with a customized implementation. I don't understand the output of some of Java's getters for certain URL…
Selena
  • 2,208
  • 8
  • 29
  • 49
4
votes
1 answer

Regex ignore URL already in HTML tags

I'm having a little problem with my regex. I've made a custom BBCode for my website; however, I also want URLs to be parsed. I'm using preg_replace and this is the pattern used to identify URLs: /([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])/is Which…
Moe
  • 4,744
  • 7
  • 28
  • 37
4
votes
2 answers

Split path name to get routing parameter

I am using MVC and jQuery in my application. I have a routing URL like this: ID/Controller/Action. I want to get the URL and split it to get the ID in jQuery.
Eslam Soliman
  • 1,276
  • 5
  • 16
  • 42