
I need to be able to pass in a URL and have a regex extract just the query string. The key part, though, is omitting the hash (fragment) and anything after it.

This is what I have so far. It skips the hash character itself but still matches the text after it, and it also still matches everything before the first `?`.

/([^&=#]+)=?([^&#]*)/g

Note: I know about window.location.search, but I need to be able to pass in any URL string.

Joseph Shambrook

3 Answers


You could do this without a regex:

var url='http://www.somewhere.com/#something?other&moreStuff';
var index=url.indexOf('#');
var whatIwant = url.substring(index+1);

Or from your regex:

([^#]+)=?([^&#]*)
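The same `indexOf` idea can be pointed at the query string itself. This is only a minimal sketch, assuming the goal is the text between the first `?` and the first `#`; the helper name `getQueryString` is hypothetical and not part of the original answer:

```javascript
// Hypothetical helper: returns the query string without the leading "?",
// or "" when the URL has none before the fragment.
function getQueryString(url) {
    // Drop the fragment first: everything from the first "#" onward.
    var hashIndex = url.indexOf('#');
    var withoutHash = hashIndex === -1 ? url : url.substring(0, hashIndex);

    // Whatever follows the first remaining "?" is the query string.
    var queryIndex = withoutHash.indexOf('?');
    return queryIndex === -1 ? '' : withoutHash.substring(queryIndex + 1);
}
```

Cutting off the fragment before looking for `?` means a `?` that appears only after the `#` is never treated as the start of a query string.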
depperm

If you want to extract the parts of a URL from a string, RegExp is the wrong tool for the job. There are too many oddball cases, and the browser has some simple built-in ways of parsing URIs:

function parseUri(uri) {
    var a = document.createElement('a');
    a.href = uri;
    return {
        protocol: a.protocol,
        host: a.host,
        hostname: a.hostname,
        port: a.port,
        pathname: a.pathname,
        search: a.search,
        hash: a.hash
    };
}

This code won't break when it comes across a URI like:

'http://www.foo.com???#?foo=bar&fizz=buzz#'

And can be used for your case as:

parseUri('http://www.foo.com???#?foo=bar&fizz=buzz#').search; // '???'
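Where no `document` is available (for example outside a browser), the standard `URL` constructor offers a comparable route. The following is a sketch under that assumption, not part of the original answer; the helper name `getSearch` and the placeholder base URL are made up for illustration:

```javascript
// Hypothetical helper built on the standard URL constructor.
function getSearch(uri) {
    // The second argument is only a placeholder base so that relative
    // inputs such as "/path?x=1" can be parsed; it is ignored when
    // `uri` is already an absolute URL.
    return new URL(uri, 'http://placeholder.invalid').search;
}

getSearch('http://www.foo.com???#?foo=bar&fizz=buzz#'); // '???'
```

Like the anchor element's `search` property, the result includes the leading `?` and is an empty string when there is no query.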
zzzzBov

To get the query string, this one should suffice.

\?.*

If you want to be more specific, you may try this one:

\?(([a-zA-Z]+(=[a-zA-Z]+)?)&?)+

The first character marks the start of the query string (`?`), followed by key=value pairs. Keys without values are also accepted (the `(=[a-zA-Z]+)?` group makes the value optional). It can be improved, but it is a starting point for more complex patterns. Also, note that I'm assuming keys and values composed only of lower- and upper-case letters. You may add digits to the character classes too.
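If a regex is still preferred, one possible refinement is to anchor the pattern at the start of the string so that a `?` located after a `#` cannot match. This is a sketch only; the names `QUERY_RE` and `matchQuery` are hypothetical:

```javascript
// ^[^?#]*   skip everything up to the first "?" or "#"
// \?        require that the first of those is a "?"
// ([^#]*)   capture the query string, stopping at the fragment
var QUERY_RE = /^[^?#]*\?([^#]*)/;

// Hypothetical helper: returns the query string without the leading "?",
// or "" when the pattern does not match.
function matchQuery(url) {
    var m = QUERY_RE.exec(url);
    return m ? m[1] : '';
}
```

The captured string can then be fed to a key/value pattern such as the one in the question, since it no longer contains the part before `?` or the fragment.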

Reuel Ribeiro
  • As you said, 'and the regex should be able to extract just the **query string**'. The query string does not include the # part; that part is called the fragment. So, according to your requirement, my regex does work. – Reuel Ribeiro Jun 26 '15 at 16:18
  • The query string portion of `'http://www.example.com#lorem?ipsum&dolor'` is `''`. – zzzzBov Jun 26 '15 at 16:19
  • @zzzzBov Sorry for resurrecting a 2-year-old discussion, but what Reuel is saying is that your sample is not valid. The query string is to be considered part of the URL and must therefore come BEFORE the hash, not after. In your example, the question mark is part of the fragment, not part of the URL, and is interpreted thus. The reversed (incorrect) way will also break standard API like window.history.pushstate. – Greg Pettit Aug 17 '17 at 20:42
  • @GregPettit, my point was that the hash may contain `?` characters, and that the provided regex does not account for this fact. "the query string ... must come before the hash" In case I wasn't clear before, my expected output is that the query string be empty if the hash contains something that looks similar to a query string. The provided regex fails at this. "The reversed (incorrect) way will also break standard API like `window.history.pushstate`" I'd be interested if you could show me a [mcve] of this behavior, because that would be a browser bug. – zzzzBov Aug 17 '17 at 21:09
  • Wait, so your comment meant, "it SHOULD be empty string"? Then I agree, and I spoke out of turn. Apologies! For the second part, it's easy. Go to any website without a system of URL rewrites, and reload the page with both a hash and an out-of-order query string. Then in a console enter `window.history.pushState({}, "thing", "#bar")` and watch your incorrectly-placed query string disappear. Set the order to be correct, run the same test, and witness only the hash being updated. – Greg Pettit Aug 18 '17 at 06:50
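
The example URL from the comments can be checked directly. This sketch assumes an environment with the standard `URL` constructor and compares it with the simple `\?.*` pattern from the answer; the variable names are illustrative only:

```javascript
var url = 'http://www.example.com#lorem?ipsum&dolor';

// The simple pattern picks up the "?" that sits inside the fragment.
var naiveMatch = url.match(/\?.*/);
var naive = naiveMatch ? naiveMatch[0] : '';   // '?ipsum&dolor'

// A URL parser treats everything after "#" as the fragment.
var parsed = new URL(url);
var search = parsed.search;                    // ''
var hash = parsed.hash;                        // '#lorem?ipsum&dolor'
```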