
All of the documentation I can find about the shExpMatch function is awful. For example:

  • findproxyforurl.com - "Will attempt to match hostname or URL to a specified shell expression"
  • Microsoft - "The shExpMatch(str, shexp) function returns true if str matches the shexp using shell expression patterns."
  • Mozilla - "Currently, the patterns are shell expressions, not regular expressions."

I've worked with a lot of shells, and I've never seen the phrase "shell expression" used to describe a pattern-matching language before. I have no idea what it's supposed to mean. From the available examples it looks similar to a filename globbing pattern. I wonder why they don't say "glob", or "wildcard", or "filename expansion" (any of those 3 would be more standard, recognizable terms) if that's what they mean. Instead the undefined phrase "shell expression" is universally used by every vendor - but only to describe this function. If I didn't know better I'd think they were all just copying each other's documentation without reading it.

If we accept that "shell expression" means glob then the trouble is just beginning. Which shell? Do the different implementations even agree? I can guess that this function was originated by some unix programmer, whose default idea of globs was Bourne-shell-ish. But there are a lot of variants in that -ish suffix! Basic features are * and ? and []. Does the [] support character classes like [[:alnum:]] or just individual characters and ranges? Does it support negation like [!a-z] or maybe [^a-z]? Can all of the special characters be matched literally by preceding them with a backslash (including backslash itself)? Are there any other shell-like quoting operators? Does the * operator really act like a glob, matching a single level of a directory hierarchy, so * and */* are mutually exclusive, or does it match slashes too? Does it fail to match a leading dot? Are any of the extensions from ksh, bash, and zsh present? Maybe even csh-like brace expansion (which is not a glob operation but is often mistaken for one)?

On the other hand, maybe it was designed by a Microsoft-oriented person to support Windows users, so I should think more like COMMAND.COM wildcards. Would Microsoft use a foreign pattern matching language, and not explicitly document it as such?

Is there an authoritative source that I've overlooked which actually specifies the matching rules? Failing that, has anyone studied the current implementations in enough detail to determine what the rules actually are?

  • I have actually a very similar question: **Who** defined these methods? I try to execute this in C#-code with a JS engine, and the engine doesn't know these methods. So it's not something _well known_ in JS. – ecth Jun 01 '16 at 09:14
  • @ecth Even in the programs that do implement `shExpMatch` (Chrome, Firefox, MSIE) you can't call it in the normal web page context or even in the debugger. As far as I can tell, it's a function that is completely invisible everywhere except inside a `proxy.pac`, which makes it really annoying to have to determine its behavior by experiment. As for who defined the method originally, I haven't done the research but my guess is Netscape. This function might be as old as JavaScript itself. –  Jun 01 '16 at 23:26

1 Answer


This is the implementation of shExpMatch() that Mozilla uses:

function shExpMatch(url, pattern) {
   pattern = pattern.replace(/\./g, '\\.');
   pattern = pattern.replace(/\*/g, '.*');
   pattern = pattern.replace(/\?/g, '.');
   var newRe = new RegExp('^'+pattern+'$');
   return newRe.test(url);
}

(Source)
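Reading that function literally answers several of the questions above, at least for this one implementation. The sketch below repeats the quoted function so it can run outside a PAC context and probes it with a few patterns. It only describes the snippet as quoted here; other browsers, or newer Mozilla code, may well behave differently.

```javascript
// Copy of the function quoted above, so the probes run outside a proxy.pac.
function shExpMatch(url, pattern) {
  pattern = pattern.replace(/\./g, '\\.');
  pattern = pattern.replace(/\*/g, '.*');
  pattern = pattern.replace(/\?/g, '.');
  var newRe = new RegExp('^' + pattern + '$');
  return newRe.test(url);
}

// Only '.', '*' and '?' are translated; everything else reaches RegExp as-is.
console.log(shExpMatch('www.example.com', '*.example.com')); // true
console.log(shExpMatch('a/b/c', '*'));       // true: '*' also matches slashes
console.log(shExpMatch('.hidden', '*'));     // true: a leading dot is not special
console.log(shExpMatch('b', '[^a]'));        // true: regex-style negation works
console.log(shExpMatch('b', '[!a]'));        // false: '!' is just a class member
console.log(shExpMatch('a*b', 'a\\*b'));     // false: backslash does not quote '*'
console.log(shExpMatch('axyz', 'a|b'));      // true: '|' acts as regex alternation
```

So with this code the pattern language is really "regular expressions with `.`, `*` and `?` redefined", not any particular shell's glob syntax.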

As you said, the original documentation came from Netscape. It used to be here and has since moved to here. (Source)

As far as I know, some implementations of proxy.pac don't even support full-blown JScript or JavaScript, only ECMAScript. I have also seen proxy.pac files that don't use these methods and use string operations instead:

function FindProxyForURL(url,host)
{
    if (host.substring(0, 4) == "192." ||
       host.substring(0, 7) == "example" )
    {
        return "DIRECT";
    }
}
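If you go the string-operations route, domain matching can be done without `shExpMatch` as well. This is only a sketch: `hostInDomain` is a helper name I made up, and the proxy address in the fallback is a placeholder, not something from a real configuration.

```javascript
// Hypothetical helper: true when host equals domain or is a subdomain of it.
function hostInDomain(host, domain) {
  host = host.toLowerCase();
  domain = domain.toLowerCase();
  if (host === domain) {
    return true;
  }
  var suffix = '.' + domain;
  return host.length > suffix.length &&
         host.substring(host.length - suffix.length) === suffix;
}

function FindProxyForURL(url, host) {
  if (host.substring(0, 4) === '192.' || hostInDomain(host, 'example.com')) {
    return 'DIRECT';
  }
  // Placeholder proxy address (an assumption for illustration only).
  return 'PROXY proxy.example.com:8080';
}
```

Comparing against `'.' + domain` keeps a host like `notexample.com` from matching, and the explicit fallback means the function always returns a string.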

So it really seems like there is no default standard. For Windows I now use WinApi calls like WinHttpGetProxyForUrl. At least my implementation won't differ from what my OS does.

Hope that helped ;)

Nils Ballmann
ecth