0

What would be the highest-performing implementation to determine if a string that represents an SQL alias needs to be wrapped in double-quotes?

Presently, in pg-promise I am wrapping every alias in double-quotes, to play it safe. I am looking to make the output SQL neater and shorter, if possible.

And I am divided which approach is the best -

  • to use a regular expression, somehow
  • to do a direct algorithm with strings
  • not to change it at all, if there are reasons for that

Basically, I am looking to improve function as.alias, if possible, not to wrap aliases into double quotes when it is not needed.


What have I tried so far...

I thought at first to do it only for the 99% of all cases - not to add double-quotes when your alias is the most typical one, just a simple word:

function skipQuotes(alias) {
    const m = alias.match(/[A-Z]+|[a-z]+/);
    return m && m[0] === alias;
}

This only checks it is a single word that uses either upper or lower case, but not the combination.


SOLUTION

Following the answer, I ended up with implementation that should cover 99% of all practical use cases, which is what I was trying to achieve:

const m = alias.match(/[a-z_][a-z0-9_$]*|[A-Z_][A-Z0-9_$]*/);
if (m && m[0] === alias) {
    // double quotes will be skipped
} else {
    // double quotes will be added
}

i.e. the surrounding double quotes are not added when the alias uses a simple syntax:

  • it is a same-case single word, without spaces
  • it can contain underscores, and can start with one
  • it can contain digits and $, but cannot start with those
vitaly-t
  • 24,279
  • 15
  • 116
  • 138
  • Is it a keyword (https://www.postgresql.org/docs/9.6/static/sql-keywords-appendix.html)? Does it not begin with a letter or does it contain any characters other than letters (diacritical marks allowed), digits, and underscore? – Gordon Linoff Nov 05 '17 at 15:58
  • would not taking every identifier into double quotes cheaper then checking if identifier needs it?.. – Vao Tsun Nov 05 '17 at 16:00
  • @GordonLinoff yes, and I believe when in double quotes, it can be absolutely any string (1-symbol longer at least). To make it open syntax, it surely cannot contain spaces, or letters in different case, at the very least. – vitaly-t Nov 05 '17 at 16:01
  • @VaoTsun - cheaper - yes, but as I stated in the question, I'm looking to make the SQL leaner, and easier to read, while hopefully, without sacrificing the performance. Mainly because your typical SQL is a very short name that doesn't need extended syntax. – vitaly-t Nov 05 '17 at 16:02
  • There is also a bit strange rule: keywords can be open aliases only when are preceding with the word `as`. So `select 1 as select;` is ok but `select 1 select;` raises an error. – klin Nov 05 '17 at 21:45
  • @klin There are indeed some odd rules out there. But I was after covering the practical 99% of all cases. – vitaly-t Nov 05 '17 at 21:54
  • 1
    I think you can safely add underscores and dollar signs `alias.match(/[a-z_][a-z_$0-9]*|[A-Z_][A-Z_$0-9]*/);` – klin Nov 05 '17 at 21:59
  • @klin Thank you. This is however more of chasing the border cases. My thinking is what people use for aliases when writing SQL, and they usually do not use underscores or dollar signs for aliases :) Though I might add it after all :) – vitaly-t Nov 05 '17 at 22:17
  • Well, I do this sometimes. And this may be one less problem reported by users *why the hell...* ;) – klin Nov 05 '17 at 22:26

1 Answers1

2

Removing double quotes is admirable -- it definitely makes queries easier to read. The rules are pretty simple. A "valid" identifier consists of:

  • Letters (including diacritical marks), numbers, underscore, and dollar sign.
  • Starts with a letter (including diacriticals) or underscore.
  • Is not a reserved word.

(I think I have this summarized correctly. The real rules are in the documentation.)

The first two are readily implemented using regular expressions. The last probably wants a reference table for lookup (and the list varies by Postgres release -- although less than you might imagine).

Otherwise, the identifier needs to be surrounded by escape characters. Postgres uses double quotes (which is ANSI standard).

One reason you may want to do this is because Postgres converts identifiers to lower case for comparison. So, the following works fine:

select xa, Xa, xA, "xa"
from (select 1 as Xa) y

However, this does not work:

select Xa
from (select 1 as "Xa") y

Nor does:

select "Xa"
from (select 1 as Xa) y

In fact, there is no way to get refer to "Xa" without using quotes (at least none that I can readily think of).

Enforcing the discipline of exact matches can be a good thing or a bad thing. I find that one discipline too many: I admit to often ignoring case when writing "casual" code; it is just simpler to type without capitalization (or using double quotes). For more formal code, I try to be consistent.

On the other hand, the rules do allow:

select "Xa", "aX", ax
from (select 1 as "Xa", 2 as "aX", 3 as AX) y

(This returns 1, 2, 3.)

This is a naming convention that I would be happy if it were not allowed.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I have updated my question with an example of my first attempt at it. What do you think about that? I was thinking that if I cover the most useful cases, then it is good enough... after all, I am just clearing up the syntax. – vitaly-t Nov 05 '17 at 16:17
  • @vitaly-t . . . Actually, the moral of my answer is that you need to use quotes for any mixed case identifier -- unless you know it is unique regardless of case in the query. That is sad, but if you are doing something general purpose, you need to support the last example, or explicitly exclude it. – Gordon Linoff Nov 05 '17 at 16:34
  • Thank you! I have added the solution to the question that I ended up with, following this. – vitaly-t Nov 05 '17 at 21:15