1

I was looking to generate some special storage tables in my application based on a normalized string (think json string used as an id). At first I was using this item as an id, but the table has gotten so large that even reasonable queries are taking unreasonable amounts of time.

I was going to use this string as part of the database name, so I was looking at how to do that. It looks like I can, but the overwhelmingly common response is to "not do it" and "I'll regret it later."

I am fine changing up this id so I avoid special characters, but I'd really like to know why this is so taboo. I have found plenty of examples saying not to, but no one actually to explain why. The closest I have found to an answer is that "it will cause pain in the development cycle"

Particularly in situations where the database is pretty much entirely managed in code, is there a good reason?

I know this runs the risk of being opinion based, so I'd really like some quantifiable reasons.

Community
  • 1
  • 1
S. Buda
  • 727
  • 7
  • 27

1 Answers1

1

There are two basic aspects to the problem. The first is differing conventions on alphabets and encodings between the many programs we use. The second is the difference between a sequence of characters and the sequence of characters used to represent it.

We run input and output between many programs in software development. These treat sequences of characters (some representing strings, including symbol names or data values) in various different ways (in input, processing & output). Many of these are not guaranteed to work except in very limited cases. So if you want to minimize problems you should use the simplest possible character sequences in terms of alphabets and encodings, including for those representing names and strings.

Whenever a sequence of characters that represents a string differs from the string we must convert between the two kinds of sequences. The simplest case is when the sequences are the same. The next simplest case is when we can just put some special character around a string that can't contain it. But otherwise, transforming requires calling a program. Eg in SQL we can input a non-quoted table name as its characters. For a sequence that is a value of an SQL VARCHAR type that contains only alpha-numerics and underscore we can just put quotes around it. But for such a VARCHAR that can also contain a quote we have to also duplicate any quotes inside it. For a VARCHAR with arbitrary characters we have to introduce escape characters, including to escape escape characters. As I said, the SQL non-quoted table name convention avoids these transformation problems.

philipxy
  • 14,867
  • 6
  • 39
  • 83