My question relates to databases (SQL Server in particular). The official guide mentions that when using NVARCHAR/NCHAR, "2 bytes of storage per character" is used and "if a surrogate pair is needed, a character will require 4 bytes of storage." How are those 4 bytes used when a surrogate pair is needed? How is that "need" communicated to SQL Server, and how does it know? I'm just not sure how this works: when I was programming, I'd explicitly define something as UTF-8, UTF-16, or UTF-32. It seems like SQL Server only accepts UTF-16 and somehow uses a surrogate pair when needed. Could someone please explain how this is supposed to work? I'd also really appreciate sources and references so I can study this further.
I tried reading about surrogate pairs, and there is quite literally nothing out there except some sources that only scratch the surface and explain that "a surrogate pair is just a mechanism for representing UTF-32 characters using two UTF-16s".
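To make the question concrete, here is my current understanding of the mechanism as a small Python sketch. Python is only for illustration and says nothing about how SQL Server implements this internally; the function names `to_surrogate_pair` and `utf16_byte_length` are my own. As far as I can tell, a code point above U+FFFF is split into two 16-bit code units, which would account for the 4 bytes.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000       # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return high, low


def utf16_byte_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-16 (little-endian, no BOM)."""
    return len(text.encode("utf-16-le"))


high, low = to_surrogate_pair(0x1F600)       # U+1F600, an emoji outside the BMP
print(hex(high), hex(low))                   # 0xd83d 0xde00
print(utf16_byte_length("A"))                # 2
print(utf16_byte_length("\U0001F600"))       # 4
```

If this is right, the "need" is simply a property of the character itself (whether its code point is above U+FFFF), but I would like confirmation of how SQL Server handles it.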
Thank you very much and sorry about the lengthy question.