Using SQL Server 2016, I have a need to scrub white space a certain way and implement INITCAP.
The whitespace scrubber is simple. I'm having trouble getting the INITCAP replacement working properly.
The accepted answer to Initcap equivalent in mssql is wrong, as noted in the first comment.
My data contains values that have multiple spaces in a row and special characters, (&, %, etc.).
stuff()
: In SQL Server 2016, string_split
does not have an option to prove an ordinal value and does not guarantee that the results are returned in any specific order. So, I need to write code to ensure values are returned from split_string
in the correct order.
convert(xml,...)
: Decodes most of the XML-encoded values.
convert(varchar(max),...)
: ...because XML can't be used when needing SELECT DISTINCT
with T as (
select *
from (
values ('Minesota Mining and Manufacturing')
, ('Minesota Mining & Manufacturing ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, (' ')
) q(s)
),
scrubbed as (
select T.s as InitialValue
, CASE
WHEN LEN(RTRIM(T.s)) > 0
THEN LTRIM(RTRIM(T.s))
END as s
from T
)
select distinct s.InitialValue
, stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH('')
),
1,
1,
''
) as INITCAP_xml
, convert(
varchar(max),
convert(
xml,
stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH('')
),
1,
1,
''
)
)
) as INITCAP_decoded
from scrubbed s
You see in the output that using FOR XML
causes some of the characters to be encoded (like [space] = $#x20;
and & = &
). By converting to XML data type, some of those characters are decoded. But some characters (like &
) remain encoded.
InitialValue | INITCAP_attempt1 | INITCAP_xml | INITCAP_decoded |
---|---|---|---|
Minesota Mining and Manufacturing |
Minesota Mining And   Manufacturing |
Minesota Mining And Manufacturing |
Minesota Mining And Manufacturing |
Minesota Mining & Manufacturing |
Minesota Mining & Manufacturing |
Minesota Mining & Manufacturing |
Minesota Mining & Manufacturing |
tillamook |
Tillamook |
Tillamook |
Tillamook |
MUTUAL OF OMAHA |
Mutual Of Omaha |
Mutual Of Omaha |
Mutual Of Omaha |
|
null | null | null |
REPLACE(s, '&', '&')
doesn't seem like a reasonable option because I don't know what other values I'll run into over time. Is there a good, general way to handle characters that will be encoded by FOR XML
?
Within a view (so, without using user defined functions or stored procedures), is there a better way to implement INITCAP
in SQL Server?