Well, first we must answer the question: What is a string?
The C standard defines it as a contiguous sequence of characters terminated by and including the first null character (Reference 1).
It also mentions varieties using wchar_t, char16_t, or char32_t instead of char.
It also provides many functions for string manipulation, and string literals for notational convenience.
So a sequence of characters can be a string, a char[] might hold a string, and a char* might point to one.
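To make the distinction concrete, here is a minimal C++ sketch. The names c_string_length and holds_string are hypothetical helpers, not standard functions: the first counts the characters before the first null (what std::strlen computes), and the second checks whether a char buffer of known size actually contains a terminator, i.e. whether it holds a string at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Length of a C string: the number of chars preceding the first null
// character, which is what std::strlen computes.
// Precondition: s points to a string (so it must not be null).
std::size_t c_string_length(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// A char buffer "holds a string" only if a null character appears
// somewhere within its first `size` elements.
bool holds_string(const char* buf, std::size_t size) {
    return std::memchr(buf, '\0', size) != nullptr;
}
```

For example, char a[] = "hi"; holds a string of length 2, while char b[2] = {'h', 'i'}; is a sequence of characters that is not a string, so passing b to std::strlen would read past its end. A buffer initialized with "ab\0cd" holds the string "ab": everything after the first null is not part of it.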
LPCSTR is a Windows typedef for const char*, with the added semantics that it should point to a string or be NULL.
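The "string or NULL" contract means a function receiving such a pointer has to check for null before treating it as a string. A portable sketch, assuming a stand-in alias LPCSTR_sketch (hypothetical; the real LPCSTR comes from <windows.h>) and a hypothetical function describe:

```cpp
#include <cassert>
#include <string>

// Portable stand-in for the Windows typedef (hypothetical name);
// the real LPCSTR is declared in <windows.h>.
using LPCSTR_sketch = const char*;

// The pointer may legitimately be null, so check before reading it.
std::string describe(LPCSTR_sketch s) {
    if (s == nullptr) return "(null)";
    return std::string(s);  // otherwise s must point to a string
}
```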
TCHAR is one of a number of typedefs, selected by preprocessor macros such as UNICODE, used for transitioning Windows code from char to wchar_t. Depending on what TCHAR is, a TCHAR[] might be able to hold a string or a wide string.
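The switching mechanism behind TCHAR can be sketched portably. All names here (MY_UNICODE, my_tchar, MY_TEXT, my_tcslen) are hypothetical stand-ins for UNICODE, TCHAR, TEXT and _tcslen; the real definitions live in the Windows headers and differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// One preprocessor switch picks the character type, the spelling of
// literals, and the matching length function.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT(s) L##s
inline std::size_t my_tcslen(const my_tchar* s) { return std::wcslen(s); }
#else
typedef char my_tchar;
#define MY_TEXT(s) s
inline std::size_t my_tcslen(const my_tchar* s) { return std::strlen(s); }
#endif

// The same source line yields a narrow or a wide string,
// depending on whether MY_UNICODE is defined.
const my_tchar greeting[] = MY_TEXT("hello");
```

Code written this way compiles either way, which is what made the gradual char to wchar_t transition possible.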
C++ muddies things a bit because it adds a data type for handling strings. To reduce ambiguity, "string" on its own is only used for the abstract concept; otherwise you have to rely on context to disambiguate, or be more explicit.
So the C string corresponds to the C++ null-terminated byte string, or NTBS (Reference 2).
And yes, C++ also has their wide varieties.
C++ incorporates the C functions, too, and adds some more.
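A short sketch of the wide varieties, assuming C++11 or later for char16_t and char32_t. std::wcslen is one of the functions inherited from C, while std::char_traits<CharT>::length measures a null-terminated sequence of any character type. The variable and wrapper function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Null-terminated strings of the wider character types.
const wchar_t  wide[]  = L"wide";
const char16_t utf16[] = u"utf16";
const char32_t utf32[] = U"utf32!";

// Inherited from C (<wchar.h>): length of a null-terminated wchar_t string.
std::size_t wide_len() { return std::wcslen(wide); }

// C++'s generic tool: counts elements before the terminating null
// for any character type.
std::size_t utf16_len() { return std::char_traits<char16_t>::length(utf16); }
std::size_t utf32_len() { return std::char_traits<char32_t>::length(utf32); }
```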
In addition, C++ has std::basic_string<> for storing all kinds of counted strings, and some convenience typedefs like std::string.
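The difference between a counted std::basic_string and an NTBS shows up with an embedded null character. A minimal sketch (with_embedded_null is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// std::string is std::basic_string<char>. It stores its length
// explicitly, so an embedded '\0' is just another character.
std::string with_embedded_null() {
    // The (pointer, count) constructor copies exactly 5 chars,
    // including the null in the middle.
    return std::string("ab\0cd", 5);
}
```

Here size() reports 5, but std::strlen(s.c_str()) reports 2, because the NTBS view returned by c_str() ends at the first null. std::wstring, std::u16string and std::u32string are the corresponding typedefs for the wider character types.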
And now we get to a third language, C++/CLI, which incorporates everything I described above for C++ and adds the CLI type System::String into the mix.
System::String is an immutable, counted UTF-16 string.
Now the question of why C++ does not define one single concrete type to be a string can be answered:
There are different types of string in C++ for interoperability, history, efficiency and convenience. Always use the right tool for the job.
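A small sketch of the "right tool" idea, assuming C++17 for std::string_view; c_api_count_spaces, count_spaces and first_word are hypothetical names. A C-style interface takes an NTBS, std::string adds ownership and convenience, and std::string_view avoids copies at the cost of not guaranteeing a terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Interoperability: a C-style interface that only understands an NTBS.
std::size_t c_api_count_spaces(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == ' ') ++n;
    return n;
}

// Convenience: std::string owns and manages its buffer; c_str()
// hands the C interface a null-terminated pointer to it.
std::size_t count_spaces(const std::string& s) {
    return c_api_count_spaces(s.c_str());
}

// Efficiency: std::string_view refers to existing characters without
// copying. It is counted and need not be null-terminated, so it must
// not be passed to the C interface as a const char*.
std::string_view first_word(std::string_view s) {
    return s.substr(0, s.find(' '));  // npos: whole view
}
```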
Java and .NET do the same with byte arrays, char arrays, string builders and the like.
Reference 1: C11 final draft, definition of string:
7. Library
7.1 Introduction
7.1.1 Definitions of terms
1 A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.
Reference 2: C++1z draft n4659 NTBS:
20.4.2.1.5.1 Byte strings [byte.strings]
1 A null-terminated byte string, or NTBS, is a character sequence whose highest-addressed element with defined content has the value zero (the terminating null character); no other element in the sequence has the value zero.
2 The length of an NTBS is the number of elements that precede the terminating null character. An empty ntbs has a length of zero.
3 The value of an NTBS is the sequence of values of the elements up to and including the terminating null character.
4 A static NTBS is an NTBS with static storage duration.