
First of all: I know how to work around this issue. I'm not searching for a solution. I am interested in the reasoning behind the design choices that led to some implicit conversions and didn't lead to others.

Today I came across a small but consequential error in our code base, where an int constant was initialised with the char representation of that same number. This results in the char being implicitly converted to its underlying character code. Something like this:

char a = 'a';
int z = a;
Console.WriteLine(z);    
// Result: 97

I was confused why C# would allow something like this. After searching around I found the following SO question with an answer by Eric Lippert himself: Implicit Type cast in C#

An excerpt:

However, we can make educated guesses as to why implicit char-to-ushort was considered a good idea. The key idea here is that the conversion from number to character is a "possibly dodgy" conversion. It's taking something that you do not KNOW is intended to be a character, and choosing to treat it as one. That seems like the sort of thing you want to call out that you are doing explicitly, rather than accidentally allowing it. But the reverse is much less dodgy. There is a long tradition in C programming of treating characters as integers -- to obtain their underlying values, or to do mathematics on them.
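
For illustration, that tradition covers the kind of char arithmetic shown here:

char c = '7';
int digit = c - '0';        // char minus char yields an int: 55 - 48 = 7
char next = (char)(c + 1);  // '8'
Console.WriteLine(digit);   // 7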

I can agree with the reasoning behind it, though an IDE hint would be awesome. However, I have another situation where the implicit conversion is suddenly not legal:

char a = 'a';
string z = a; // CS0029 Cannot implicitly convert type 'char' to 'string'

This conversion is, in my humble opinion, very logical. It cannot lead to data loss and the intention of the writer is also very clear. Even after reading the rest of the answer on the char-to-int implicit conversion, I still don't see any reason why this should not be legal.

So that leads me to my actual question:

What reasons could the C# design team have had not to implement an implicit conversion from char to string, when it appears so obvious (especially when comparing it to the char-to-int conversion)?

pyrocumulus
  • By the way, the char->string is perfectly valid in VB.NET (even with Option Strict On). With Option Strict Off even string->char was valid (the first char of the string is taken, a real mess). – Tim Schmelter Sep 11 '18 at 09:12

3 Answers


First off, as I always say when someone asks a "why not?" question about C#: the design team doesn't have to provide a reason not to do a feature. Features cost time, effort and money, and every feature you do takes time, effort and money away from better features.

But I don't want to just reject the premise out of hand; the question might be better phrased as "what are the design pros and cons of this proposed feature?"

It's an entirely reasonable feature, and there are languages which allow you to treat single characters as strings. (Tim mentioned VB in a comment, and Python also treats chars and one-character strings as interchangeable IIRC. I'm sure there are others.) However, were I pitched the feature, I'd point out a few downsides:

  • This is a new form of boxing conversion. Chars are cheap value types. Strings are heap-allocated reference types. Boxing conversions can cause performance problems and produce collection pressure, and so there's an argument to be made that they should be more visible in the program, not less visible.
  • The feature will not be perceived as "chars are convertible to one-character strings". It will be perceived by users as "chars are one-character strings", and now it is perfectly reasonable to ask lots of knock-on questions, like: can I call .Length on a char? If I can pass a char to a method that expects a string, and I can pass a string to a method that expects an IEnumerable<char>, can I pass a char to a method that expects an IEnumerable<char>? That seems... odd. I can call Select and Where on a string; can I on a char? That seems even more odd. All the proposed feature does is move your question; had it been implemented, you'd now be asking "why can't I call Select on a char?" or some such thing. (See the sketch after this list for what those rules look like today.)

  • Now combine the previous two points together. If I think of chars as one-character strings, and I convert a char to an object, do I get a boxed char or a string?

  • We can also generalize the second point a bit further. A string is a collection of chars. If we're going to say that a char is convertible to a collection of chars, why stop with strings? Why not also say that a char can also be used as a List<char>? Why stop with char? Should we say that an int is convertible to IEnumerable<int>?
  • We can generalize even further: if there's an obvious conversion from char to sequence-of-chars-in-a-string, then there is also an obvious conversion from char to Task<char> -- just create a completed task that returns the char -- and to Func<char> -- just create a lambda that returns the char -- and to Lazy<char>, and to Nullable<char> -- oh, wait, we do allow a conversion to Nullable<char>. :-)
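
For concreteness, here is a rough sketch of the rules we have today, which is exactly what those knock-on questions push against:

using System.Collections.Generic;
using System.Linq;

char c = 'x';

object o = c;            // boxing is already implicit, and it allocates on the heap
char? maybe = c;         // char to Nullable<char>: the implicit "wrapping" conversion we do have

string s = "x";
IEnumerable<char> seq = s;           // a string already is an IEnumerable<char>
var upper = s.Select(char.ToUpper);  // so Select and Where work on strings...

// ...but none of the following compile:
// int n = c.Length;                 // CS1061: char has no Length
// IEnumerable<char> fromChar = c;   // CS0029
// var q = c.Select(char.ToUpper);   // CS1061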

All of these problems are solvable, and some languages have solved them. That's not the issue. The issue is: all of these problems are problems that the language design team must identify, discuss and resolve. One of the fundamental problems in language design is how general should this feature be? In two minutes I've gone from "chars are convertible to single-character strings" to "any value of an underlying type is convertible to an equivalent value of a monadic type". There is an argument to be made for both features, and for various other points on the spectrum of generality. If you make your language features too specific, it becomes a mass of special cases that interact poorly with each other. If you make them too general, well, I guess you have Haskell. :-)

Suppose the design team comes to a conclusion about the feature: all of that has to be written up in the design documents and the specification, and the code, and tests have to be written, and, oh, did I mention that any time you make a change to convertibility rules, someone's overload resolution code breaks? Convertibility rules you really have to get right in the first version, because changing them later makes existing code more fragile. There are real design costs, and there are real costs to real users if you make this sort of change in version 8 instead of version 1.

Now compare these downsides -- and I'm sure there are more that I haven't listed -- to the upsides. The upsides are pretty tiny: you avoid a single call to ToString or + "" or whatever you do to convert a char to a string explicitly.
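
For concreteness, those explicit spellings all compile today:

char c = 'a';
string s1 = c.ToString();      // the usual spelling
string s2 = c + "";            // concatenation
string s3 = new string(c, 1);  // the string(char, count) constructor
string s4 = $"{c}";            // interpolation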

That's not even close to a good enough benefit to justify the design, implementation, testing, and backwards-compat-breaking costs.

Like I said, it's a reasonable feature, and had it been in version 1 of the language -- which did not have generics, or an installed base of billions of lines of code -- then it would have been a much easier sell. But now, there are a lot of features that have bigger bang for smaller buck.

Eric Lippert
  • Wow, thank you for this exceptionally detailed answer. I hoped it would be answered by you; I just didn't expect it to actually happen ;-) I figured there would be a lot of things I had missed, and the biggest one was the generalisation. I also thought about performance, but I had it backwards: it's not the performance cost you want to avoid, it's the fact that you don't see the cost. Your answer has been most helpful; shame I can only upvote and mark it once. Thank you! – pyrocumulus Sep 11 '18 at 17:08
  • Oh, and I did indeed pose the question incorrectly. It's always a question of balancing pros, cons and limited development capacity in software development. It's exactly the same where I work (I imagine everywhere). I'll edit that part of the question so it doesn't sound so negative towards the C# design team. – pyrocumulus Sep 11 '18 at 17:10

Char is a value type. No, it is not numerical like int or double, but it still derives from System.ValueType. So a char variable (as a local) lives on the stack.

String is a reference type, so it lives on the heap. If you want to use a value type as a reference type, you need to box it first, and the compiler wants that boxing to be something you do knowingly. So you can't have an implicit cast.
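
For example:

char c = 'a';              // a 16-bit value; as a local it lives on the stack
string s = c.ToString();   // explicit: allocates a new one-character string on the heap
// string s2 = c;          // CS0029: no implicit conversion from the value type to the reference type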

SylF

Because when you cast a char to an int, you aren't just changing its type, you are getting its index in the ASCII table, which is a really different thing. They are not letting you "convert a char to an int"; they are giving you useful data for a possible need.

Why not allow the cast of a char to a string?

Because they actually represent very different things, and are stored in very different ways.

A char and a string aren't two ways of expressing the same thing.
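
For example, even the storage is different:

Console.WriteLine(sizeof(char));   // 2: a char is a fixed 2-byte value
Console.WriteLine("a".Length);     // 1: a string is a heap object carrying a length and its char data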

Marco Salerno
  • Of course you are changing the type, char and int are two different types – Tim Schmelter Sep 11 '18 at 09:10
  • Corrected, maybe this way it's more understandable – Marco Salerno Sep 11 '18 at 09:10
  • I get what you are saying but I still think that the char to int conversion is less logical than the char to string conversion. I know that a char can easily be represented by an integer, but I can just as easily see that a string consists of single characters. It works both ways. – pyrocumulus Sep 11 '18 at 09:15
  • That's why they still let you handmake it, I suppose they just wanted to underline the difference – Marco Salerno Sep 11 '18 at 09:17
  • @pyrocumulus: C# is a C derivative and millions of developers treated chars as ints and calculated with them. So it's a long tradition. That's not the case for `char`->`string`. – Tim Schmelter Sep 11 '18 at 09:18
  • @TimSchmelter I can imagine that being the case. However Eric said that the potential for data loss was far more important a reason for the design choice (in the case of char to int, that is). I see no data loss when I create a string from a single character. The lack of conversion in the other direction is also not a reason, because that's also impossible when going from int to char. – pyrocumulus Sep 11 '18 at 09:21
  • @pyrocumulus it costs more – Marco Salerno Sep 11 '18 at 09:22
  • @pyrocumulus: forget char->int. It's already explained: developers wanted this feature, period. But apart from that, there is no reason to treat type1 as type2. If you have a `char`, declare it as `char`; if you have a `string`, declare it as `string`. If you want to take the first char, that's easy: `char c = str[0];`; if you want a char as a string, use `c.ToString()`. – Tim Schmelter Sep 11 '18 at 09:24
  • @MarcoSalerno that's true. But is that really a reason for (not) having implicit conversions? You can also assign an int to an object, and box it. That's not exactly cheap either. – pyrocumulus Sep 11 '18 at 09:24
  • @TimSchmelter I get what you are saying. Perhaps historical reasons were more important than Eric let on. It would appear that the char to int conversion is more of an exception than it is a rule. At the moment, I can't think of other implicit conversions that **really** convert between types in that way. – pyrocumulus Sep 11 '18 at 09:31
  • Chars are not ASCII characters. Chars are *UTF-16 code units*. If the upper 9 bits of a char are zero then the lower 7 bits are a valid ASCII code, but you should not think of chars as ASCII; that is a misleading and incorrect characterization. – Eric Lippert Sep 11 '18 at 16:19
  • @pyrocumulus: If I ever implied that historical reasons were unimportant, then I did a poor job of explaining myself. Conformance to the intuitions and expectations of existing C, C++ and Java users was foundational to the design of C#. C# is often characterized as a response to Java, and there is some truth to that, but it is more accurate to say that C# and Java shared the same design goal: to make a safer C. In fact, the first two code names for the C# project were "COOL", which stood for "C-like Object-Oriented Language", and "SafeC". – Eric Lippert Sep 11 '18 at 16:24