33

I have a question about implicit type conversion.

Why does this implicit type conversion work in C#? I've learned that implicit conversions usually don't work.

Here is a code sample showing the implicit type conversion:

 char c = 'a';
 int x = c;               // implicit conversion from char to int; x == 97
 int n = 5;
 int answer = n * c;      // c converts implicitly to int (97), so answer == 485
 Console.WriteLine(answer);
tintincutes
  • That's actually a really good question. In C, a character is more or less a synonym for a byte, they are both numeric types. In C# though, a character is a unicode character... – Matthew Scharley Oct 01 '09 at 11:40
  • so the answer would be because the character is a unicode character that's why it works in C#? – tintincutes Oct 01 '09 at 11:41
  • Just to correct your use of jargon -- this is not an implicit cast. This is an implicit conversion. "Casting" is the use of the cast operator; an implicit conversion is a conversion that does NOT require the cast operator. – Eric Lippert Oct 01 '09 at 14:40
  • Matthew: In .NET a character is either a Unicode character from the BMP or a surrogate character. Afaik no way to put a character from the Astral Planes into a single `char`. – Joey Oct 02 '09 at 05:46
  • @Eric: to be nitpicky, the term cast can be applied that way as it means to distort or twist, it can also mean a calculation. So implicit cast makes perfect sense to me... – RCIX Nov 20 '09 at 01:03

9 Answers

79

UPDATE: I am using this question as the subject of my blog today. Thanks for the great question. Please see the blog for future additions, updates, comments, and so on.

http://blogs.msdn.com/ericlippert/archive/2009/10/01/why-does-char-convert-implicitly-to-ushort-but-not-vice-versa.aspx


It is not entirely clear to me what exactly you are asking. "Why" questions are difficult to answer. But I'll take a shot at it.

First, code which has an implicit conversion from char to int (note: this is not an "implicit cast", this is an "implicit conversion") is legal because the C# specification clearly states that there is an implicit conversion from char to int, and the compiler is, in this respect, a correct implementation of the specification.
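
To see the terminology concretely, here is a minimal sketch (the variable names are mine):

 char c = 'a';
 int viaConversion = c;        // implicit conversion: no cast operator appears in the source
 int viaCast = (int)c;         // a cast expression: the cast operator is used, even though it is not required here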

Now, you might sensibly point out that the question has been thoroughly begged. Why is there an implicit conversion from char to int? Why did the designers of the language believe that this was a sensible rule to add to the language?

Well, first off, the obvious things which would prevent this from being a rule of the language do not apply. A char is implemented as an unsigned 16 bit integer that represents a character in a UTF-16 encoding, so it can be converted to a ushort without loss of precision, or, for that matter, without change of representation. The runtime simply goes from treating this bit pattern as a char to treating the same bit pattern as a ushort.

It is therefore possible to allow a conversion from char to ushort. Now, just because something is possible does not mean it is a good idea. Clearly the designers of the language thought that implicitly converting char to ushort was a good idea, but implicitly converting ushort to char is not. (And since char to ushort is a good idea, it seems reasonable that char-to-anything-that-ushort-goes-to is also reasonable, hence, char to int. Also, I hope that it is clear why allowing explicit casting of ushort to char is sensible; your question is about implicit conversions.)
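
A small sketch of that asymmetry (nothing here beyond the conversions discussed above):

 char c = 'a';
 ushort u = c;            // fine: implicit conversion from char to ushort; u == 97, same bit pattern
 // char c2 = u;          // does not compile: there is no implicit conversion from ushort to char
 char c2 = (char)u;       // fine: the explicit cast states that you mean to treat the number as a character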

So we actually have two related questions here: First, why is it a bad idea to allow implicit conversions from ushort/short/byte/sbyte to char? and second, why is it a good idea to allow implicit conversions from char to ushort?

Unlike you, I have the original notes from the language design team at my disposal. Digging through those, we discover some interesting facts.

The first question is covered in the notes from April 14th, 1999, where the question of whether it should be legal to convert from byte to char arises. In the original pre-release version of C#, this was legal for a brief time. I've lightly edited the notes to make them clear without an understanding of 1999-era pre-release Microsoft code names. I've also added emphasis on important points:

[The language design committee] has chosen to provide an implicit conversion from bytes to chars, since the domain of one is completely contained by the other. Right now, however, [the runtime library] only provides Write methods which take chars and ints, which means that bytes print out as characters since that ends up being the best method. We can solve this either by providing more methods on the Writer class or by removing the implicit conversion.

There is an argument for why the latter is the correct thing to do. After all, bytes really aren't characters. True, there may be a useful mapping from bytes to chars, but ultimately, 23 does not denote the same thing as the character with ascii value 23, in the same way that 23B denotes the same thing as 23L. Asking [the library authors] to provide this additional method simply because of how a quirk in our type system works out seems rather weak. So I would suggest that we make the conversion from byte to char explicit.

The notes then conclude with the decision that byte-to-char should be an explicit conversion, and integer-literal-in-range-of-char should also be an explicit conversion.

Note that the language design notes do not call out why ushort-to-char was also made illegal at the same time, but you can see that the same logic applies. When calling a method overloaded as M(int) and M(char), when you pass it a ushort, odds are good that you want to treat the ushort as a number, not as a character. And a ushort is NOT a character representation in the same way that a ushort is a numeric representation, so it seems reasonable to make that conversion illegal as well.
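
A sketch of that overload-resolution point, using the M(int)/M(char) overloads mentioned above (the class name and output strings are mine):

 using System;

 class OverloadDemo
 {
     static void M(int x)  { Console.WriteLine("M(int): " + x); }
     static void M(char c) { Console.WriteLine("M(char): " + c); }

     static void Main()
     {
         ushort u = 65;
         M(u);   // prints "M(int): 65" -- the ushort goes to M(int) as a number, not to M(char) as 'A'
     }
 }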

The decision to make char go to ushort was made on the 17th of September, 1999; the design notes from that day on this topic simply state "char to ushort is also a legal implicit conversion", and that's it. No further exposition of what was going on in the language designer's heads that day is evident in the notes.

However, we can make educated guesses as to why implicit char-to-ushort was considered a good idea. The key idea here is that the conversion from number to character is a "possibly dodgy" conversion. It's taking something that you do not KNOW is intended to be a character, and choosing to treat it as one. That seems like the sort of thing you want to call out that you are doing explicitly, rather than accidentally allowing it. But the reverse is much less dodgy. There is a long tradition in C programming of treating characters as integers -- to obtain their underlying values, or to do mathematics on them.

In short: it seems reasonable that using a number as a character could be an accident and a bug, but it also seems reasonable that using a character as a number is deliberate and desirable. This asymmetry is therefore reflected in the rules of the language.
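
A small example of the kind of deliberate character-as-number code that tradition covers (assuming plain ASCII letters):

 char letter = 'd';
 int positionInAlphabet = letter - 'a' + 1;       // 4: both chars convert implicitly to int
 bool isLower = letter >= 'a' && letter <= 'z';   // true: the comparison uses the numeric values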

Does that answer your question?

Eric Lippert
  • It'd still be nice (sometimes) if this worked out of the box in C# though, like it does in C: for (char letter = 'A'; letter <= 'Z'; letter+=2) – Matthew Scharley Oct 04 '09 at 14:28
  • 1
    For the record, the above does work in C#, with a little more syntax: letter = (char)(letter + 2) – Matthew Scharley Oct 04 '09 at 14:31
  • @Matthew `letter += (char)2` also works, though it's not exactly intuitive that a letter plus STX equals another letter. – Matthew Read Jul 07 '11 at 16:43
  • I'd love to see why you can't convert at all from Char to Integer in VB.NET. It seems like the underlying implementation should be the same, and therefore not a problem to implement. – Jeff B Oct 10 '13 at 17:02
  • 3
    Here's an example of why this implicit conversion seems like a mistake to me, which is how I ended up at this page. I have three methods: `WriteString(string value)`, `WriteString(string value, int length)`, and `WriteString(string value, int length, char padChar)` - the first is for writing a string of varying length, and the other two are for fixed-length. A buggy call of the form `WriteString(foo, ' ')` (where the caller has elided the length) compiles without any issue, since the second `WriteString` overload gets resolved by the compiler without warning. – Oliver Mellet Jan 21 '17 at 21:14
12

The basic idea is that conversions which cannot lose data can be implicit, whereas conversions which may lead to data loss have to be explicit (using, for instance, a cast operator).

So implicitly converting from char to int will work in C#.

[edit]As others pointed out, a char is a 16-bit number in C#, so this conversion is just from a 16-bit integer to a 32-bit integer, which is possible without data-loss.[/edit]
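
The same rule is easy to see with the plain integer types (a quick sketch):

 int i = 123;
 long l = i;          // implicit: every int value fits in a long, so nothing can be lost
 // int j = l;        // does not compile: a long value may not fit in an int
 int j = (int)l;      // explicit cast: you accept the possible truncation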

C# supports implicit conversions; the part about them "usually not working" probably comes from some other language, most likely C++, where some glorious string implementations provided implicit conversions to diverse pointer types, creating some gigantic bugs in applications.

When you provide type conversions, in whatever language, you should default to explicit conversions and only offer implicit conversions for special cases.

gimpf
  • 4
    So then what explains why in C# there is no implicit conversion from short back to char? That can be done losslessly; your logic would indicate that there should be an implicit conversion from short to char, but there is not. Why not? – Eric Lippert Oct 01 '09 at 14:47
  • That's why I said "basic idea", as I got it from the book "CLR via C#". The final word is said in the reference, as quoted by _Dzmitry Huba_. I do not know about the rationale for this decision, but I expect that they tried to improve the semantic difference between characters and numbers a bit, similar to having to mark `ref` and `out` parameters on the call-site. – gimpf Oct 01 '09 at 15:48
  • Indeed. As it turns out, I do know a little bit about the rationale for this decision. Your conjecture is spot on. :) – Eric Lippert Oct 01 '09 at 16:20
9

From the C# Specification:

6.1.2 Implicit numeric conversions

The implicit numeric conversions are:

• From sbyte to short, int, long, float, double, or decimal.

• From byte to short, ushort, int, uint, long, ulong, float, double, or decimal.

• From short to int, long, float, double, or decimal.

• From ushort to int, uint, long, ulong, float, double, or decimal.

• From int to long, float, double, or decimal.

• From uint to long, ulong, float, double, or decimal.

• From long to float, double, or decimal.

• From ulong to float, double, or decimal.

• From char to ushort, int, uint, long, ulong, float, double, or decimal.

• From float to double.

Conversions from int, uint, long, or ulong to float and from long or ulong to double may cause a loss of precision, but will never cause a loss of magnitude. The other implicit numeric conversions never lose any information. There are no implicit conversions to the char type, so values of the other integral types do not automatically convert to the char type.
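
A few of those rules in action (a sketch; the comments state what the compiler accepts or rejects):

 char c = 'a';
 ushort us = c;       // fine: char to ushort is in the list above
 double d = c;        // fine: char to double is in the list above; d == 97.0
 // char c2 = 97;     // does not compile: there is no implicit conversion to char, not even for a constant in range
 char c2 = (char)97;  // fine with an explicit cast; c2 == 'a'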

Dzmitry Huba
5

From the MSDN page about the char type (char (C# Reference)):

A char can be implicitly converted to ushort, int, uint, long, ulong, float, double, or decimal. However, there are no implicit conversions from other types to the char type.

It's because they have implemented implicit conversions from char to all those types. Now if you ask why they implemented them, I'm really not sure; probably to make it easier to work with the numeric (ASCII) representation of chars, or something like that.

Gimly
  • Yeah, that's what I wondered too. I'm new to programming, but from the way my teacher explained it, I noticed the inconsistency. Especially in C#, where they are very strict about the rules of OO and everything is supposed to be clean, this one is quite surprising. – tintincutes Oct 01 '09 at 11:50
  • `char` doesn't *explicitly* implement "implicit casting". If you look at the Char source with Reflector, you can't see any of these. Some **conversions** are implicit to the language itself. – bruno conde Oct 01 '09 at 16:41
2

A conversion only has to be an explicit cast when it can cause data loss. Here char is 16 bits and int is 32 bits, so the conversion happens without any loss of data.

Real-life example: we can put a small vessel into a big vessel, but not vice versa without external help.

anishMarokey
1

The core of @Eric Lippert's blog entry is his educated guess at the reasoning behind this decision of the C# language designers:

"There is a long tradition in C programming of treating characters as integers 
-- to obtain their underlying values, or to do mathematics on them."

It can cause errors though, such as:

var s = new StringBuilder('a');

You might think this initialises the StringBuilder with an 'a' character to start with, but it actually sets the capacity of the StringBuilder to 97.
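
A sketch of the pitfall and the fix (assuming the usual using System; and using System.Text; directives):

 var byChar = new StringBuilder('a');      // 'a' converts implicitly to int 97, so this calls StringBuilder(int capacity)
 Console.WriteLine(byChar.Length);         // 0  -- the builder is empty, just pre-sized
 Console.WriteLine(byChar.Capacity);       // 97

 var byString = new StringBuilder("a");    // this overload really does start the builder off with "a"
 Console.WriteLine(byString);              // a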

kristianp
0

In my opinion, the implicit conversion from char to the numeric types makes little sense, because information is lost along the way. You can see it in this example:

string ab = "ab";
char a = ab[0];
char b = ab[1];
var d = a + b;   // d is an int with value 195 (97 + 98)

We have put all the information from the string into the chars. If only the information in d is kept, all that is left to us is a number which makes no sense in this context and cannot be used to recover the information we started with. Thus, the most useful way to go would be to implicitly convert the "sum" of chars to a string.
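
As the language stands today, you have to ask for the string result explicitly, for example:

 char a = 'a';
 char b = 'b';
 int sum = a + b;                  // 195: numeric addition on the implicitly converted values
 string text = a.ToString() + b;   // "ab": concatenation, which is usually what "adding" characters should mean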

AGuyCalledGerald
0

It works because each character is handled internally as a number, hence the conversion is implicit.

Pete OHanlon
  • 2
    Though plausible, this reasoning is not actually logical. If that were the reasoning then there would be an implicit conversion from ushort to char, but there is no such implicit conversion. – Eric Lippert Oct 01 '09 at 14:44
0

The char is implicitly converted to its Unicode numeric value, which is an integer.

Mark Bell