I have been trying to wrap my head around the C99 rules of integral promotion and usual arithmetic conversions of integral types. After burning a few neurons, I came up with a set of rules of my own, which are a lot simpler and yet, I believe, equivalent to the official ones:

Update: for the purpose of this question, I start by defining “physical type” as follows

Definition: two integral types are the same physical type if they have the same size and signedness.

If you think there is something wrong with this definition, then you probably have a good answer for question 2 below.

Simplified promotion/conversion rules

type ranking: among two integral types T1 and T2, the "best" is:

  • whichever is larger
  • if they have the same size, whichever is unsigned
  • if they have the same size and signedness, either of them, as they are physically the same anyway.

integral promotion: a value of type T should be promoted to

promoted(T) = best(T, int)

usual arithmetic conversions of integral types: before evaluating a binary operator on types T1 and T2, the arguments should be converted to a suitable common type which is:

common(T1, T2) = best(T1, T2, int)
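
The three rules above can be written down as a small executable model. This is only a sketch: the `phys_type` struct and the function names are hypothetical, introduced here for illustration, and the model assumes that size and signedness fully describe an integer type (no padding bits, no bit-fields).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a "physical type": size in bytes plus signedness. */
typedef struct {
    size_t size;
    bool is_signed;
} phys_type;

/* best(T1, T2): the larger type wins; on equal size, the unsigned one wins. */
phys_type best(phys_type a, phys_type b)
{
    if (a.size != b.size)
        return a.size > b.size ? a : b;
    return a.is_signed ? b : a;
}

/* promoted(T) = best(T, int) */
phys_type promoted(phys_type t)
{
    const phys_type int_type = { sizeof(int), true };
    return best(t, int_type);
}

/* common(T1, T2) = best(T1, T2, int) */
phys_type common(phys_type a, phys_type b)
{
    return best(promoted(a), promoted(b));
}

/* Compare the model with what the compiler does for unsigned int + long. */
void check_uint_plus_long(void)
{
    const phys_type uint_type = { sizeof(unsigned int), false };
    const phys_type long_type = { sizeof(long), true };
    phys_type predicted = common(uint_type, long_type);

    assert(predicted.size == sizeof(1u + 1L));
    /* (expr) * 0 - 1 is negative only if the type of expr is signed. */
    assert(predicted.is_signed == ((1u + 1L) * 0 - 1 < 0));
}
```

Calling `check_uint_plus_long()` tests a single pair of types on the platform at hand; it says nothing about other pairs or other platforms.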

Caveat: although I believe my rules give the correct physical type, they may not provide the correct type name in cases where a single type has different names. For example, on systems where int==long, the official rules say

common(unsigned int, long) = unsigned long

whereas my rules say it's unsigned int (which is physically the same anyway). But this should not be a problem, since the names do not really matter. Or do they?

After this long prelude, here comes the real question, which is two-fold:

Question 1: Are my rules correct?

I read the official ones several times, but I still find them confusing. Thus, I may have misunderstood something. If I am wrong, please provide an example where the official rules and my rules yield different types. I mean: different physical types, not just different types that are physically the same.

A real world example would be preferred. If none can be found, a theoretical example would be OK if the hypothetical C environment is described with enough detail to be convincing (sizes of the relevant types, etc.).

If I am correct here, then the second question becomes relevant.

Question 2: Why should I care about the names of the promoted/converted types?

If I am correct, then the obvious question is "Why did the people in the standards committee write such complicated rules?". The only answer I can think of is that they wanted to specify not only the physical types yielded by the promotion/conversion, but also the proper way to name those types. But then, why did they care? Those types are only used internally by the compiler, so it does not matter how we name them as long as we understand what they physically are. Is there something wrong with this reasoning? Can you think of a situation where T1 and T2 are physically the same and yet it matters to know whether things are automatically promoted or converted to T1 rather than T2? Again, a real world example would be preferred, otherwise a theoretical example would do if it is detailed enough.

Rationale

(section added on 2014-11-10, to address some comments)

When searching for these topics, I have always seen them discussed in the context of the behavior of arithmetic operators, and more specifically the results returned by those operators. For example, the expression -1L < 1U is problematic, because it is true on systems where longs are really longer than ints, but false otherwise. I believe it is a good thing to understand this sort of problem, yet a bad thing to need a complex ruleset to do so. Hence this effort to build a simpler ruleset that reliably gives the same results.
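
The -1L < 1U example can be checked on a given platform with a sketch like the one below. The function names are made up for illustration. The prediction is phrased in terms of value ranges: if long can represent every unsigned int value, 1U is converted to long and the comparison is true; otherwise both operands are converted to unsigned long, -1L becomes ULONG_MAX, and the comparison is false.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The comparison exactly as the compiler evaluates it. */
bool minus_one_long_less_than_one_unsigned(void)
{
    return -1L < 1U;
}

/* True if long can represent every value of unsigned int.
   The cast makes the comparison happen in unsigned long on purpose. */
bool long_holds_every_unsigned_int(void)
{
    return (unsigned long)LONG_MAX >= UINT_MAX;
}

/* The comparison is true exactly when long is wide enough. */
void check_comparison_matches_prediction(void)
{
    assert(minus_one_long_less_than_one_unsigned()
           == long_holds_every_unsigned_int());
}
```

Many compilers will flag `-1L < 1U` with a sign-comparison warning, which is a reasonable hint that the expression deserves a second look.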

I fully understand that my rules are useless to anyone who finds the real ones simple enough. I also understand, and respectfully disagree with, those who express the opinion that relying on anything but the official rules is inherently bad. My rules would nevertheless have their usefulness, should they help nobody but me.

About my personal bias: As a physicist, I value simplicity very highly. I am used to dealing with theories that are not – nor meant to be – the ultimate truth, yet they prove immensely useful, and safe to use as long as you understand their limits of applicability. In any given situation, the best theory is not the most complete: it's the simplest one that is still applicable. For example: I would not use quantum gravity to compute the period of a simple pendulum. My posting this question here is an attempt to get expert opinion on the limits of applicability of the rules above.

So far what I have is:

  • the varargs case (thanks, mafso), which seems to be the only situation in C99 where these rules are, at least in principle, not applicable
  • the _Generic keyword (thanks, Pascal Cuoq) which, being a C11 feature, is slightly out of scope
  • the C++11 auto keyword, which is further out of scope, but interesting nonetheless in that it would bring to the table the (otherwise irrelevant) concerns about aliasing rules.
Edgar Bonet
  • There are no systems where `int==long`. The standard says they are distinct types, so they are distinct types. On some systems `sizeof(int)==sizeof(long)` but it doesn't matter, they are still distinct, as in different actual types, not different names for the same type. – n. m. could be an AI Nov 08 '14 at 18:13
  • Different types imply aliasing restrictions, for example. – mafso Nov 08 '14 at 18:21
  • @n.m.: I updated my question to address your comment. – Edgar Bonet Nov 08 '14 at 18:21
  • @mafso: If you have an example where aliasing restrictions are relevant to these promotions/conversions, then you may have a good answer. – Edgar Bonet Nov 08 '14 at 18:24
  • I thought, I could. But trying to do so, I'm not sure. I cannot alias them without taking their address, and I cannot take their address if they don't have an explicitly declared type. – mafso Nov 08 '14 at 18:30
  • Your private definition of "same types" is your private definition. It is not useful when discussing the C language standard. – n. m. could be an AI Nov 08 '14 at 18:30
  • 1
    I have a contrived (too contrived to make it an answer IMO) example: An ABI, where vararg functions have (say, if called with few enough arguments) their `int` and `long` argumentes passed in different registers. If you call, say `printf` with an expression like `1u + 1l` your rules and the standard rules would put it into different reisters (but `printf` must fetch it from the according one). – mafso Nov 08 '14 at 18:40
  • @n.m.: It **is** useful in the context of this particular question, because it provides a vocabulary that simplifies the discussion. If you do not like the word “actual”, replace it with whatever word you prefer. It doesn't matter how you name it as long as the word is “scoped” only to this question (and its answers). – Edgar Bonet Nov 08 '14 at 18:41
  • @mafso: You seem to have a good point. Really contrived, but it seems valid. – Edgar Bonet Nov 08 '14 at 18:50
  • I don't see how it simplifies the discussion. The word "same" has an established meaning. If you need a different meaning, please use a different word. *That* would simplify the discussion. The answers to your questions are (1) maybe; who should care and why? (2) because `printf ("%ld", 1)` and `int i; long* pi = (long*)&i;` and other such constructs are **undefined behaviour** and the compiler is allowed to convert your program to `abort()` as soon as it sees any such thing. – n. m. could be an AI Nov 08 '14 at 18:56
  • @n.m.: I am defining “actual type”, not “same” and I just changed it to “physical type”, sounds better? OK for the `printf()` point (already raised by mafso). I do not see how the aliasing rules are relevant to this case. – Edgar Bonet Nov 08 '14 at 19:24
  • You only define when two physical types are the same, not what a physical type is. Either way I'm not sure what this definition brings to the table. We must care about the types as defined by the standard, because of the examples I and others gave, there's no way around it. Your rules cannot replace rules given by the standard (*even* if you only think in terms of a particular platform, which *will* bite you sooner or later) because they only compute "physical types", whatever they are, not types as defined by the standard. – n. m. could be an AI Nov 08 '14 at 19:51
  • Your definition of "physical types" is consistent but it doesn't correspond to anything defined by the C standard, and is unlikely to be useful in understanding the rules by which C programs are processed. It could be useful in understanding the behavior of code whose behavior is not defined by the standard (e.g., incorrectly printing a `long` value with a `"%d"` format), but it's better just to avoid doing that. – Keith Thompson Nov 08 '14 at 21:42
  • @n.m.: Defining the sameness of foo is a very standard way of defining a foo, which is understood to be an [equivalence class](http://en.wikipedia.org/wiki/Equivalence_class) of its sameness. – Edgar Bonet Nov 09 '14 at 20:24
  • Your definition is probably mathematically impeccable, it just doesn't define anything important for the working programmer. – n. m. could be an AI Nov 09 '14 at 20:31

2 Answers

Regarding your first question, I think the answer is “yes”: on all normal or even slightly exotic platforms, your proposed rules yield a type of the same representation as the standard's rules.

Regarding your second question, here are two situations where the “names” of the types matter (and I am only using your terminology for clarity; in the phraseology of the standard, long and int are incompatible types even when they happen to be the same size):

C11's _Generic construct: a long expression does not match the int case even if both are 32-bit representations of integers.
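
A minimal C11 sketch of how `_Generic` exposes the name of a converted type. The `TYPE_NAME` macro and the function names are hypothetical helpers written for this illustration, and only four types are listed. For `1u + 1L`, the selected branch is "long" where long can represent all unsigned int values and "unsigned long" otherwise; the "unsigned int" branch is never selected, even on platforms where unsigned int and unsigned long have the same size.

```c
#include <assert.h>
#include <string.h>

/* Map the type of an expression to a string; the operand is not evaluated. */
#define TYPE_NAME(x) _Generic((x), \
    int: "int", \
    unsigned int: "unsigned int", \
    long: "long", \
    unsigned long: "unsigned long", \
    default: "other")

/* Name of the common type chosen by the usual arithmetic conversions. */
const char *type_of_uint_plus_long(void)
{
    return TYPE_NAME(1u + 1L);
}

/* int and long select different branches regardless of their sizes. */
void check_names_are_distinct(void)
{
    assert(strcmp(TYPE_NAME(1), "int") == 0);
    assert(strcmp(TYPE_NAME(1L), "long") == 0);
    assert(strcmp(type_of_uint_plus_long(), "unsigned int") != 0);
}
```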

Strict aliasing: the compiler is allowed to generate code that assumes that an int variable does not change when you modify a long lvalue. In particular, statements 1 and 3 in the code below can be optimized to return 1;:

{
  long *p;
  int x;
  …
  x = 1; /* 1 */
  *p = 2;
  return x; /* 3 */
}

Incidentally, the standard does not allow printf("%d", 1L) or printf("%ld", 1) either, even when both types are the same size, although it will happen to work on most platforms. I do not include this as a significant example because, unlike the two examples above, it would not be a significant change to the standard to decree that it should work when the types have the same representation.
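
For completeness, a sketch of the portable way to pair a value with a conversion specifier: convert the argument explicitly to the type the specifier expects. The helper name `format_int_as_long` is invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Format an int with "%ld" by converting it to long first, so that the
   argument type matches the conversion specifier on every platform. */
int format_int_as_long(char *buf, size_t size, int value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%ld", (long)value);
}
```

The cast costs nothing where int and long share a representation, and it keeps the call well defined where they do not.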

Pascal Cuoq

Because various platforms started to use C as a language before efforts were made to officially standardize it, C compilers for various unusual platforms implemented things in various interesting ways. Rather than compel compilers for such platforms to use new rules which would be incompatible with already-existing code, the authors of the standard attempted to write in lots of wrinkles and nuances to accommodate the behaviors of all those weird architectures.

In practice, on most platforms one could get by with a vastly simplified version of the rules. There is no way to compile-time assert the equivalence of types which have identical ranges (e.g. I know of nothing in the Standard that would require 32-bit unsigned int values to use the same byte ordering as 32-bit unsigned long), and the types of pointers to distinct types must be considered semantically distinct even if the types they point to are physically identical. As noted elsewhere, however, the only situation where a compiler would be allowed to have rvalues of types unsigned int and unsigned long behave differently when both have the same representation would be when those types are passed as variadic arguments, a situation which is a lot messier than it should be, and which led to the downfall of the long double type.

If variadic functions' prototypes could have specified "convert all integer values to X, and all floating-point values to type Y", and "vararg.h" and "stdarg.h" had remained as distinct features (the former being used for functions without such prototypes and the latter for functions with them) that would have avoided the need to worry about whether an int32_t is an int or a long. Too late to avoid the problems resulting from the lack of that feature, though it might nonetheless be worth adding to alleviate problems going forward.

Perhaps someday someone will split off a separate "Normative C" language which uses simplified rules that yield the same behaviors as C on modern platforms, but could only be implemented on obscure platforms if they conform to modern usage (e.g. machines with sign/magnitude hardware would be required to compute i+j as (int)(((i ^ 0x80000000u)+j)^0x80000000u)), in which case aspects of the standard which are only applicable to quirky machines could be omitted.

supercat