Please note this is a question about internals of compilers.

I just read [1] that, when introducing variance for generic types, the C# team considered whether the compiler should automatically compute whether a type is co- or contravariant. Of course this is history now, but I still wonder: how could this be done?

Is it enough to take all methods (excluding constructors) and check whether the type parameter appears in an input (in) or output (out) position?

[1] Jeffrey Richter, CLR via C#, 4th edition, p.281.

greenoldman
  • By `in` and `out` do you just mean the keywords, or are you also considering return types and parameters? – 31eee384 Aug 26 '15 at 19:10
  • @31eee384, I mean that a parameter type is in an `in` position, and the result type is in an `out` position. – greenoldman Aug 26 '15 at 19:23
  • @greenoldman: That makes sense, I overthought it, thanks. On another note, could you maybe add the doc you read about this to the post? I think it would be a good addition (and I'm interested). – 31eee384 Aug 26 '15 at 19:31
  • @31eee384, I can add only reference (see updated post), sorry about that. – greenoldman Aug 26 '15 at 19:39
  • Well you could never ever have a field of that type, because you can always get and set any field, which would mean that that type would need to be invariant. – Servy Aug 26 '15 at 19:52
  • @Servy, good catch, thank you, but I think that applies only to an **exposed** field. Anyway, I am now reading how Scala handles this (Programming in Scala, 2ed), and there are somewhat more rules, even for methods alone. – greenoldman Aug 26 '15 at 19:54
  • Eric Lippert explains the exact variance rules much better than I could on [his old blog here](http://blogs.msdn.com/b/ericlippert/archive/2009/12/03/exact-rules-for-variance-validity.aspx). – Lucas Trzesniewski Aug 26 '15 at 20:09
  • @LucasTrzesniewski, could you please post your comment as an answer, it is not possible to accept comment at SO. Thank you. – greenoldman Aug 27 '15 at 06:04
  • @greenoldman unfortunately a link-only answer isn't acceptable on SO, and I'm not going to copy/paste the whole blog post in here. If you're willing to, you can write a summary as a self-answer and accept that, but good luck with writing that summary :P – Lucas Trzesniewski Aug 27 '15 at 07:34

2 Answers

The link in the now-deleted answer is to my article that explains the exact rules for determining variance validity, which does not answer your question. The link you're actually looking for is to my article on why the C# compiler team rejected attempting to compute variance without any syntax, which is here:

http://blogs.msdn.com/b/ericlippert/archive/2007/10/29/covariance-and-contravariance-in-c-part-seven-why-do-we-need-a-syntax-at-all.aspx

Briefly, the reasons for rejecting such a feature are:

  • The feature requires whole-program analysis. Not only is this expensive, it means that small changes in one type can cause the variance choices of many far-away types to change unexpectedly.
  • Variance is something that you want to design into a type; it is a statement of how you expect the type to be used by its users not just today, but forever. That expectation should be encoded into the program text.
  • There are plenty of cases where it is very difficult to compute the intention of the user, and then what do you do? You have to resolve it by requiring a syntax, and so why not just require it all the time? For example:

interface I<V, W> 
{ 
     I<V, W> M(I<W, V> x);
}

As an exercise, compute all the possible valid variance annotations on V and W. Now, how should the compiler do the same computation? What algorithm did you use? And given that the result is ambiguous, how would you choose to resolve the ambiguity?

Now, I note that this answer thus far also does not answer your question. You asked how it could be done, and all I gave you was reasons why we shouldn't make the attempt to do it. There are many ways it could be done.

For example, take every generic type in the program, and every type parameter of those generic types. Suppose there are a hundred of them. Then there are only three-to-the-hundred possible combinations of invariant, in and out for each; try all of them, see which ones work, and then have a ranking function that chooses from the winners. The problem there of course is that it takes longer than the age of the universe to run.
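As a rough illustration of the brute-force idea, here is a minimal sketch restricted to the single interface `I<V, W>` from the example above, so it tries 3² candidates rather than 3¹⁰⁰. The names `TypeExpr`, `VarianceSearch` and `IsSafe` are hypothetical, and the safety check is a simplified reading of the C# variance-validity rules for one self-referencing interface; it is not how the C# compiler is implemented.

```csharp
using System;
using System.Collections.Generic;

enum Variance { Invariant, In, Out }

// Hypothetical model of a type expression: either a type parameter of I
// (Param is its index) or a construction I<Args[0], Args[1]> (Param == -1).
sealed class TypeExpr
{
    public int Param = -1;
    public TypeExpr[] Args = Array.Empty<TypeExpr>();
    public static TypeExpr P(int index) => new TypeExpr { Param = index };
    public static TypeExpr I(TypeExpr a, TypeExpr b) => new TypeExpr { Args = new[] { a, b } };
}

static class VarianceSearch
{
    // True if t is output-safe (wantOutput) or input-safe (!wantOutput)
    // under the candidate annotations v for I's own type parameters.
    static bool IsSafe(TypeExpr t, bool wantOutput, Variance[] v)
    {
        if (t.Param >= 0)
        {
            // An out parameter is input-unsafe; an in parameter is output-unsafe.
            return v[t.Param] == Variance.Invariant
                || (v[t.Param] == Variance.Out) == wantOutput;
        }
        for (int i = 0; i < t.Args.Length; i++)
        {
            // Covariant or invariant slot: the argument must be safe in the same direction.
            if (v[i] != Variance.In && !IsSafe(t.Args[i], wantOutput, v)) return false;
            // Contravariant or invariant slot: the argument must be safe in the opposite direction.
            if (v[i] != Variance.Out && !IsSafe(t.Args[i], !wantOutput, v)) return false;
        }
        return true;
    }

    // Tries every annotation pair for: interface I<V, W> { I<V, W> M(I<W, V> x); }
    public static List<Variance[]> ValidAnnotations()
    {
        var V = TypeExpr.P(0);
        var W = TypeExpr.P(1);
        var returnType = TypeExpr.I(V, W);   // I<V, W> must be output-safe
        var paramType = TypeExpr.I(W, V);    // I<W, V> must be input-safe
        var result = new List<Variance[]>();
        foreach (Variance v in Enum.GetValues(typeof(Variance)))
        foreach (Variance w in Enum.GetValues(typeof(Variance)))
        {
            var candidate = new[] { v, w };
            if (IsSafe(returnType, true, candidate) && IsSafe(paramType, false, candidate))
                result.Add(candidate);
        }
        return result;
    }
}

static class Program
{
    static void Main()
    {
        foreach (var a in VarianceSearch.ValidAnnotations())
            Console.WriteLine($"I<{a[0]} V, {a[1]} W>");
    }
}
```

Under these simplified rules the search accepts three candidates: both parameters invariant, `in V, out W`, and `out V, in W`. Note that the validity of each candidate depends on the annotations being tried for `I` itself, which is why the parameters cannot be judged one at a time.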

Now, we could apply a smart pruning algorithm to say "any choice where T is in and is also known to be used in an output position is invalid", so don't check any of those cases. Now we have a situation where hundreds of such predicates must all be applied in order to determine what the applicable set of valid variance choices is. As I noted in my example above, it can be quite tricky to figure out when something is actually in an input or output position. So this is probably a non-starter as well.

Ah, but that idea implies that analysis of predicates about an algebraic system is a potentially good technique. We could build an engine that generates predicates and then applies a sophisticated SMT solver to them. That would have bad cases that require gazillions of computations, but modern SMT solvers are pretty good in typical cases.

But all this is way, way too much work to do for a feature that has almost no value to the user.

Eric Lippert
  • "expectation should be encoded into the program text," also, it allows the developer to ask the compiler to check this in the event of a mistake - something that you totally expect a statically typed language to do. – Jonathan Dickinson Aug 31 '15 at 14:56

In the example:

interface I<T, S, R>
{
  S M(T t, R r);
  T N();
}

you can use the current C# compiler to check whether out (the covariance marker) or in (the contravariance marker) is allowed in front of T, S, and R. T is used both as a parameter type (the first parameter of M) and as a return type (of N), so it can have neither out nor in; the compiler reports an error if you try either. S is used only as a return type, so it cannot be in. R is used only as a parameter type, so it cannot be out.

The designers of C# decided to let the programmer choose whether a type should have generic variance. So with this example, there are four legal ways to write the I<,,> interface with variance markers:

// 1
interface I<T, S, R>
{
  S M(T t, R r);
  T N();
}

// 2
interface I<T, out S, R>
{
  S M(T t, R r);
  T N();
}

// 3
interface I<T, S, in R>
{
  S M(T t, R r);
  T N();
}

// 4
interface I<T, out S, in R>
{
  S M(T t, R r);
  T N();
}

The other alternative the designers had was to always make this interface type covariant in S and contravariant in R, giving the programmer no chance to "disable" this. In that case each type parameter would automatically get the "best" generic variance possible. The keywords out and in would not be needed in this context.

Similarly for generic delegate types.
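To make concrete what the fully annotated variant (// 4 above) permits, here is a small sketch. The class `Impl` and its method bodies are made up for illustration; only the interface shape comes from the example.

```csharp
using System;

// Variant 4 from the list above: covariant in S, contravariant in R.
interface I<T, out S, in R>
{
    S M(T t, R r);
    T N();
}

// Hypothetical implementation, used only to have an instance to convert.
sealed class Impl : I<int, string, object>
{
    public string M(int t, object r) => $"{t}:{r}";
    public int N() => 42;
}

static class Program
{
    static void Main()
    {
        I<int, string, object> source = new Impl();

        // Allowed only because of the markers:
        // S is covariant (string -> object), R is contravariant (object -> string).
        // T has no marker, so it must match exactly (int).
        I<int, object, string> widened = source;

        Console.WriteLine(widened.M(1, "x"));   // still calls Impl.M
    }
}
```

With variant 1 (no markers) the assignment to `widened` would not compile, which is the practical difference between the four declarations.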

Jeppe Stig Nielsen
  • 1/3 of your "answer" is my question rewritten; the other 2/3 patronizes me by reminding me of the syntax of variance in C#. What does your "answer" bring that is **new** to my question? – greenoldman Aug 31 '15 at 05:50
  • @greenoldman I did not intend to be patronizing. I may have misunderstood the question. I tried to answer __How can compiler compute automatically co- and contravariance?__ by saying, "it is more or less the same as the compiler does already, just for each type parameter see if `out`, or `in`, or neither, would be allowed". The automatic part would consist in the compiler automatically applying `out` or `in` whenever it turns out to be legal to apply them. So the very same logic that is in the current C# specification and current C# compiler, could also have been used to "infer" auto-variance. – Jeppe Stig Nielsen Aug 31 '15 at 15:34
  • ... Now, Eric Lippert's answer makes it clear that in some cases you cannot consider each type parameter (`T`, `S`, `R`) in isolation, and so things become more complicated. I had not thought about that when I wrote my answer. – Jeppe Stig Nielsen Aug 31 '15 at 15:37