5

In short, I am looking for guidance on which of the following two methods should be preferred (and why):

static IEnumerable<T> DistinctA<T>(this IEnumerable<T> xs)
{
    return new HashSet<T>(xs);
}

static IEnumerable<T> DistinctB<T>(this IEnumerable<T> xs) where T : IEquatable<T>
{
    return new HashSet<T>(xs);
}
  • Argument in favour of DistinctA: Obviously, the constraint on T is not required, because HashSet<T> does not require it, and because instances of any T are guaranteed to be convertible to System.Object, which provides the same functionality as IEquatable<T> (namely the two methods Equals and GetHashCode). (While the non-generic methods will cause boxing with value types, that's not what I'm concerned about here.)

  • Argument in favour of DistinctB: The generic parameter constraint, while not strictly necessary, makes visible to callers that the method will compare instances of T, and is therefore a signal that Equals and GetHashCode should work correctly for T. (After all, defining a new type without explicitly implementing Equals and GetHashCode happens very easily, so the constraint might help catch some errors early.)

Question: Is there an established and documented best practice that recommends when to specify this particular constraint (T : IEquatable<T>), and when not to? And if not, is one of the above arguments flawed in any way? (In that case, I'd prefer well-thought-out arguments, not just personal opinions.)

stakx - no longer contributing
  • 83,039
  • 20
  • 168
  • 268
  • Since you are after well-thought-out arguments and not personal opinions, I'll make this a quick comment: I prefer visibility to the caller when things like this are extremely unlikely to change in future. – Simon Whitehead Apr 08 '13 at 22:19

2 Answers2

3

While my comment at Pieter's answer is fully true, I've rethought the exact case of Distinct that you refer to.

This is a LINQ method contract, not just a method.

LINQ is meant to be a common fascade implemented by various providers. Linq2Objects may require an IEquatable, Linq2Sql may require IEquatable too, but Linq2Sql may not require and even not use at all and completely ignore the IEquatableness as the comparison is made by the DB SQL engine.

Therefore, at the layer of LINQ method definitions, it does not make sense to specify the requirement for IEquatable. It would limit and constrain the future LINQ providers to some things they really do not need to care for in their specific domains, and note that LINQ is actually very often all about domain-specificness, as very often the LINQ expressions and parameters are never actually run as code, but they are analyzed and retranslated to other constructs like SQL or XPaths.. Ommission of constraints in this case is reasonable, because you cannot really know what your future-and-unknown-domain-provider will really need to request from the caller. Linq-to-Mushrooms may need to use an IToxinEquality instead of IEquatable!

However, when you are designing interfaces/contracts that will clearly be used as runnable code (not just expression trees that will only work as configuration or declarations), then I do not see no valid point for not providing the constraints.

quetzalcoatl
  • 32,194
  • 8
  • 68
  • 107
  • +1 for the last paragraph. Before that, I was wondering why your answer went into LINQ so much… but the distinction between code-as-data and runnable code makes sense, and is another possible guideline. Thanks! – stakx - no longer contributing Jun 07 '13 at 20:59
2

Start by considering when it might matter which of the two mechanisms is used; I can think of only two:

  1. When the code is being translated to another language (either a subsequent version of C#, or a related language like Java, or a completly dissimilar language such as Haskell). In this case the second definition is clearly better by providing the translator, whether automated or manual, with more information.
  2. When a user unfamiliar with the code is reading it to learn how to invoke the method. Again, I believe the second is clearly better by providing more information readily to such a user.

I cannot think of any circumstance in which the fist definition would be preferred, and where it actually matters beyond personal preference.

Others thoughts?

Pieter Geerkens
  • 11,775
  • 2
  • 32
  • 52
  • I think that all requirements should be stated as constraints if only it is possible, because it allows the compiler to catch the error early at the compilation, way before the runtime. However,I can think of somewhat contrived counterargument that polluting your often-used interfaces with tons of constraints will have negative impact on the actual duration of the compilation, as the compiler will have to analyze and scan type hierarchies more often.. I cannot say I really believe in that, and even if so, I'd still use constraints wherever possible and adequate and simply split the projects ;) – quetzalcoatl Apr 08 '13 at 22:36
  • @quetzalcoatl: If you saw some of my code you would realize how ironic my post is; I am a firm believer in tersenes whenever reasonable and practical. However the point of the post seems to be "When is the extra verbosity worth consideration?". – Pieter Geerkens Apr 08 '13 at 22:48
  • :) I have to admit that I love the code to be brief too, but as a native-C++'er, I am also heavy user of the compiler as the first-time testing tool. When I have a choice between compile-time error detection or a shorter code shorter a bit, I choose the former, especially at module or problem domain boundary. Of course, there's a threshold. +50..150-or-more lines just to bar some very-naiive-bug is not an option and the readability and KISS wins :) – quetzalcoatl Apr 08 '13 at 22:52