19

I have code like this:

IEnumerable<string?> items = new [] { "test", null, "this" };
var nonNullItems = items.Where(item => item != null); //inferred as IEnumerable<string?>
var lengths = nonNullItems.Select(item => item.Length); //nullability warning here
Console.WriteLine(lengths.Max());

How can I write this code in a convenient way such that:

  • There is no nullability warning, because the type nonNullItems is inferred as IEnumerable<string>.
  • I don't need to add unchecked non-nullability assertions like item! (because I want to benefit from the compilers sanity checking, and not rely on me being an error-free coder)
  • I don't add runtime checked non-nullability assertions (because that's pointless overhead both in code-size and at runtime, and in case of human error that fails later than ideal).
  • The solution or coding pattern can apply more generally to other sequences of items of nullable-reference type.

I'm aware of this solution, which leverages the flow-sensitive typing in the C# 8.0 compiler, but it's.... not so pretty, mostly because it's so long and noisy:

var notNullItems = items.SelectMany(item => 
    item != null ? new[] { item } : Array.Empty<string>())
);

Is there a better alternative?

Eamon Nerbonne
  • 47,023
  • 20
  • 101
  • 166
  • Does this answer your question? [Using Linq's Where/Select to filter out null and convert the type to non-nullable cannot be made into an extension method](https://stackoverflow.com/questions/59431558/using-linqs-where-select-to-filter-out-null-and-convert-the-type-to-non-nullabl) – Pavel Anikhouski Jan 14 '20 at 19:06

4 Answers4

9

Unfortunately you will have to tell the compiler that you know more about the situation than it does.

One reason would be that the Where method has not been annotated in a way that lets the compiler understand the guarantee for non-nullability, nor is it actually possible to annotate it. There might be a case for having additional heuristics added to the compiler to understand some basic cases, like this one, but currently we do not have it.

As such, one option would be to use the null forgiving operator, colloquially known as the "dammit operator". You touch upon this yourself, however, instead of sprinkling exclamation marks all over the code where you use the collection, you can instead tuck on an additional step on producing the collection which, at least to me, makes it more palatable:

var nonNullItems = items.Where(item => item != null).Select(s => s!);

This will flag nonNullItems as IEnumerable<string> instead of IEnumerable<string?>, and thus be handled correctly in the rest of your code.

Lasse V. Karlsen
  • 380,855
  • 102
  • 628
  • 825
  • 1
    If needed more than once this pattern could, of course, itself be captured in an extension method `WhereNotNull` (or rather two, to also handle nullable value types). – Jeroen Mostert Oct 14 '19 at 08:42
  • Since a lot of the compiler support is simple type inferencing, I would love for the compiler to see a `Where(s => s != null)` and make the inference itself, rather than adding additional extension methods. However, it would make do in a pinch. – Lasse V. Karlsen Oct 14 '19 at 08:43
  • 1
    I can imagine a lot of potential problems with a rule that essentially has to change the signature of `.Where()` based on the lambda passed into it. It would be especially confusing since it would probably break on the slightest complication (passing an arbitrary compiled delegate wouldn't work, unless the compiler tacked on even more context "just in case"; code that isn't `Enumerable.Where` might behave differently, etc.) Nice, yes, but fraught with complications. – Jeroen Mostert Oct 14 '19 at 08:52
  • @JeroenMostert I doubt it's ever going to be implemented, but I don't actually think those complications are particularly serious; the same namely holds for most of the flow-sensitive typing in the first place. You cannot trivially capture the flow once you extract a method. – Eamon Nerbonne Oct 14 '19 at 09:08
  • So as a matter of code-maintainability I'm extremely leery of adding `!` all over the place; it makes it very difficult to understand how much the compiler is still checking, and how much it isn't. @mu88 's suggestion of a method that's declared only once looks more acceptable. – Eamon Nerbonne Oct 14 '19 at 09:11
  • @EamonNerbonne: Yes, but the whole issue with LINQ is that it's heavily built on extension methods and (eventually) compiling expression trees, so those restrictions hit this scenario particularly hard. It's still quite doable as long as you accept that it will fail if you do anything more interesting than a literal `Enumerable.Where(s => s != null)`. As it's all best-effort heuristics anyway, covering that single, most common scenario might be enough. – Jeroen Mostert Oct 14 '19 at 09:12
9

I think you'll have to help the compiler in either one way or another. Calling .Where() is never safe of returning not-null. Maybe Microsoft could add some logic to determine basic scenarios like yours, but AFAIK that's not the situation right now.

However, you could write a simple extension method like that:

public static class Extension
{
    public static IEnumerable<T> WhereNotNull<T>(this IEnumerable<T?> o) where T:class
    {
        return o.Where(x => x != null)!;
    }
}
mu88
  • 4,156
  • 1
  • 23
  • 47
  • 3
    The value type counterpart would be `WhereNotNull(this IEnumerable source) where T : struct => source.Where(t => t != null).Select(t => t.GetValueOrDefault())`. – Jeroen Mostert Oct 14 '19 at 09:15
  • @JeroenMostert Is there a reason you'd use `.GetValueOrDefault()` instead of `.Value`? – Eamon Nerbonne Oct 14 '19 at 09:17
  • 1
    So this works: and I'm surprised by that, because at first glance I assumed this trailing `!` only asserts something about the enumerable, not its contents - but apparently `!` simply turns off checking in the whole type (or even expression)? – Eamon Nerbonne Oct 14 '19 at 09:18
  • According to the [specification](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-8.0/nullable-reference-types-specification#the-null-forgiving-operator), it evaluates the whole expression – mu88 Oct 14 '19 at 09:20
  • @JeroenMostert Do you mind if I copy your value type counterpart into my example for the sake of completeness? – mu88 Oct 14 '19 at 09:20
  • 1
    @EamonNerbonne: yes: if it is known the value is not null, `GetValueOrDefault` omits a branch, whereas `.Value` must keep the null check. Admittedly this is a micro-optimization. – Jeroen Mostert Oct 14 '19 at 09:24
  • 1
    @mu88: Of course I don't mind, if I minded I'd have supplied my own answer. :-) – Jeroen Mostert Oct 14 '19 at 09:25
  • @JeroenMostert I don't doubt there's conceptually a branch there, but it's something the JIT is potentially likely to fiddle with; so I did a really quick and dirty microbench here: https://gist.github.com/EamonNerbonne/51a520e2a04806d34d4c38d1362857c5 - and the gist of it is; `IEnumerable<>` overhead dominates completely, rendering the point moot, and for arrays the effects are not straightforward and likely very dependant on the specifics of the case in question (and in this example `GetValueOrDefault` happens to even turn out the slowest, weirdly enough!) – Eamon Nerbonne Oct 14 '19 at 09:53
  • @EamonNerbonne: Props to you for actually running a benchmark, although I recommend [benchmark.net](https://benchmarkdotnet.org/) for microbenchmarks like these to avoid common pitfalls with warmup and jitting. Yes, I would expect the enumeration overhead to dominate here in any case, the compiler and jitter version matter a great deal, and as always, actual optimization should be left to things that have been measured to be bottlenecks. Using `GetValueOrDefault` instead of `Value` [definitely does have its scenarios, though](https://github.com/dotnet/coreclr/pull/22297). – Jeroen Mostert Oct 14 '19 at 10:14
  • @JeroenMostert In my experience benchmark.net is often not worth it; it's more time to do; it's detection for weirdness is nice but I've never been surprised by it; there are some tricky gotchas with setups and in general the API for even slightly complex scenarios is terrible; The per-loop overhead is considerable, meaning I need to run for at least 10 times longer if not more to get equivalent accuracy, yet the numbers converge to the same thing... Not saying it's not got it's uses, (love the analyzers, and for statistically messy cases it's OK), but it's totally overhyped. – Eamon Nerbonne Oct 16 '19 at 07:16
4

I don't know if this answer meets the criteria for your 3rd bullet point, but then your .Where() filter does not either, so...

Replace

var nonNullItems = items.Where(item => item != null)

with

var nonNullItems = items.OfType<string>()

This will yield an inferred type of IEnumerable<string> for nonNullItems, and this technique can be applied to any nullable reference type.

Eric Lease
  • 4,114
  • 1
  • 29
  • 45
  • By the "third bullet point" I mean no *additional* checks beyond the filter, i.e. nothing like `Where(o => o != null).Select(o=> o ?? throw ...)` since although that will infer correctly, it's inefficient, and I'm going to be doing that a lot, and I'm OK with carefully checking a tiny, trivial implementation like `Where(o=>o!=null)` manually once - I just don't want it to infect the entire codebase. – Eamon Nerbonne Oct 28 '19 at 15:06
  • I did consider the OfType solution, but it's annoying that I need to specify the type explicitly to the point that I wouldn't really call `OfType<>` used inline a generally applicable strategy at all. And if you make an custom extension method or something, you'll still be left with the fact that OfType is surprisingly slow; in my microbenchmark 4 times slower for a long array of strings; and shudder the thought that you ever dare use a valuetype with OfType... all that boxing.... – Eamon Nerbonne Oct 28 '19 at 15:10
1

FWIW special support is being considered for C# 10: https://github.com/dotnet/csharplang/issues/3951

Shay Rojansky
  • 15,357
  • 2
  • 40
  • 69