25

This must be something really simple, but I'm going to ask it anyway, because I think others will struggle with it too. Why is the following simple LINQ query not always executed with the new value of the variable, instead of always using the first one?

static void Main(string[] args)
{
    Console.WriteLine("Enter something:");
    string input = Console.ReadLine();       // for example ABC123
    var digits = input.Where(Char.IsDigit);  // 123
    while (digits.Any())
    {
        Console.WriteLine("Enter a string which doesn't contain digits");
        input = Console.ReadLine();         // for example ABC
    }
    Console.WriteLine("Bye");
    Console.ReadLine();
}

With the sample input shown in the comments, the loop is entered because ABC123 contains digits. But it never exits, even if you then enter something like ABC, because digits still yields 123.

So why does the LINQ query not evaluate the new input value, but always the first?

I know I could fix it with this additional line:

while (digits.Any())
{
    Console.WriteLine("Enter a string which doesn't contain digits");
    input = Console.ReadLine();          
    digits = input.Where(Char.IsDigit);  // now it works as expected
}

or, more elegantly, by using the query directly in the loop:

while (input.Any(Char.IsDigit))
{
    // ...
}
Tim Schmelter

6 Answers

42

The difference is that you're changing the value of the input variable, rather than the contents of the object that the variable refers to... so digits still refers to the original collection.

Compare that with this code:

List<char> input = new List<char>(Console.ReadLine());
var digits = input.Where(Char.IsDigit);  // 123
while (digits.Any())
{
    Console.WriteLine("Enter a string which doesn't contain digits");
    input.Clear();
    input.AddRange(Console.ReadLine());
}

This time, we're modifying the content of the collection that input refers to - and as digits is effectively a view over that collection, we get to see the change.
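The two cases can be put side by side in a minimal, non-interactive sketch. Hard-coded strings stand in for the console input, and the class and method names are made up for illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReassignVsMutate
{
    // Reassigning the variable: the query keeps enumerating the original string.
    public static bool AnyDigitsAfterReassign()
    {
        string input = "ABC123";
        IEnumerable<char> digits = input.Where(char.IsDigit);
        input = "ABC";            // input now refers to a different string; digits is unaffected
        return digits.Any();      // still enumerates "ABC123"
    }

    // Mutating the collection: the lazy query runs again over the same list object.
    public static bool AnyDigitsAfterMutate()
    {
        var input = new List<char>("ABC123");
        IEnumerable<char> digits = input.Where(char.IsDigit);
        input.Clear();
        input.AddRange("ABC");    // same list object, new contents
        return digits.Any();      // enumerates the current contents of the list
    }

    static void Main()
    {
        Console.WriteLine(AnyDigitsAfterReassign()); // True
        Console.WriteLine(AnyDigitsAfterMutate());   // False
    }
}
```

In both methods the query is only evaluated when Any() is called; what differs is whether the object it was built over has changed by then.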

Jon Skeet
  • Am I correct - this has happened in question example because of immutability of strings? – fex Nov 05 '14 at 10:23
  • 1
    @fex: The immutability of strings was not the reason for this issue, but it was the reason for my confusion. If the string were a collection like `List<char>`, I would modify it directly and not assign a new list to the variable. – Tim Schmelter Nov 05 '14 at 10:28
  • 1
    @fex: Not directly. If strings were mutable then changing the value of `input` would *still* not affect `digits`... but the content referred to by `input` could have been changed instead. – Jon Skeet Nov 05 '14 at 10:28
10

You're assigning a new value to input, but the digits sequence is still derived from the initial value of input. In other words, when you do digits = input.Where(Char.IsDigit), it captures the current value of the input variable, not the variable itself. Assigning a new value to input has no effect on digits.

Thomas Levesque
6

This line:

input.Where(Char.IsDigit)

is equivalent to:

Enumerable.Where(input, Char.IsDigit)

Thus, the value of input is being passed as the source of the .Where query, not a reference to input.

The first fix you proposed works because it uses the freshly-assigned value of input on the line prior.
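As a rough illustration of that equivalence, the sketch below writes the query as a plain static call and then reassigns input. The class and method names are hypothetical and a hard-coded string replaces the console input:

```csharp
using System;
using System.Linq;

static class WhereByValue
{
    public static string DigitsOfOriginal()
    {
        string input = "ABC123";
        // Same call as input.Where(char.IsDigit), written without extension-method syntax.
        // The current value of input (a reference to "ABC123") is passed as the argument.
        var digits = Enumerable.Where(input, char.IsDigit);
        input = "ABC";                          // the query never sees this assignment
        return new string(digits.ToArray());    // evaluated here, over the original string
    }

    static void Main()
    {
        Console.WriteLine(DigitsOfOriginal()); // 123
    }
}
```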

Bryan Watts
  • Yes. One can say that `input` is a ByValue parameter, not a ByRef parameter (it does not say `ref` or `out`). – Jeppe Stig Nielsen Nov 04 '14 at 15:52
  • @JeppeStigNielsen: That is actually not true. Strings are reference types (that's why you can have a null string), which means that the variable `input` actually contains a reference to the string. If `string` were mutable, you could pass a string to a function as a normal parameter, and modifying the string inside would modify the original string as well. But since you cannot modify strings, in some scenarios it has a similar effect to passing the parameter ByValue. – Tom Pažourek Nov 05 '14 at 12:01
  • @tomp I know `string` is a reference type. That was not what I was talking about. I was talking about whether the parameter was a ByRef parameter (either `ref string` or `out string`) or not. So we agree. It is unfortunate that the two distinct notions "reference type" (for example `class` (etc.), not `struct`) and "by ref parameter" (either `ref` or `out`) have such similar names. It leads to many misunderstandings. I was actually aware of that and trying to make my wording precise, but still I was misunderstood ... – Jeppe Stig Nielsen Nov 05 '14 at 13:42
  • @JeppeStigNielsen: Oh, you are right of course, I was thinking about it too much. – Tom Pažourek Nov 05 '14 at 18:08
4

The digits enumerable refers to the string that input referred to when you created the enumerable (the reference is copied, not the variable). It doesn't hold a reference to the input variable itself, so changing the value stored in input will not cause later materializations of the enumerable to use the new value.

Remember that Where is a static extension method, and accepts the object you're invoking it on as a parameter.

Asad Saeeduddin
4

This is almost a comment, but contains structured code, so I submit it as an answer.

The following slight modification of your code will work:

  Console.WriteLine("Enter something:");
  string input = Console.ReadLine();       // for example ABC123
  Func<bool> anyDigits = () => input.Any(Char.IsDigit);  // will capture 'input' as a field
  while (anyDigits())
  {
    Console.WriteLine("Enter a string which doesn't contain digits");
    input = Console.ReadLine();         // for example ABC
  }
  Console.WriteLine("Bye");
  Console.ReadLine();

Here input is captured (closure) by the delegate of type Func<bool>.

Jeppe Stig Nielsen
  • ReSharper whines, "Access to modified closure" for the first use of `input`. Suggest you change to `Func<string, bool>` and call using `while (anyDigits(input))` (which arguably improves readability). – onedaywhen Nov 05 '14 at 09:55
  • @onedaywhen Yep! This was not meant to be the best way to write the code. It was just meant to be a "minimal" change of the original code (from the question) that actually worked. I do not encourage or recommend people to code like above. The reason why ReSharper alerts you is that the closure semantics can be confusing to the reader of the code. To get the desired functionality, we do not need access to modified closure (and the asker knows better ways to make things work already in the question). – Jeppe Stig Nielsen Nov 05 '14 at 09:58
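For reference, the variant suggested in the comment above might look roughly like the following sketch, where the delegate takes the string as an explicit parameter instead of capturing the variable. The queue of simulated inputs and all names here are assumptions made so the example runs without a console:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExplicitParameter
{
    // Keeps taking the next simulated input until one contains no digits.
    // Throws InvalidOperationException if the queue runs out first.
    public static string ReadUntilNoDigits(Queue<string> simulatedInputs)
    {
        // No closure over a local variable: the string to test is passed in on each call.
        Func<string, bool> anyDigits = s => s.Any(char.IsDigit);

        string input = simulatedInputs.Dequeue();
        while (anyDigits(input))
        {
            input = simulatedInputs.Dequeue();
        }
        return input;
    }

    static void Main()
    {
        var inputs = new Queue<string>(new[] { "ABC123", "X9", "ABC" });
        Console.WriteLine(ReadUntilNoDigits(inputs)); // ABC
    }
}
```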
2

I'm answering just to add a clarification to the other good answers, about deferred execution.

Even if the LINQ query has not yet been evaluated (using .Any()), the query internally always refers to the initial content of the variable. Even if the LINQ query is evaluated after something new has been assigned to the variable, the initial content doesn't change, and the deferred execution will use the initial content the query has always been referring to:

var input = "ABC123";
var digits = input.Where(Char.IsDigit);
input = "NO DIGIT";
var result = digits.ToList();   // 3 items
ken2k
  • 1
    Just be warned that 'initial content' can point to a mutable structure, e.g. `var input = new List<int>{1}; var even = input.Where(x => x % 2 == 0); input.Add(2); var result = even.ToList();` – NPSF3000 Nov 05 '14 at 00:04
  • 2
    @NPSF3000: If it had been a mutable collection, I wouldn't have asked this question, because there would have been no issue ;-) – Tim Schmelter Nov 05 '14 at 11:24