
I think the best way to explain my question is with a short (generic) LINQ-to-Objects code sample:

IEnumerable<string> ReadLines(string filename)
{
    string line;
    using (var rdr = new StreamReader(filename))
        while ( (line = rdr.ReadLine()) != null)
           yield return line;
}

IEnumerable<int> XValuesFromFile(string filename)
{
    return ReadLines(filename)
               .Select(l => l.Substring(3,3))
               .Where(l => { int ignored; return int.TryParse(l, out ignored); })
               .Select(i => int.Parse(i));
}

Notice that this code parses the integer twice. I know I'm missing an obvious, simple way to eliminate one of those calls safely (mainly because I've done it before). I just can't find it right now. How can I do this?

Furqan Safdar
Joel Coehoorn

3 Answers


How about:

int? TryParse(string s)
{
    int i;
    return int.TryParse(s, out i) ? (int?)i : (int?)null;
}
IEnumerable<int> XValuesFromFile(string filename)
{
    return from line in ReadLines(filename)
           let start = line.Substring(3,3)
           let parsed = TryParse(start)
           where parsed != null
           select parsed.GetValueOrDefault();
}

You could probably combine the second/third lines if you like:

    return from line in ReadLines(filename)
           let parsed = TryParse(line.Substring(3,3))

GetValueOrDefault is chosen because it skips the validation check that a cast to (int) or .Value performs, i.e. it is (ever-so-slightly) faster, and we've already checked that the value isn't null.

Marc Gravell
  • I guess I'm looking more for the generic case of filtering an enumerable based on a complex transformation: keep the changed version of everything that passed the change. This may be a case for writing a new "operator". – Joel Coehoorn Feb 02 '10 at 19:54
  • Is using `!= null` and `GetValueOrDefault()` really faster than using `where parsed.HasValue` and `select parsed.Value`? I guess I should go run some tests, because that seems counter-intuitive to me. – Joel Mueller Feb 02 '10 at 19:58
  • Another approach is to write a method that returns a `Tuple` result, if you've got .NET 4 or want to write your own Tuple class. This is how F# automatically handles TryParse and similar methods. Then the LINQ would be `where tuple.Item1 select tuple.Item2` – Joel Mueller Feb 02 '10 at 19:59
  • Joel Mueller: yeah, I was already working on something kinda like that :) – Joel Coehoorn Feb 02 '10 at 20:11
  • @Joel Mueller - `!=null` is **exactly** `HasValue`, so that is no different. The `GetValueOrDefault()` is a *tiny* bit faster by skipping the check - it simply returns the inner field directly. – Marc Gravell Feb 02 '10 at 20:40
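
The `Tuple` idea from the comments above could look roughly like the sketch below. It assumes .NET 4's `System.Tuple`; the names `TupleParseSketch`, `TryParseInt`, and `XValues` are hypothetical and do not come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TupleParseSketch
{
    // Hypothetical helper: wraps int.TryParse so the success flag
    // and the parsed value travel together as one result.
    public static Tuple<bool, int> TryParseInt(string s)
    {
        int value;
        bool ok = int.TryParse(s, out value);
        return Tuple.Create(ok, value);
    }

    // Parses characters 3..5 of each line once and keeps the successes.
    // Like the original, Substring(3, 3) throws for lines shorter than 6 characters.
    public static IEnumerable<int> XValues(IEnumerable<string> lines)
    {
        return from line in lines
               let tuple = TryParseInt(line.Substring(3, 3))
               where tuple.Item1
               select tuple.Item2;
    }
}
```

Note that `Tuple<bool, int>` is a reference type, so this version allocates one object per line, which the `int?` version avoids.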

It's not exactly pretty, but you can do:

return ReadLines(filename)
    .Select(l =>
    {
        string tmp = l.Substring(3, 3);
        int result;
        bool success = int.TryParse(tmp, out result);
        return new
        {
            Success = success,
            Value = result
        };
    })
    .Where(i => i.Success)
    .Select(i => i.Value);

Granted, this is mostly just pushing the work into the lambda, but it does provide the correct answers, with a single parse (but extra memory allocations).

Reed Copsey
  • Marc's option of using a Nullable could be used here instead of the anonymous class, as well, which would prevent the GC pressure from occurring... – Reed Copsey Feb 02 '10 at 19:50

I think I'll go with something like this:

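// Extension methods must be declared in a static class.
static IEnumerable<O> Reduce<I, O>(this IEnumerable<I> source, Func<I, Tuple<bool, O>> transform)
{
    foreach (var item in source)
    {
        Tuple<bool, O> r;
        try
        {
            r = transform(item);
        }
        catch
        {
            // Treat a throwing transform as a failed conversion.
            continue;
        }
        // The yield sits outside the try: C# does not allow yield return
        // inside a try block that has a catch clause.
        if (r.Item1) yield return r.Item2;
    }
}

ReadLines(filename).Reduce(l => { int i; return new Tuple<bool, int>(int.TryParse(l.Substring(3, 3), out i), i); });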

I don't really like this, though, as I'm already on the record as not liking using tuples in this way. Unfortunately, I don't see many alternatives outside of abusing exceptions or restricting it to reference types (where null is defined as a failed conversion), neither of which is much better.

Community
Joel Coehoorn
  • I looked at this approach. I just didn't like the fact that the compiler can't infer the type (at least in C# 3), so the "Reduce" extension usability suffers... – Reed Copsey Feb 02 '10 at 20:18
  • My main complaints are **1)** that I can't express the conversion in a single statement; I still need a variable declaration inside the lambda, and **2)** that I have to express the result in the form of a tuple rather than the converted item. – Joel Coehoorn Feb 02 '10 at 20:22
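
One possible way around both complaints is an operator that takes a delegate shaped like the `TryParse` methods themselves, so no tuple and no lambda-local variable is needed. This is only a sketch: the `TryFunc` delegate, the `TrySelect` operator, and the `XValues` wrapper are hypothetical names, not something proposed in the thread or provided by the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical delegate with the same shape as int.TryParse(string, out int).
public delegate bool TryFunc<TIn, TOut>(TIn input, out TOut result);

public static class TrySelectSketch
{
    // Hypothetical operator: yields the converted value for every item
    // where tryTransform returns true, and skips the rest.
    public static IEnumerable<TOut> TrySelect<TIn, TOut>(
        this IEnumerable<TIn> source, TryFunc<TIn, TOut> tryTransform)
    {
        foreach (var item in source)
        {
            TOut result;
            if (tryTransform(item, out result))
                yield return result;
        }
    }

    // Usage: each substring is parsed exactly once.
    public static IEnumerable<int> XValues(IEnumerable<string> lines)
    {
        return lines
            .Select(l => l.Substring(3, 3))
            .TrySelect<string, int>(int.TryParse);
    }
}
```

The type arguments are spelled out explicitly at the call site because I would not rely on the compiler inferring `TOut` from an overloaded method group with an `out` parameter.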