I just started using this year's Advent of Code to learn F# and I immediately stepped on a rake by trying to reuse the IEnumerable from File.ReadLines.

Here are all of the ways I see to solve this:

// Read all lines immediately into array/list
let linesAll     = File.ReadAllLines "file.txt"
let linesArray   = File.ReadLines "file.txt" |> Array.ofSeq
let linesList    = File.ReadLines "file.txt" |> List.ofSeq

// Lazily load and cache for replays
let linesCache   = File.ReadLines "file.txt" |> Seq.cache

// Start new filesystem read for every replay
let linesDelay   = (fun () -> File.ReadLines "file.txt") |> Seq.delay
let linesSeqExpr = seq { yield! File.ReadLines "file.txt" }

  • Are these all semantically identical (for a read-only file)?
  • Are linesDelay and linesSeqExpr the only ones that don't read the entire file into memory?
  • Is linesList slowed down by having to assemble the list backwards?
  • Are any of these considered more or less idiomatic?

Edit

Here is code that reproduces my issue:

let lines = System.IO.File.ReadLines("alphabet.txt")
for i = 0 to 5 do
  let arr = Seq.zip lines (Seq.skip 1 lines) |> Array.ofSeq
  printfn "%A %A" i arr

gives output:

0 [|("A", "C"); ("D", "E"); ("F", "G"); ("H", "I"); ("J", "K"); ("L", "M");
  ("N", "O"); ("P", "Q"); ("R", "S"); ("T", "U"); ("V", "W"); ("X", "Y")|]
1 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
  ("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
  ("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
  ("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
  ("Y", "Z")|]
2 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
  ("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
  ("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
  ("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
  ("Y", "Z")|]
3 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
  ("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
  ("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
  ("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
  ("Y", "Z")|]
4 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
  ("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
  ("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
  ("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
  ("Y", "Z")|]
5 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
  ("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
  ("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
  ("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
  ("Y", "Z")|]

It looks like the expression Seq.zip lines (Seq.skip 1 lines) triggers the bug, because it enumerates lines twice at the same time. Only the first iteration is wrong; every later iteration produces the expected pairs.

Edit 2

Here is a reproduction in C#. The pairs come out slightly differently because I'm not skipping one element on the right side.

var lines = File.ReadLines("alphabet.txt");
for (int i = 0; i < 5; i++)
{
    var zipped = new List<(string, string)>();
    var enum1 = lines.GetEnumerator();
    var enum2 = lines.GetEnumerator();
    while (enum1.MoveNext() && enum2.MoveNext())
    {
        zipped.Add((enum1.Current, enum2.Current));
    }
    Console.WriteLine($"{i} [{string.Join(',', zipped)}]");
}

gives output:

0 [(A, B),(C, D),(E, F),(G, H),(I, J),(K, L),(M, N),(O, P),(Q, R),(S, T),(U, V),(W, X),(Y, Z)]
1 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
2 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
3 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
4 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]

Edit 3

This is a known issue that will not be fixed, in order to preserve compatibility. The source contains this comment:

    //  - IEnumerator<T> instances from the same IEnumerable<T> party on the same underlying
    //    reader.
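
Given that behaviour, one way to avoid the shared reader is to make sure every enumeration starts its own File.ReadLines call. The following is only a minimal sketch: restartableLines is a hypothetical helper name, and a four-line temporary file stands in for alphabet.txt.

```fsharp
open System.IO

// Hypothetical helper: Seq.delay re-runs the function on every enumeration,
// so each enumerator gets its own File.ReadLines call and its own reader.
let restartableLines (path: string) : seq<string> =
    Seq.delay (fun () -> File.ReadLines path)

// Assumption: a small temporary file stands in for alphabet.txt.
let path = Path.GetTempFileName()
File.WriteAllLines(path, [| "A"; "B"; "C"; "D" |])

let lines = restartableLines path

// Two simultaneous enumerations no longer interfere with each other.
let pairs = Seq.zip lines (Seq.skip 1 lines) |> Array.ofSeq
printfn "%A" pairs

File.Delete path
```

For this particular pattern, Seq.pairwise lines should also work, since it produces adjacent pairs from a single enumeration.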
Kenneth Allen

1 Answer

What problem did you have by reusing the sequence from File.ReadLines? The following code works fine for me:

let lines = File.ReadLines "file.txt"
for line in lines do printfn "%s" line
for line in lines do printfn "%s" line

Anyway, here's my take on the answers to your questions:

  • Are these all semantically identical (for a read-only file)?

They're similar, but not identical, because they have different types. For example, an array and a list don't have exactly the same semantics. (Also, keep in mind that even a read-only file can be deleted, which will affect the lazy versions.)

  • Are linesDelay and linesSeqExpr the only ones that don't read the entire file into memory?

No, linesCache should also only read as many lines as are needed.
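
As a rough illustration of that laziness, here is a small sketch that uses a counting sequence as a stand-in for File.ReadLines (the names produced, source and cached are made up for the example). It records how many elements the underlying sequence has produced after taking two items and after two full passes.

```fsharp
// Counts how many elements the underlying sequence has actually produced.
let produced = ref 0

// Stand-in for File.ReadLines: a lazy source with an observable side effect.
let source =
    seq {
        for i in 1 .. 5 do
            produced.Value <- produced.Value + 1
            yield i
    }

let cached = Seq.cache source

// Only the first two elements are requested from the source.
let firstTwo = cached |> Seq.take 2 |> List.ofSeq
let afterTake = produced.Value

// The first full pass pulls the remaining elements from the source;
// the second full pass is served entirely from the cache.
let firstPass = List.ofSeq cached
let secondPass = List.ofSeq cached
let afterReplays = produced.Value

printfn "%A %d %d" firstTwo afterTake afterReplays
// prints: [1; 2] 2 5
```

So the cache grows only as far as something has enumerated, but once a full pass has happened the whole input is held in memory.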

  • Is linesList slowed down by having to assemble the list backwards?

I don't think so. See the source of the List.ofSeq primitive.

  • Are any of these considered more or less idiomatic?

I think they're all fine, depending on the circumstance. Personally, I often just use File.ReadAllLines unless I have reason to believe the file is huge.

Brian Berns
  • Thank you for the info! Inconsistently, it would (without throwing) only give me part of the file. I saw [this](http://www.fssnip.net/3C/title/Restartable-FileReadLines) which showed me I wasn't the first one to run into this problem. I'll see if I can get a consistent reproduction. – Kenneth Allen Dec 02 '21 at 04:33
  • Added code that reproduces the issue I saw. – Kenneth Allen Dec 04 '21 at 06:22
  • Apparently it is a known issue and will not be fixed. – Kenneth Allen Dec 04 '21 at 06:51