In the past, I have written queries that fetch the entire result set, store it in memory, and then feed the whole sequence to a .CSV type provider. An example query:
let results =
    query {
        for row in db.ThisRow do
        select row
    }
    |> Seq.toList
As I understand it, the Seq.toList part forces the query to run, whereas leaving the query's output as a Seq would keep it lazy.
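To make that understanding concrete, here is a small sketch with no database involved. The `rows` sequence and the `produced` counter are hypothetical stand-ins for the query and its row fetches; the point is only that nothing is produced until something enumerates the sequence.

```fsharp
// Hypothetical stand-in for the query: a lazy sequence that counts
// how many rows have actually been produced so far.
let produced = ref 0

let rows =
    seq {
        for i in 1 .. 5 do
            produced.Value <- produced.Value + 1
            yield i
    }

// Defining the sequence does not run it.
let before = produced.Value

// Seq.toList enumerates the whole sequence and holds every row in memory.
let all = rows |> Seq.toList
let after = produced.Value

printfn "produced before toList: %d, after: %d, rows held: %d" before after all.Length
```

With a real query the same forcing step is what pulls every row into memory at once.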
If the number of query results is small, this isn't a big deal. If, however, the number of results is large (e.g., more than 1 million rows), I end up with a System.OutOfMemoryException. As a result, a friend suggested I look into using the Nessos.Streams library.
My goal is to pull a row from the query, do some operations on that row, then write that single row to a .CSV, and to do that iteratively for each row, with all rows ultimately in the same .CSV file.
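The shape of that goal can be sketched without the database or the type provider. This is only an illustration under assumptions: `fakeRows` is a hypothetical stand-in for the query results, `writeRows` is a hypothetical helper, and a plain `StreamWriter` replaces the CSV type provider, so it does no quoting or escaping of field values.

```fsharp
open System.IO

// Hypothetical stand-in for the query results: a lazy sequence of
// (item number, count) pairs. Nothing is materialized up front.
let fakeRows : seq<string * int> =
    seq { for i in 1 .. 1000 -> (sprintf "item%d" i, i) }

// Write each row as soon as it is produced, so the whole result set
// never has to sit in a list. All rows go to the same file.
let writeRows (path: string) (rows: seq<string * int>) =
    use writer = new StreamWriter(path)
    writer.WriteLine("item_number,num")          // header line
    for (item, num) in rows do
        writer.WriteLine(sprintf "%s,%d" item num)

let outPath = Path.Combine(Path.GetTempPath(), "streamed_rows.csv")
writeRows outPath fakeRows
```

The per-row operations would go inside the loop (or in a `Seq.map` before it), as long as nothing in the pipeline forces the full sequence into memory.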
So I tried:
open FSharp.Data
open Nessos.Streams

type StringInt = {
    String: string option
    Int: int
}

type StringIntCsvType = CsvProvider<Sample = "item_number, num",
                                    Schema = "item_number (string option), num (int)",
                                    HasHeaders = true>

let buildRowFromObject (obj: StringInt) = StringIntCsvType.Row(obj.String, obj.Int)

// Group the queried rows by item number and count the rows in each group.
let results =
    query {
        for row in db.ThisRow do
        select row
    }
    |> Stream.ofSeq
    |> Stream.groupBy (fun row -> row.ITEMNUMBER)
    |> Stream.map (fun (itemString, seq) -> (itemString, (seq |> Seq.length)))
    |> Stream.map (fun (str, num) -> { String = Some str; Int = num })
    |> Stream.map buildRowFromObject
    |> Stream.toSeq

let ThisCsv = new StringIntCsvType(results)
let ThisCsvLoc = "pathToFileLocation"
ThisCsv.Save(ThisCsvLoc)
Although this works inasmuch as I get all of the rows I queried from the database, it still isn't doing what I want it to do, and I'm not sure how to make that happen. Also, for larger queries, I still end up with a System.OutOfMemoryException.
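One possible contributor, offered as an assumption rather than a diagnosis: a group-by that hands back each key together with its rows has to keep every row until the input ends, while the pipeline above only needs a count per key. A minimal sketch of counting without retaining the rows, using `Seq.countBy` from FSharp.Core; the `Row` record and `fakeRows` data are hypothetical stand-ins for the database rows.

```fsharp
// Hypothetical row type standing in for the database row.
type Row = { ITEMNUMBER: string }

// Hypothetical data: ten rows spread over three item numbers.
let fakeRows =
    seq { for i in 1 .. 10 -> { ITEMNUMBER = sprintf "item%d" (i % 3) } }

// Seq.countBy keeps one running count per distinct key instead of
// collecting the rows of each group.
let counts =
    fakeRows
    |> Seq.countBy (fun row -> row.ITEMNUMBER)
    |> Seq.toList

for (item, num) in counts do
    printfn "%s,%d" item num
```

Memory use then grows with the number of distinct item numbers rather than with the number of rows.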
I think I should be able to insert
|> Stream.map (fun x -> new StringIntCsvType(x))
under the |> Stream.map buildRowFromObject line, but the x is flagged with the error:
Type Constraint Mismatch. The type
CsvProvider<...>.Row
is not compatible with type
Collections.Generic.IEnumerable<CsvProvider<...>.Row>
Even if I weren't given this error, I think adding that line would create a new .CSV for every row, and that is decidedly NOT what I want.
How can I write a query, operate on each piece of the query, and write each row of the query into the same .CSV file? Am I even close with the code above?