
I've got a query that returns more than 212 million rows. I tried to use `.ToList()`, but it obviously ends in an `OutOfMemoryException`. What is the best alternative for my case?

I also tried `Skip(fetchedrows).Take(1000).ToList()`, but performance slows down badly in the skipping phase and the query never ends.
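Roughly, the paging approach looks like this (shown here against an in-memory source so it runs standalone; in my real case the source is a LINQ to SQL query, and the class and names are placeholders):

```csharp
using System;
using System.Linq;

class Paging
{
    public static int PageThrough(IQueryable<int> source)
    {
        int fetchedRows = 0;
        while (true)
        {
            // Each iteration re-runs the query and skips past everything
            // already fetched, so the cost of Skip grows with fetchedRows.
            var page = source.OrderBy(x => x)   // Skip/Take needs a stable order
                             .Skip(fetchedRows)
                             .Take(1000)
                             .ToList();
            if (page.Count == 0) break;
            fetchedRows += page.Count;
        }
        return fetchedRows;
    }

    static void Main()
    {
        var source = Enumerable.Range(0, 5000).AsQueryable();
        Console.WriteLine(PageThrough(source)); // 5000
    }
}
```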

Patrick Hofman
Oper
  • Can you filter the query prior to getting 212 million rows? – Casey O'Brien Feb 27 '17 at 19:34
  • http://stackoverflow.com/questions/419019/split-list-into-sublists-with-linq Could splitting into sublists work for you? – tRuEsAtM Feb 27 '17 at 19:35
  • If you really have to change 212 million rows, consider making an SQL stored procedure that does it instead; nothing will be faster than SQL. – Antoine Pelletier Feb 27 '17 at 19:36
  • @AntoinePelletier It depends on what you are doing. Some actions just aren't possible from stored procedures, but generally that is a good option indeed. – Patrick Hofman Feb 27 '17 at 19:40
  • @PatrickHofman Yes. Oper, you should tell us what kind of manipulations you are doing. You fetch these rows to accomplish something; what is it? – Antoine Pelletier Feb 27 '17 at 19:44
  • Actually, I don't do anything with this data. This is for a project comparing the performance of SQL Server and MongoDB. In MongoDB I can do this query in 8 hours, storing the results in another collection. But in SQL Server I don't know how to retrieve this data without getting an `OutOfMemoryException`. I'm also thinking about inserting these rows into a new table, but I don't know a way to do that (going from `var x = (linqtosql statement)` to `db.QueryResult.AddRange(x);`). – Oper Feb 27 '17 at 19:54
  • You are testing the limits... you reached them! Seriously, consider Patrick's answer or... admit there are limitations to what a computer can do. Good luck with your future tests. – Antoine Pelletier Feb 27 '17 at 20:07
  • Why don't you use a smaller data set? You don't need these large amounts to reveal performance differences. Also, how useful is the comparison? SQL and NoSQL databases serve different goals. It's a bit like comparing a Land Rover to a Lamborghini: who wins depends on the landscape. – Gert Arnold Feb 27 '17 at 23:02
  • The dataset is not that big. This is the size of a query result, not the size of the dataset. – Oper Feb 27 '17 at 23:08
  • Whatever. Why 212 million? One million will suffice to demonstrate that MongoDb will easily beat SS in tasks it's geared to doing fast. – Gert Arnold Feb 27 '17 at 23:41

1 Answer


If possible, return an enumerable. That way you can keep scrolling forward (and forward only) through the excessive number of rows without needing to hold them all in memory at once, which will most likely never work. If implemented correctly (it also depends on your data source), you can read and process one row at a time, with virtually no memory in use.

There are many techniques to do this, but the easiest one I often use is `yield return`, which generates its own state machine.
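A minimal, self-contained sketch of that idea (here `ReadRows` is a stand-in for any forward-only source, such as a `DataReader`):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // A lazy iterator: rows are produced one at a time as the caller
    // enumerates them; the full sequence never exists in memory.
    public static IEnumerable<int> ReadRows(int count)
    {
        for (int i = 0; i < count; i++)
            yield return i; // execution resumes here on each MoveNext()
    }

    static void Main()
    {
        // Even a huge row count costs only one element of state at a time.
        long sum = 0;
        foreach (var row in ReadRows(1_000_000))
            sum += row;
        Console.WriteLine(sum); // 499999500000
    }
}
```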

Entity Framework nowadays streams the results itself. The only thing you should not do is call `ToList()` or similar methods, which would load all rows from the database into memory. Just iterate over the results as if they were a regular collection. (For EF6 and older you can use the `AsStreaming()` extension method.) If that doesn't work for you, you can always fall back to a `DataReader`, where you read and return row by row.
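A sketch of the `DataReader` fallback (the connection string, SQL, and column are placeholders; it needs a real database to run):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

static class RowStreamer
{
    // Yields rows one at a time straight off the reader; nothing is ever
    // collected into a list, so memory use stays flat regardless of row count.
    public static IEnumerable<string> StreamRows(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT Name FROM BigTable", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())          // one row in memory at a time
                    yield return reader.GetString(0);
            }
        }
    }
}
```

Because the method is itself an iterator, the connection stays open only while the caller is enumerating, and is disposed when enumeration ends.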

Patrick Hofman
  • That doesn't work very well, because when we use `yield` we still trigger a request to the database that pulls the same amount of data into memory. – Edney Batista da Silva Feb 27 '17 at 19:37
  • No, using `DataReader` you just read one row at a time. The data provider you use might send more rows at once, but often you can control that too. @EdneyBatistadaSilva – Patrick Hofman Feb 27 '17 at 19:38
  • @PatrickHofman But he's *not* using a `DataReader`, he's using EF. – Servy Feb 27 '17 at 19:41
  • As you said, the `DataReader` does that, but in this case the `yield` statement forces EF to compile and run the query to build the `IEnumerable` to return; in the same way, it brings back all the rows. – Edney Batista da Silva Feb 27 '17 at 19:42
  • @Servy That is what I said. Updated to use EntityFramework's `AsStreaming()` which does about the same. – Patrick Hofman Feb 27 '17 at 19:44
  • The `AsStreaming()` extension method is obsolete. Am I missing something? How can I use this from LINQ to SQL? Can you be more precise? – Oper Feb 27 '17 at 20:05
  • I see, that was EF6. The current version is streaming by default, so if you don't call `ToList()` on it, it will not retrieve all data at once. – Patrick Hofman Feb 27 '17 at 20:06
  • Is it possible with this method to see how much time the query execution takes, even if the results are not all retrieved at once? – Oper Feb 27 '17 at 20:13