I am writing a C# method that streams a large number of rows from a SQL query (not a straight DBSet<T>
!), performs some transformation on them, then writes the results into a MongoDB database. I am trying to make this run as quickly as possible, and due to reasonably high network latency I want to avoid making multiple return trips to SQL Server.
I have a class, StreamlinedCrmTicket
, that represents a DTO onto which EF projects the result of a raw SQL query that takes no parameterised inputs. I am using EF Core 3.1.6 and the .Set<StreamlinedCrmTicket>
technique to execute the raw SQL query. I'm then using .AsNoTracking()
for the performance gains in light of this only ever being a read operation. Finally, I am calling .AsAsyncEnumerable()
, and wrapping the whole shebang in an await foreach
which, in turn, lives inside a method marked async
.
The whole thing looks like this:
await foreach (var ticket in _affinityContext.Set<StreamlinedCrmTicket>().FromSqlRaw(query).AsNoTracking().AsAsyncEnumerable().WithCancellation(cancellationToken))
{
// Do something with each ticket.
}
The source table for my raw SQL query currently contains around 1.2 million rows. There are a handful of joins that seem to cause very little change in the execution time of the query when measured using SSMS.
When I execute my code, it appears that EF starts the query, but the body of the foreach loop, no matter what it contains, does not begin executing until the entire query has been executed and the resultset received from SQL Server. This defeats the object of my use of IAsyncEnumerable! My understanding was that IAsyncEnumerable should allow me to act on rows (or entities) as they return from the database without having to wait for the entire resultset.
Some ideas that support my theory that this is not behaving asynchronously at the moment are:
- As soon as the call to
_affinityContext.Set<StreamlinedCrmTicket>().FromSqlRaw(query).AsNoTracking().AsAsyncEnumerable().WithCancellation(cancellationToken)
is complete, a large amount of network IO begins. I can see in Performance Monitor on my Windows machine that the IO is a SQL Server connection to the server my code should be running against. - I swapped the body of my
foreach
loop to a very simple one, and it only runs once the network IO has stopped. - I removed all
ORDER BY
clauses from the SQL query - row ordering is irrelevant in this use case, and I was concerned that this might be causing the query to take a long time before the first row was returned thus giving the illusion of running synchronously. However, the network IO suggests this wasn't (and isn't - I left the clause out!) the case. - If I add a
TOP 1000
to theSELECT
statement in my query, it executes much more rapidly.
I'm unsure why this is running synchronously, and documentation on Microsoft's site seems poor!