0

I am calling this action (ASP.Net Core 2.0) over AJAX:

[HttpGet]
public async Task<IActionResult> GetPostsOfUser(Guid userId, Guid? categoryId)
{
    var posts = await postService.GetPostsOfUserAsync(userId, categoryId);

    var postVMs = await Task.WhenAll(
        posts.Select(async p => new PostViewModel
        {
            PostId = p.Id,
            PostContent = p.Content,
            PostTitle = p.Title,
            WriterAvatarUri = fileService.GetFileUri(p.Writer.Profile.AvatarId, Url),
            WriterFullName = p.Writer.Profile.FullName,
            WriterId = p.WriterId,
            Liked = await postService.IsPostLikedByUserAsync(p.Id, UserId),// TODO this takes too long!!!!
        }));

    return Json(postVMs);
}

But it took too long to response (20 seconds!!!) in case I have many post objects in posts array (e.g. 30 posts).
That is caused because of this line await postService.IsPostLikedByUserAsync.

Digging into the source code of this function:

public async Task<bool> IsPostLikedByUserAsync(Guid postId, Guid userId)
{
    logger.LogDebug("Place 0 passed!");

    var user = await dbContext.Users
        .SingleOrDefaultAsync(u => u.Id == userId);

    logger.LogDebug("Place 1 passed!");

    var post = await dbContext.Posts
        .SingleOrDefaultAsync(u => u.Id == postId);

    logger.LogDebug("Place 2 passed!");

    if (user == null || post == null)
        return false;

    return post.PostLikes.SingleOrDefault(pl => pl.UserId == userId) != null;
}

The investigations showed, after some seconds, ALL "Place 1 passed!" logging methods are executed together for every post object. In other words, it seems that every post awaits until the previous post finishes executing this part:

 var user = await dbContext.Users
        .Include(u => u.PostLikes)
        .SingleOrDefaultAsync(u => u.Id == userId);

And then -when every post finishes that part- the place 1 of log is executed for all post objects.

The same happens for logging place 2, every single post seems to await for the previous post to finish executing var post = await dbContext.Pos..., and then the function can go further to execute log place 2 (after few seconds from log 1, ALL log 2 appear together).

That means I have no asynchronous execution here. Could some one help me to understand and solve this problem?

UPDATE:

Changing the code a bit to look like this:

    /// <summary>
    /// Returns all post of a user in a specific category.
    /// If the category is null, then all of that user posts will be returned from all categories
    /// </summary>
    /// <param name="userId"></param>
    /// <param name="categoryId"></param>
    /// <returns></returns>
    [Authorize]
    [HttpGet]
    public async Task<IActionResult> GetPostsOfUser(Guid userId, Guid? categoryId)
    {
        var posts = await postService.GetPostsOfUserAsync(userId, categoryId);
        var i = 0;
        var j = 0;
        var postVMs = await Task.WhenAll(
            posts.Select(async p =>
            {
                logger.LogDebug("DEBUG NUMBER HERE BEFORE RETURN: {0}", i++);
                var isLiked = await postService.IsPostLikedByUserAsync(p.Id, UserId);// TODO this takes too long!!!!
                logger.LogDebug("DEBUG NUMBER HERE AFTER RETURN: {0}", j++);
                return new PostViewModel
                {
                    PostId = p.Id,
                    PostContent = p.Content,
                    PostTitle = p.Title,
                    WriterAvatarUri = fileService.GetFileUri(p.Writer.Profile.AvatarId, Url),
                    WriterFullName = p.Writer.Profile.FullName,
                    WriterId = p.WriterId,
                    Liked = isLiked,
                };
            }));

        return Json(postVMs);
    }

That shows, that this line "DEBUG NUMBER HERE AFTER RETURN" is printed for ALL select methods together, that means that ALL select methods waits for each other before going further, how can I prevent that?

UPDATE 2

Substituting the previous IsPostLikedByUserAsyncmethod, with the following one:

public async Task<bool> IsPostLikedByUserAsync(Guid postId, Guid userId)
{
    await Task.Delay(1000);
}

Showed no problem in async running, I had to wait only 1 second, not 1 x 30. That means it is something specific to EF.

Why does the problem happen ONLY with entity framework (with the original function)? I notice the problem even with only 3 post objects! Any new ideas?

Mohammed Noureldin
  • 14,913
  • 17
  • 70
  • 99
  • Since your method returns boolean, you can return `return post.PostLikes.Any(pl => pl.UserId == userId);` – koryakinp Apr 11 '18 at 23:50
  • @koryakinp, could you explain? actually that line should not be the problem (as I think). – Mohammed Noureldin Apr 11 '18 at 23:54
  • @MohammedNoureldin hope this [link](https://stackoverflow.com/questions/35011656/async-await-in-linq-select) comes in handy. Maybe im overlooking but I dont see any errors. You could just incase split the select tasks to a seperate var and after call the WhenAll. eventough it should not matter but maybe it reacts different. I will delete my post sorry I misunderstood your question before – H.Mikhaeljan Apr 12 '18 at 00:08
  • @H.Mikhaeljan, could you suggest the editing in an answer? – Mohammed Noureldin Apr 12 '18 at 00:11
  • Can't you simply create one database query that already returns all neccessary data (*Post*s and *PostLikes*) together? It should be simple `join` of *Posts*, *Users* and *PostLikes* tables. Issuing new database query for each post is performance killer and async code cannot make it much better. – Ňuf Apr 12 '18 at 00:30
  • @Ňuf that can't be performance issue, processor can absolutely handle more than 2 queries per second. However, could you please show me in an answer your suggestion? – Mohammed Noureldin Apr 12 '18 at 01:19
  • Have you tested the performance of your queries individually? Also, is your repository designed to handle multiple requests at once, or does it process them synchronously and in order? – cwharris Apr 12 '18 at 01:40
  • It looks like your database is slow when attempting to handle 30 queries all at once 3 times over, and that it happens to be returning all of the results at the same time. – cwharris Apr 12 '18 at 01:46
  • 1
    @MohammedNoureldin: I did not suggest that it is a performance issue. Instead I wanted to indicate that it actually might be an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378). Instead writing complicated async code that is querying *PostLikes* for individual *Post*s (which is terribly inefficient), you can completely avoid your problem by gathering all data in one single query (if possible). Not only such solution should use database more efficiently, it should also be simpler to write such code. – Ňuf Apr 12 '18 at 03:37
  • 1
    If you are using *entity framework*, you don't event need to 'join' tables yourself, EF will do it automatically with query like this: `from Post p in dbContext.Posts where p.UserId == userId select new PostViewModel() { PostContent = p.Content, IsLiked = p.PostLikes.Any(v => v.UserId == UserId), ... }` – Ňuf Apr 12 '18 at 03:38
  • @Ňuf, thank you, it is actually about understanding the exact problem, not only finding a workaround for it. I could not till now understand what the problem is and why is it happening. – Mohammed Noureldin Apr 12 '18 at 16:43
  • @Ňuf To be honest, I am not sure that your explanation is correct, I still think that calling `async` lambda in any function will not await for every single one of them, but will return a task of it which should continue working in the background. If I am wrong please give me any possible solution for the problem? – Mohammed Noureldin Apr 12 '18 at 17:58
  • @MohammedNoureldin: I understand that you asked the question to educate yourself and are not looking for an workaroud, that't why I wrote it as a comment, not an answer. It is correct that several async queries should be able to run at once in the background. But issuing potentially hundreds of queries at once is not a good idea (database would not be able to process them at once anyway) and it's likely that some lower layer (EF, ADO.NET provider or network connection manager) will limit number of concurrent queries. – Ňuf Apr 13 '18 at 09:00
  • And no matter how successffull you will be in fixig your async code, you will never get even close to the performance of single query. Not only one query can be performed by database much more efficiently than bunch of (simple) queries, but it also avoids many webserver->database->webserver round-trips (each adding latency) and will scale much better as number of posts will start to grow further. So i believe it is something you should consider in realword scenario. BTW IIRC Entity Framework doesn't support concurrent querries on a single DbContext-are you sure you don't use it in such a way? – Ňuf Apr 13 '18 at 09:01

2 Answers2

0

The deductions you've made are not necessarily true.

If these methods were firing in a non-asynchronous fashion, you would see all of the logs from one method invocation reach the console before the next method invocation's console logs. You would see the pattern 123123123 instead of 111222333. What you are seeing is that the three awaits seem to synchronize after some asynchronous batching occurs. Thus it appears that the operations are made in stages. But why?

There are a couple reasons this might happen. Firstly, the scheduler may be scheduling all of your tasks to the same thread, causing each task to be queued and then processed when the previous execution flow is complete. Since Task.WhenAll is awaited outside of the Select loop, all synchronous portions of your async methods are executed before any one Task is awaited, therefore causing all of the "first" log invocations to be called immediately following the invocation of that method.

So then what's the deal with the others syncing up later? The same thing is happening. Once all of your methods hit their first await, the execution flow is yielded to whatever code invoked that method. In this case, that is your Select statement. Behind the scenes, however, all of those async operations are processing. This creates a race condition.

Shouldn't there be some chance of the third log of some methods being called before the second log of another method, due to varying request/response times? Most of the time, yes. Except you've introduced a sort of "delay" in to the equation, making the race condition more predictable. Console logging is actually quite slow, and is also synchronous. This causes all of your methods to block at the logging line until the previous logs have completed. But blocking, by itself, may not be enough to make all of those log calls sync up in pretty little batches. There may be another factor at play.

It would appear that you are querying a database. Since this is an IO operation, it takes considerably longer to complete than other operations (including console logging, probably). This means that, although the queries aren't synchronous, they will in all likelihood receive a response after all of the queries/requests have already been sent, and therefore after the second log line from each method has already executed. The remaining log lines are processed eventually, and therefore fall in to the last batch.

Your code is being processed asynchronously. It just doesn't look quite how you might expect. Async doesn't mean random order. It just means some code flow is paused until a later condition is met, allowing other code to be processed in the mean time. If the conditions happen to sync up, then so does your code flow.

cwharris
  • 17,835
  • 4
  • 44
  • 64
0

Actually async execution works, but it doesn't work as you expect. Select statement starts tasks for all posts and then they all work concurrently that leads you to performance problems you.

The best approach to achieve expected behavior is to reduce the degree of parallelism. There are no build-in tools to do that so I can offer 2 workarounds:

  1. Use TPL DataFlow library. It is developed by Microsoft but not very popular. You can easily find enough examples though.

  2. Manage parallel tasks by yourself with SemaphoreSlim. It would look like this:

    semaphore = new SemaphoreSlim(degreeOfParallelism);
    cts = new CancellationTokenSource();
    var postVMs = await Task.WhenAll(
    posts.Select(async p => 
    {
        await semaphore.WaitAsync(cts.Token).ConfigureAwait(false);
        cts.Token.ThrowIfCancellationRequested();
        new PostViewModel
        {
            PostId = p.Id,
            PostContent = p.Content,
            PostTitle = p.Title,
            WriterAvatarUri = fileService.GetFileUri(p.Writer.Profile.AvatarId, Url),
            WriterFullName = p.Writer.Profile.FullName,
            WriterId = p.WriterId,
            Liked = await postService.IsPostLikedByUserAsync(p.Id, UserId),// TODO this takes too long!!!!
        }
        semaphore.Release();
    }));
    

And don't forget to use .ConfigureAwait(false) whenever it's possible.

Andrey Alonzov
  • 350
  • 2
  • 9
  • It cannot be the degree of parallelism, I can notice that 3 `post`s takes longer than 1 `post`. Can't it handle 3 parallel queries? That can't be true – Mohammed Noureldin Apr 13 '18 at 12:43