-2

I want to get all the records from the database using EF Core and then save them to another database. I have 149541 rows in the database and I couldn't fetch them all at one time because it is very slow (maybe more than one hour) so I decided to do some Skip and Take. I did it in a loop where I icrease skip number every iteration but for some reason this loop never ended.

then I did some logging and saw that skip variable became zero.

            var start = DateTime.UtcNow;

            int skipAmount;
            const int step = 500;
            var parallelTasksBatchSize = step / 10;

            var totalEmailMessagesCount = await _context.Set<EmailMessage>().CountAsync();
            
            _errorLoggerQueue.LogError(new InformationLog($"Operation started. email messages count: {totalEmailMessagesCount}"), operationName);

            var client = new MyClient(_restClient);
            
            for (skipAmount = 0; skipAmount < totalEmailMessagesCount; skipAmount += step)
            {
                var emailMessages = await _context.Set<EmailMessage>()
                    .Include(x => x.EmailSubTemplates)
                    .OrderBy(x => x.CreatedAt)
                    .Skip(skipAmount)
                    .Take(step)
                    .AsNoTracking()
                    .ToListAsync();

                var tasks = new List<Task<RestResponse>>();
                
                foreach (var emailMessage in emailMessages)
                {
                    var emailMessageLogDto = new EmailMessageLogDto
                    {
                        //not important what properties I have there.
                        //and it's log of email that had been sent  from this application
                        //it's not a log of something that happening in this script
                        //I'm just moving those email messages from ome DB to another
                    };
                    
                    tasks.Add(client.CreateEmailMessageLogAsync(emailMessageLogDto));

                    if (tasks.Count == parallelTasksBatchSize)
                    {
                        var responses = await Task.WhenAll(tasks);
                        
                        tasks.Clear();
                        
                        var contentsOfFailedResponses =
                            responses.Where(x => !x.IsSuccessful).Select(x => x.Content).ToList();

                        var failFeedback = contentsOfFailedResponses.Count == 0
                            ? string.Empty
                            : string.Join(" |_| ", contentsOfFailedResponses);

                        if (!string.IsNullOrEmpty(failFeedback))
                        {
                            _errorLoggerQueue.LogError(new InformationLog($"timeNow: {DateTime.UtcNow.ToShortTimeString()}, failFeedback: {failFeedback}"), operationName);
                        }
                    }
                }
                
                _errorLoggerQueue.LogError(new InformationLog($"skip: {skipAmount}, step: {step}, totalCount: {totalEmailMessagesCount}, number of emailMessages = {emailMessages.Count}"), operationName);
            }

            _errorLoggerQueue.LogError(new InformationLog($"start time: {start}, finish time: {DateTime.UtcNow}, finish time - start time = {DateTime.UtcNow - start}"), operationName);

and it's working ok but in one moment it did some really weird thing. so the loop never ends. see the screenshot of the logs

Screenshot of the logs

And here is the calling code (.InitAsync())

namespace SuperNameSpace
{
    public class Program
    {
        public static async Task Main(string[] args)
        {
            CultureInfo cultureInfo = new CultureInfo("en-US");
            cultureInfo.NumberFormat.CurrencySymbol = "$";
            CultureInfo.DefaultThreadCurrentCulture = cultureInfo;
            CultureInfo.DefaultThreadCurrentUICulture = cultureInfo;
             
            IWebHost webHost = CreateWebHostBuilder(args).Build();
            await webHost.Services.GetRequiredService<IInitializationService>().InitAsync();

            webHost.Run();
        }

        public static IWebHostBuilder CreateWebHostBuilder(string[] args)
        {
            return WebHost.CreateDefaultBuilder(args).UseStartup<Startup>();
        }
    }
}

Application and DB is deployed on azure

I assume that Azure app service reruns Main method after some timeout because web application is not actually running at the moment of initialization. Can I somehow prevent it from happening? Or is there any better approach for running some kind of initialization scripts?

For now I put this code in a simple command (This application uses CQRS) and triggered it with Postman. Command has been complete in 47 minutes and it worked fine. everything was moved without failed responses

Alex
  • 21
  • 1
  • 5
  • 2
    You should show the code that *actually* produced this log. The log shows a decreasing skip amount. That doesn't follow from the code you've shown here. – Mark Benningfield Nov 13 '22 at 22:21
  • Whatever code is calling the code you've posted is probably calling this code repeatedly, I don't think the issue is with your loop but rather that it's being kicked off more than once. The code is still attempting to load all rows into memory rather than copying data 50 records at a time. – Steve Py Nov 14 '22 at 01:07
  • @MarkBenningfield , Steve Py , Guys, I updated the question. added calling code in it, edited script itself (I used this one from the beginning but for the sake of brevity on a first version of this question I put the code where I just get records from db and put in in a list). Do you know is there any better way to run initialization scripts rather than in Main() method? – Alex Nov 14 '22 at 11:56

1 Answers1

0

Like Steve Py says, what's the point of bringing back 50 at a time if you add them to a list in memory. I think you need to deal with them inside the loop. Which way are you reading your log? I'm guessing the log is set to prepend, with newest records a the top? Could something be restarting your program if it crashes? At a very minimum, you should have a try/catch, likely inside the loop, to catch and log errors. For me, the most likely scenario is that your app is crashing with a memory error when your list gets too big, and something is restarting it.

These are all the beginner pitfalls that mean that it's generally a bad idea to do this kind of task in code. ETL (Extract, Transfer and Loading) tools are much more adapted. I wouldn't hesitate to start aggain in SSIS.

bbsimonbb
  • 27,056
  • 15
  • 80
  • 110
  • I updated the question. added calling code in it, edited script itself (I used this one from the beginning but for the sake of brevity on a first version of this question I put the code where I just get records from db and put in in a list). Do you know is there any better way to run initialization scripts rather than in Main() method? I did it in code because it's not just copying I also initialize values of a new columns. and I want this script to run in differant environments. and every environment has own Database – Alex Nov 14 '22 at 11:57