0

I have some tasks (nWorkers = 3):

var taskFactory = new TaskFactory(cancellationTokenSource.Token,
    TaskCreationOptions.LongRunning, TaskContinuationOptions.LongRunning,
    TaskScheduler.Default);

for (int i = 0; i < nWorkers; i++)
{
    var task = taskFactory.StartNew(() => this.WorkerMethod(parserItems,
        cancellationTokenSource));
    tasks[i] = task;
}    

And the following method called by the tasks:

protected override void WorkerMethod(BlockingCollection<ParserItem> parserItems,
    CancellationTokenSource cancellationTokenSource)
{
//...log-1...
using (var connection = new OracleConnection(connectionString))
{
    OracleTransaction transaction = null;
    try
    {
        cancellationTokenSource.Token.ThrowIfCancellationRequested();
        connection.Open();
        //...log-2...
        transaction = connection.BeginTransaction();
        //...log-3...
        using (var cmd = connection.CreateCommand())
        {                   
            foreach (var parserItem in parserItems.GetConsumingEnumerable(
                cancellationTokenSource.Token))
            {
                cancellationTokenSource.Token.ThrowIfCancellationRequested();
                try
                {
                    foreach (var statement in this.ProcessRecord(parserItem))
                    {                               
                        cmd.CommandText = statement;
                        try
                        {                                   
                            cmd.ExecuteNonQuery();                                  
                        }
                        catch (OracleException ex)
                        {
                            //...log-4...
                            if (!this.acceptedErrorCodes.Contains(ex.Number))
                            {
                                throw;
                            }
                        }
                    }
                }
                catch (FormatException ex)
                {
                    log.Warn(ex.Message);
                }
            }
            if (!cancellationTokenSource.Token.IsCancellationRequested)
            {                       
                transaction.Commit();                       
            }
            else
            {                       
                throw new Exception("DBComponent has been canceled");
            }
        }
    }
    catch (Exception ex)
    {
        //...log-5...
        cancellationTokenSource.Cancel();
        if (transaction != null)
        {
            try
            {
                transaction.Rollback();
                //...log-6...
            }
            catch (Exception rollbackException)
            {
                //...log-7...
            }
        }
        throw;
    }
    finally
    {
        if (transaction != null)
        {
            transaction.Dispose();
        }
        connection.Close();
        //...log-8...
    }
}
//...log-9...   
}

There is a producer of ParserItem objects and these are the consumers. Normally it works fine, there are sometimes that there is an Oracle connection timeout, but in these cases I can see the exception message and everything works as designed.

But sometimes the process get stuck. When it gets stuck, in the log file I can see log-1 message and after that (more or less 15 seconds later) I see log-8 message, but what is driving me nuts is why i cannot see neither the exception message log-5 nor the log-9 message. Since the cancellationTokenSource.Cancel() method is never called, the producer of items for the bounded collection is stuck until a timeout two hours later.

It is compiled for NET Framework 4 and I'm using Oracle.ManagedDataAccess libraries for the Oracle connection.

Any help would be greatly appreciated.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
Mario Arturo
  • 347
  • 3
  • 9
  • Any chance you are simply running into database side deadlocks? I'd recommend to use hints telling ORACLE your preferred locking strategy for your data reads and updates. – Darek Dec 30 '14 at 18:37
  • You need to take a hang dump. I recommend using [procdump](http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx). Then you can load the dump into WinDBG with the SOS.dll and start looking around. There are many great tutorials out there in how to get started. – Yuval Itzchakov Dec 30 '14 at 19:31
  • @Darek I understand that in case of database side deadlocks, I should catch and see an Oracle Exception instead of just see the execution of the finally clause. It's that right? or am I missing something – Mario Arturo Dec 30 '14 at 20:03
  • you might have been running into some in memory deadlock, because in your code you were canceling from the catch block, which itself throws an exception. Bottom, line, bad code. Try to cleanup, and investigate again. – Darek Jan 05 '15 at 18:28

2 Answers2

1

You should never dispose a transaction or connection when you use using scope. Second, you should rarely rely on exception based programming style. Your code rewritten below:

using (var connection = new OracleConnection(connectionString))
{
    using (var transaction = connection.BeginTransaction())
    {
        connection.Open();
        //...log-2...
        using (var cmd = connection.CreateCommand())
        {
            foreach (var parserItem in parserItems.GetConsumingEnumerable(cancellationTokenSource.Token))
            {
                if (!cancellationTokenSource.IsCancellationRequested)
                {
                    try
                    {
                        foreach (var statement in ProcessRecord(parserItem))
                        {
                            cmd.CommandText = statement;
                            try
                            {
                                cmd.ExecuteNonQuery();
                            }
                            catch (OracleException ex)
                            {
                                //...log-4...
                                if (!acceptedErrorCodes.Contains(ex.ErrorCode))
                                {
                                    log.Warn(ex.Message);
                                }
                            }
                        }
                    }
                    catch (FormatException ex)
                    {
                        log.Warn(ex.Message);
                    }
                }
            }
            if (!cancellationTokenSource.IsCancellationRequested)
            {
                transaction.Commit();
            }
            else
            {
                transaction.Rollback();
                throw new Exception("DBComponent has been canceled");
            }
        }
    }
}
//...log-9... 

Let me know if this helps.

Darek
  • 4,687
  • 31
  • 47
  • I refactored the code, but since I cannot reproduce this issue in my computer it take some time to validate if the error is gone. As soon as I have confirmation I will select your answer as accepted. Meanwhile, thanks a lot for your help :) – Mario Arturo Jan 05 '15 at 20:19
1

I can confirm everything you're saying. (program stuck, low CPU usage, oracle connection timeouts, etc.)

One workaround is to use Threads instead of Tasks.

UPDATE: after careful investigation I found out that when you use a high number of Tasks, the ThreadPool worker threads queued by the Oracle driver become slow to start, which ends up causing a (fake) connect timeout.

A couple of solutions for this:

Solution 1: Increase the ThreadPool's minimum number of threads, e.g.:

ThreadPool.SetMinThreads(50, 50);  // YMMV

OR

Solution 2: Configure your connection to use pooling and set its minimum size appropriately.

var ocsb = new OracleConnectionStringBuilder();
ocsb.DataSource = ocsb.DataSource;
ocsb.UserID = "myuser";
ocsb.Password = "secret";
ocsb.Pooling = true;
ocsb.MinPoolSize = 20; // YMMV

IMPORTANT: before calling any routine that creates a high number of tasks, open a single connection using that will "warm-up" the pool:

using(var oc = new OracleConnection(ocsb.ToString()))
{
    oc.Open();
    oc.Close();
}

Note: Oracle indexes the connection pools by the connect string (with the password removed), so if you want to open additional connections you must use always the same exact connect string.

Hugo Leote
  • 41
  • 4