We have a large Django project with approximately 10,000 Django+Nose unit tests. We very rarely use transactions; probably 99% of our codebase does not use them. The project is on Django 1.5.8 and Django Nose 1.4.1. (Yes, I know this is very old. We're currently 18 months into a project to update to Django 1.6, but it has not been completed yet. So, if the solution to my problem is "upgrade Django," I'm going to need a way to patch it, because this problem is happening now and it will be months more before we can finish upgrading Django.)

We encountered a new error today that we've never seen before. We added a new database (and the requisite DATABASES['geo'] settings) that holds a large, static dataset that the application does not update. It is a read-only database that happens to live in MySQL. As it does with all of our other databases, Django Nose started creating a test copy of the new database at the start of every test run and destroying that copy at the end. This caused numerous problems, including disk space problems and wasted time, but the tests did run and pass.

To solve this problem, we added 'TEST_MIRROR': 'geo' to the DATABASES['geo'] settings. And that's where this headache started. Just that change resulted in a tiny, random portion of our test cases failing each test run:

<nose.suite.ContextSuite context=TestFacebookApiVersion>:setup
<nose.suite.ContextSuite context=RegisterPageTests>:setup
<nose.suite.ContextSuite context=CommonCeleryTasks>:setup
<nose.suite.ContextSuite context=CommonCeleryTestTasks>:setup
<nose.suite.ContextSuite context=S3PublishTestCase>:setup
<nose.suite.ContextSuite context=TestCEP>:setup
<nose.suite.ContextSuite context=AdbInvitesJsonTests>:setup

The error and stack trace are identical for every test case:

Transaction managed block ended with pending COMMIT/ROLLBACK
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/my_workspace/my_project/lib/python2.7/site-packages/nose/suite.py", line 209, in run
    self.setUp()
  File "/var/lib/jenkins/workspace/my_workspace/my_project/lib/python2.7/site-packages/nose/suite.py", line 292, in setUp
    self.setupContext(ancestor)
  File "/var/lib/jenkins/workspace/my_workspace/my_project/lib/python2.7/site-packages/nose/suite.py", line 315, in setupContext
    try_run(context, names)
  File "/var/lib/jenkins/workspace/my_workspace/my_project/lib/python2.7/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/var/lib/jenkins/workspace/my_workspace/my_project/lib/python2.7/site-packages/django_nose/testcases.py", line 43, in setUpClass
    if not test.testcases.connections_support_transactions():
  File "/var/lib/jenkins/workspace/my_workspace/my_project/lib/python2.7/site-packages/django/test/testcases.py", line 827, in connections_support_transactions
    for conn in connections.all())
  File "/var/lib/jenkins/workspace/my_workspace/my_project/lib/python2.7/site-packages/django/test/testcases.py", line 827, in <genexpr>
    for conn in connections.all())
  File "/var/lib/jenkins/workspace/my_workspace/my_project/lib/python2.7/site-packages/django/utils/functional.py", line 45, in __get__
    res = instance.__dict__[self.func.__name__] = self.func(instance)
  File "/var/lib/jenkins/workspace/my_workspace/my_project/lib/python2.7/site-packages/django/db/backends/__init__.py", line 455, in supports_transactions
    self.connection.leave_transaction_management()
  File "/var/lib/jenkins/workspace/my_workspace/my_project/lib/python2.7/site-packages/django/db/backends/__init__.py", line 138, in leave_transaction_management
    "Transaction managed block ended with pending COMMIT/ROLLBACK")

And, worse, the small handful of test cases that fail is different every time. Here are the failures the second time I ran it:

<nose.suite.ContextSuite context=BaseTemplateContainerTests>:setup
<nose.suite.ContextSuite context=MessageServiceNotifierTests>:setup
<nose.suite.ContextSuite context=TestFormatShortAddress>:setup
<nose.suite.ContextSuite context=RevisionableTestCase>:setup
<nose.suite.ContextSuite context=TestSoaHelpers>:setup
<nose.suite.ContextSuite context=TestDateUtils>:setup

And so on.

As you can see from the stack trace, the execution doesn't even make it to our source code. It fails inside Django Nose and Django itself, before our test cases even start to execute. And, again, this is only a tiny portion of our tests. The other 9,600+ unit tests all pass with flying colors.

I'm at a loss as to what to do. I'm not intentionally creating any transactions, and it doesn't make sense to me that adding 'TEST_MIRROR': 'geo' to the DATABASES['geo'] configuration would cause this problem, but it does.
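For reference, the settings change described above would look roughly like the following sketch. Only the 'geo' alias and its TEST_MIRROR value come from the description; the 'default' entry, the engine strings, and the database names are placeholders assumed for illustration.

```python
# Sketch of a Django 1.5-style settings fragment. Only the 'geo' alias and
# its TEST_MIRROR value come from the question; everything else is assumed.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'app_db',  # hypothetical name
    },
    'geo': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'geo_data',  # hypothetical name
        # Point the alias at itself, with the intent that the test runner
        # reuses the real database rather than building a test copy.
        'TEST_MIRROR': 'geo',
    },
}
```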

How can I fix this?

Nick Williams

1 Answer

Well, it took me a lot of debugging, but I figured out the problem...

Our 10,000 tests take a long, long time to run, unless we run them in parallel processes. So we use a tool that splits up the nose tests into 20 parallel processes and runs the tests in groups. (This still takes roughly 20 minutes to complete, but it's better than nearly two hours.)

We use the FastFixtureTestCase, which extends the TransactionTestCase. At the start of each test case, FastFixtureTestCase calls django.test.testcases.connections_support_transactions(). That function loops over all the DATABASES connections and checks supports_transactions on each one. My mistake was assuming that supports_transactions should be an inherently safe operation. It's not.

supports_transactions does the following things:

  1. Creates a new table
  2. Commits
  3. Inserts a value into that table
  4. Rolls back
  5. Selects the number of rows in that table
  6. Drops the table
  7. Commits
  8. Returns True if the number of rows in the table was 0 (meaning the rollback succeeded, so transactions must be supported).
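The eight steps above can be sketched as a standalone function. This is not Django's code: it uses the standard-library sqlite3 module as a stand-in for the real backend, and the function name and table name are chosen here for illustration.

```python
import sqlite3


def probe_supports_transactions(conn):
    """Illustrative re-creation of the probe steps listed above."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE rollback_test (x INT)")         # 1. create table
    conn.commit()                                             # 2. commit
    cur.execute("INSERT INTO rollback_test (x) VALUES (8)")   # 3. insert a row
    conn.rollback()                                           # 4. roll back
    cur.execute("SELECT COUNT(x) FROM rollback_test")         # 5. count rows
    (count,) = cur.fetchone()
    cur.execute("DROP TABLE rollback_test")                   # 6. drop table
    conn.commit()                                             # 7. commit
    return count == 0                                         # 8. rollback worked?


if __name__ == "__main__":
    print(probe_supports_transactions(sqlite3.connect(":memory:")))
```

Note that the table name is fixed, so every caller that targets the same database creates and drops the same table.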

This is not safe. This is very dangerous. No two processes or servers can run this at the same time against the same database. If two or more processes run this function at the same time, at best, one will return True and the other(s) will raise an exception. At worst, all will raise an exception.

In my case, because we have so many test cases, most of the time the processes were avoiding executing connections_support_transactions at the same time, but when they did, it resulted in a small handful of random failures that were different each time.
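The collision can be reproduced in miniature. The sketch below is an assumption-laden illustration rather than the real setup: it uses two sqlite3 connections to one temporary file in place of two test processes sharing a MySQL database, and it forces the unlucky interleaving by hand (the second connection tries to create the probe table before the first has dropped it).

```python
import os
import sqlite3
import tempfile


def simulate_collision():
    """Return the error text the second 'process' sees, or None if no error."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.db")
        first = sqlite3.connect(path)   # stands in for test process A
        second = sqlite3.connect(path)  # stands in for test process B
        try:
            # Process A starts the probe and commits the fixed-name table.
            first.execute("CREATE TABLE rollback_test (x INT)")
            first.commit()
            try:
                # Process B starts the same probe before A drops the table.
                second.execute("CREATE TABLE rollback_test (x INT)")
            except sqlite3.OperationalError as exc:
                return str(exc)
            return None
        finally:
            first.close()
            second.close()


if __name__ == "__main__":
    print(simulate_collision())
```

With real parallel processes the interleaving depends on timing, which is consistent with a different handful of tests failing on each run.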

One possible solution is to use SimpleTestCase instead of FastFixtureTestCase, as @kmmbvnr pointed out. However, this is not an option for us, as our entire test infrastructure depends on what FastFixtureTestCase does to the rest of our databases. So, instead, I overrode supports_transactions just for the static, shared database with the following line of code, and the errors went away:

from django.db import connections
connections['geo'].features.supports_transactions = True
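Why a plain assignment is enough: the traceback shows supports_transactions being resolved through a caching descriptor that stores its result in the instance's __dict__. The sketch below imitates that pattern with a stand-in class (Features and probe_calls are hypothetical names, and the descriptor is a simplified imitation, not Django's exact code) to show that a value assigned up front shadows the descriptor, so the probe never runs.

```python
class cached_property(object):
    """Simplified imitation of a caching, non-data descriptor."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Compute once, then store on the instance so later lookups skip us.
        res = instance.__dict__[self.func.__name__] = self.func(instance)
        return res


class Features(object):
    """Hypothetical stand-in for a connection's features object."""

    probe_calls = 0

    @cached_property
    def supports_transactions(self):
        Features.probe_calls += 1  # the expensive, unsafe probe would go here
        return True


features = Features()
features.supports_transactions = True  # pre-seed, as in the override above
value = features.supports_transactions
print(value, Features.probe_calls)
```

Because the descriptor defines only __get__, the instance attribute takes precedence and the decorated method is not called.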
Nick Williams