
I'm using Twisted 11 together with SQLAnywhere 12 via the official sqlanydb driver.

Generally, it works fine.

But occasionally the application crashes with an abort on the first query.

If one query works, all subsequent ones work too. However, my tests rarely run through to completion.

That makes development painful, and strace doesn't tell me anything informative either. Sometimes it crashes inside select(), sometimes in mmap()...

I'm running 64-bit Linux and, for testing, run the Sybase server locally as dbeng12.

Is anyone using these components together successfully? Any suggestions on how to solve this? I used sqlanydb with Django before and it never crashed.

Using prints, I found out that it crashes inside the DeferredList. The relevant code is basically the following:

class WhoisDb(object):
    # ... shortened ...
    def _get_contacts(self, dom):
        if not dom:
            self.d.errback(UnknownDomain(self._get_limit()))
            return
        self.dom = Domain._make(dom[0])

        dl = defer.DeferredList( [
            self.dbpool.runQuery(CON_SQL, (self.dom.dom_owner,)),
            self.dbpool.runQuery(CON_SQL, (self.dom.dom_admin,)),
            self.dbpool.runQuery(CON_SQL, (self.dom.dom_tech,)),
            self.dbpool.runQuery(
                LAST_UPDATE_SQL,
                ( self.dom.domName, )), ] ).addCallback(self._fmt_string)

    def get_whois(self, domain):
        self.d = defer.Deferred()
        if not self._check_limit():
            self.d.errback(LimitExceeded(MAX_PER_HOUR))
        elif not RE_ALLOWED_TLDS.match(domain):
            self.d.errback(UnknownDomain(self._get_limit()))
        else: 
            self.dbpool.runQuery(
                    'select ' + DOM_FIELDS + ' from domains where '
                    'domain = ? or domain_idn = ?',
                    ( domain, domain, )) \
                            .addCallback(self._get_contacts)

        return self.d

_fmt_string() is not called if it crashes.

Inside gdb, it's a simple SIGSEGV:

(gdb) run ~/.virtualenvs/whois/bin/trial test.test_protocol.ProtocolTestCase.test_correct_domain
Starting program: /home/hynek/.virtualenvs/whois/bin/python ~/.virtualenvs/whois/bin/trial test.test_protocol.ProtocolTestCase.test_correct_domain
[Thread debugging using libthread_db enabled]
test.test_protocol
  ProtocolTestCase
    test_correct_domain ... [New Thread 0x7ffff311a700 (LWP 6685)]
[New Thread 0x7ffff3099700 (LWP 6686)]
[New Thread 0x7ffff27dc700 (LWP 6723)]
[New Thread 0x7ffff1fdb700 (LWP 6724)]
[New Thread 0x7ffff17da700 (LWP 6725)]
[New Thread 0x7ffff0fd9700 (LWP 6729)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff1fdb700 (LWP 6724)]
0x00007ffff4d4167c in ?? () from /opt/sqlanywhere12/lib64/libdbcapi_r.so
Danny Beckett
hynek
  • Maybe this is caused by a stale connection to the DB? Does this only crash after your app has been running for a while (say 5+ minutes) without any activity? Can you try checking the status of your connection prior to running any queries? I'm also curious how you're integrating this into Twisted. – stderr Apr 26 '11 at 12:46
  • I can rule that out. I start the DB server anew in each setUp() in order to prevent such effects. It happens immediately in any case. I added a sleep() after the init to circumvent possible race conditions, but it doesn't help. – hynek Apr 26 '11 at 12:49
  • Oh this is failing in your test suite? That's actually good news I think. The crashes inside select / mmap kind of point to Twisted. SqlAnyDb doesn't include a CExtension, does it? Are you using any other CExtension modules? You're running the test suite with Twisted Trial? – stderr Apr 26 '11 at 15:49
  • I don't consider it good news if Twisted is broken. ;) sqlanydb doesn't compile its own CExtension AFAIK, it just directly interfaces with a shared library. And yes, I run it with trial. Actually, in my test suite it crashes much more often than in production. – hynek Apr 27 '11 at 05:41
  • Are you using twisted.enterprise.adbapi or some other method to work with the DB? Since t.e.adbapi uses threads for each connection to the DB perhaps sqlanydb isn't threadsafe... I think we may need more information about your code, can you post it, or some of it somewhere? – stderr Apr 27 '11 at 12:24
  • Yes, I use adbapi. It also totally makes sense, that there might be thread safety issues. :-/ I've added the important code. – hynek Apr 27 '11 at 14:59
  • A few times you said "crashes", but it's not really clear what this means. You said "abort" too. Does that mean that the process is ending with SIGABRT? If that's the case and all you have to go on is C stack traces, make sure you're looking at the stack in all threads in the process. One of them may be calling abort(). This would look mysterious and random if you only looked at the stack trace of a different thread. – Jean-Paul Calderone Apr 27 '11 at 15:03
  • It's always an abort. I managed to get a "[unixshm] AttachToSharedMem p=5149 e=__SQLAnyCli__5149_0c522015 id=5 n= o=0 s=4096 d=1 errno=2 FAILED" once now (in trial, I start a dbeng12 which is shared mem). I started it in gdb and added the output above. Looks like a sqlany issue and that was my original question: has anybody ever used it successfully? :) Or is there a trick to make it work? – hynek Apr 27 '11 at 15:15
  • Thanks for posting the code. Right now I'm not seeing an errback attached to your deferredlist... I'm thinking it's worth just replacing the deferred list with a single pool.runInteraction call. – stderr Apr 27 '11 at 15:55
  • Given that it's crashing in `libdbcapi_r.so`, this pretty clearly looks like a bug in sqlanywhere to me. Probably a thread-safety issue. I wish I could be clearer, but I've never used sqlanywhere. (I should note, though, that Twisted isn't the only program ever to call `select()`, so even that may be the sqlanywhere library doing some other I/O.) – Glyph Apr 27 '11 at 19:21

2 Answers


It looks like your database library is not threadsafe. In order to make it a stable connection, do this:

self.dbpool = ConnectionPool(..., cp_min=1, cp_max=1)

This will set the maximum concurrency to 1, and the ThreadPool will be limited to 1 thread, meaning that no queries will run simultaneously. This should stop your non-threadsafe library from causing you any drama, while still running the queries in a thread and not blocking the mainloop.
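The effect of capping the pool at one thread can be illustrated without Twisted or a database. The following is only a minimal sketch using the standard library: `ConcurrencyProbe`, `fake_query` and `run_queries` are hypothetical names invented for this illustration, and `fake_query` merely stands in for a call into a non-threadsafe driver. It records how many calls are in flight at the same time for a given pool size.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyProbe:
    """Records the highest number of calls that were in flight at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def fake_query(self, sql):
        # Hypothetical stand-in for a call into a non-threadsafe driver.
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.05)  # pretend the driver is busy
        with self._lock:
            self._active -= 1
        return sql.upper()


def run_queries(max_threads, queries):
    """Run all queries on a pool of max_threads workers.

    Returns (results, peak), where peak is the highest observed concurrency.
    """
    probe = ConcurrencyProbe()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(probe.fake_query, queries))
    return results, probe.peak


if __name__ == "__main__":
    queries = ["owner", "admin", "tech", "last_update"]
    print(run_queries(1, queries))  # one worker: calls never overlap
    print(run_queries(4, queries))  # several workers: calls may overlap
```

With a single worker the calls are serialized, which is the property a non-threadsafe library needs; the price is that the four lookups no longer run in parallel.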

Jerub
  • Another solution that would allow more than one query at a time would be using ODBC. I wrote a [blog post](http://hynek.me/blog/2011/04/twisted-sybase/) about the gotchas that might be encountered. – hynek May 04 '11 at 08:33

Yeah, your DeferredList looks like it's not going to do what you want. Each runQuery is run in an adbapi thread pool, so there's no guarantee about the order in which those queries execute. LAST_UPDATE_SQL being the last entry in the DeferredList does not necessarily make it run last. Are the queries in the DeferredList supposed to be part of a single transaction?

Not knowing exactly what the SQL queries are here, I'm assuming that sometimes a transaction has been set up for your LAST_UPDATE_SQL and sometimes it hasn't, depending on the order in which those runQuery calls actually end up running.
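The ordering point can be shown with a plain thread pool, independent of Twisted. This is a hedged sketch using only the standard library: `run_in_pool` and `fake_query` are hypothetical names, and the artificial delays stand in for queries that take different amounts of time. Jobs are submitted in one order but finish in whatever order the threads get through them.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_in_pool(jobs):
    """Submit (name, delay) jobs in order.

    Returns (results, completion_order): results follow submission order,
    completion_order records which job actually finished first.
    """
    completion_order = []
    lock = threading.Lock()

    def fake_query(name, delay):
        # Hypothetical stand-in for a blocking database call.
        time.sleep(delay)
        with lock:
            completion_order.append(name)
        return name

    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(fake_query, name, delay) for name, delay in jobs]
        results = [f.result() for f in futures]  # gathered in submission order
    return results, completion_order


if __name__ == "__main__":
    jobs = [("owner", 0.3), ("admin", 0.2), ("tech", 0.1), ("last_update", 0.0)]
    results, finished = run_in_pool(jobs)
    print(results)   # submission order
    print(finished)  # completion order, driven by the delays
```

Here the job submitted last has the shortest delay and so completes before the others, even though it sits at the end of the list.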

Here's how to replace the DeferredList with a single call that runs in one adbapi pool thread, using the pool's runInteraction method. I'm not 100% convinced this will fix your issues, but I think it's the correct way to write the sort of database interaction you're attempting.

class WhoisDb(object):
    # ... shortened ...
    def _get_contacts(self, dom):
        if not dom:
            self.d.errback(UnknownDomain(self._get_limit()))
            return
        self.dom = Domain._make(dom[0])

        d = self.dbpool.runInteraction(
                 self._get_stuff_from_db
            )
        d.addCallback(self._fmt_string)
        d.addErrback(self._fmt_string) # don't forget to add an errback!
        return d

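    def _get_stuff_from_db(self, cursor):
        # Runs in a single pool thread inside one transaction, so the
        # queries execute sequentially on the same connection.
        results = []
        for sql, param in (
                (CON_SQL, self.dom.dom_owner),
                (CON_SQL, self.dom.dom_admin),
                (CON_SQL, self.dom.dom_tech),
                (LAST_UPDATE_SQL, self.dom.domName)):
            cursor.execute(sql, (param,))
            # Fetch after each execute; otherwise only the last result survives.
            results.append(cursor.fetchall())
        return results  # or whatever you need to return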
stderr
  • Thank you for your effort and I tried it out just like you suggested, without any success unfortunately. NB: They don't need to be inside of a transaction, the queries are absolutely independent, they just need the same parameter. Prints revealed, that the abort takes place _after_ leaving `_get_contacts()` but before entering `_get_stuff_from_db()`. – hynek Apr 27 '11 at 17:02