24

I will start with TL;DR version as this may be enough for some of you:


  • We are trying to investigate an issue that we see in diagnostic data of our C++ product.
  • The issue was pinpointed to be caused by timeout on sqlite3_open_v2 which supposedly takes over 60s to complete (we only give it 60s).
  • We tried multiple different configurations, but never were able to reproduce even 5s delay on this call.

So the question is if maybe there are some known scenarios in which sqlite3_open_v2 can take that long (on windows)?

Now to the details:

  • We are using version 3.10.2 of SQLite. We went through changelogs from this version till now and nothing we've found in the bugfixes section seems to suggest that there was some issue that was addressed in consecutive SQLite releases and may have caused our problem.
  • The issue we see affects around 0.1% unique user across all supported versions of windows (Win 7, Win 8, Win 10). There are no manual user complaints/reports about that - this can suggest that a problem happens in the context where something serious enough is happening with the user machine/system that he doesn't expect anything to work. So something that indicates system-wide failure is a valid possibility as long as it can possibly happen for 0.1% of random windows users.
  • There are no data indicating that the same issue ever occurred on Mac which is also supported platform with large enough sample of diagnostic data.
  • We are using Poco (https://github.com/pocoproject/poco, version: 1.7.2) as a tool for accessing our SQLite database, but we've analyzed the Poco code and it seems that failure on this code level can only (possibly) explain ~1% of all collected samples. This is how we've determined that the problem lies in sqlite3_open_v2 taking a long time.
  • This happens on both DELETE journal mode as well as on WAL.
  • It seems like after this problem happens the first time for a particular user each consecutive call to sqlite3_open_v2 takes that long until the user restarts whole application (possibly machine, no way to tell from our data).
  • We are using following flags setup for sqlite3_open_v2 (as in Poco):

sqlite3_open_v2(..., ..., SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_URI, NULL);

  • This usually doesn't happen on startup of the application so it's not likely to be caused by something happening while our application is not running. This includes power cuts offs causing data destruction (which tends to return SQLITE_CORRUPT anyway, as mentioned in https://www.sqlite.org/howtocorrupt.html).
  • We were never able to reproduce this issue locally even though we tried different things:

    • Multiple threads writing and reading from DB with synchronization required by a particular journaling system.
    • Keeping SQLite connection open for a long time and working on DB normally in a meanwhile.
    • Trying to hit HDD hard with other data (dumping /dev/rand (WSL) to multiple files from different processes while accessing DB normally).
    • Trying to force antivirus software to scan DB on every file access (tested with Avast with basically everything enabled including "scan on open" and "scan on write").
    • Breaking our internal synchronization required by particular journaling systems.
    • Calling WinAPI CreateFile with all possible combinations of file sharing options on DB file - this caused issues but sqlite3_open_v2 always returned fast - just with an error.
    • Calling WinAPI LockFile on random parts of DB file which is btw. nice way of reproducing SQLITE_IOERR, but no luck with reproducing the discussed issue.
    • Some additional attempts to actually stretch the Poco layer and double-check if our static analysis of codes is right.
  • We've tried to look for similar issues online but anything somewhat relevant we've found was here sqlite3-open-v2-performance-degrades-as-number-of-opens-increase. This doesn't seem to explain our case though, as the numbers of parallel connections are way beyond what we have as well as what would typical windows users have (unless there is some somewhat popular app exploiting SQLite which we don't know about).

  • It's very unlikely that this issue is caused by db being accessed through network share as we are putting the DB file inside %appdata% unless there is some pretty standard windows configuration which sets %appdata% to be a remote share.

Do you have any ideas what can cause that issue?

Maybe some hints on what else should we check or what additional diagnostic data that we can collect from users would be useful to pinpoint the real reason why that happens?

Thanks in advance

Muhammad Usman Bashir
  • 1,441
  • 2
  • 14
  • 43
Yester
  • 652
  • 6
  • 18
  • 3
    Actually I saw similar problems with other sqlite on enterprise systems which had McAfee Virus Scan installed configured in a way that it scans everything always. We even saw Access Denied from time to time. – S.Spieker Aug 23 '18 at 07:24
  • Thanks for a suggestion - I will check that theory and come back here. – Yester Aug 23 '18 at 08:43
  • @Yester: have you nailed down the issue? If so, what was the cause? – sophros Mar 23 '19 at 12:37
  • I'm sorry but not really. I did ruled out McAfee Virus Scan as a root cause however, if that helps. – Yester Mar 24 '19 at 16:41
  • The post does not mention any type of leak, but the description screams of a connection leak and/or memory leak (despite you seeming to have ruled out connection problems). This post has no mention of transactions, exception handling, etc. If your DB access logic is not solid... even a single DB execution exception (~%1 as you state) could cause snowballing problems--perhaps its causing a failure which leaves a dangling connection, one that either cascades or recurs until the eventual timeouts are experienced. It may come down to your management of connections, transactions, exceptions. – C Perkins Aug 29 '20 at 17:24
  • If there are no manual user complaints/reports for this issue, then is it an issue? –  Jan 24 '22 at 15:05

0 Answers0