Lazy initialized caching... how do I make it thread-safe?

Question

This is what I have:

a Windows Service
- C#
- multithreaded
- the service uses a Read-Write-Lock (multiple reads at one time, writing blocks other reading/writing threads)
a simple, self-written DB
- C++
- small enough to fit into memory
- big enough that I don't want to load it all on startup (e.g. 10GB)
- read-performance is very important
- writing is less important
- tree structure
- the information held in the tree nodes is stored in files
- for faster performance, each file is loaded and cached the first time it is used
- lazy initialization for faster DB startup

As the DB will access this node information very often (on the order of several thousand times a second) and as I don't write very often, I'd like to use some kind of double-checked locking pattern.

I know there are many questions about the double-checked locking pattern here, but there seem to be so many different opinions that I don't know what's best for my case. What would you do with my setup?

Here's an example:

a tree with 1 million nodes
every node stores a list of key-value-pairs (stored in a file for persistence, file size magnitude: 10kB)
when a node is accessed for the first time, the list is loaded and stored in a map (something like std::map)
the next time this node is accessed, I don't have to load the file again, I just get it from the map.
the only problem: two threads may access the node for the first time simultaneously and both try to write to the cache map. This is very unlikely, but not impossible. That's where I need thread-safety, and it should be cheap, as I usually don't need it (especially once the whole DB is in memory).

You already *have* all this? Then I'd just sit back, ship it and enjoy the windfall. — Kerrek SB, Nov 11 '11 at 17:07
Can you estimate the size more precisely than "several GB"? I'd think *very* hard about whether you can fit the *whole* DB in memory instead. You might consider, for example, storing the data compressed (e.g., some LZ-based compression) to help. Saving even a *few* disk accesses can cover quite a lot of decompression time. — Jerry Coffin, Nov 11 '11 at 17:50
@Kerrek: I'd like to sit back and enjoy, but for now I can't use the DB in a multithreaded way, because it is not completely threadsafe yet. Hence this thread ;) — Ben, Nov 14 '11 at 09:36
@Jerry Coffin: The aim is to keep the whole DB in memory. For now, the DB is small enough to fit into one computer's RAM. By the time the DB outgrows that, we hope to have a distributed solution. The DB's information is saved in files for persistence only. I just don't want to wait minutes for the DB to start by loading everything into memory, so I'm using a lazy cache, which makes it hard to use from multiple threads. — Ben, Nov 14 '11 at 09:37

ali_bahoo · Accepted Answer · 2011-11-14T11:59:34.127

About double checked locking:

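class Foo
{
  Resource * resource;
  mutex_type mutex;  // assumed member: any mutex type that scoped_lock accepts

public:
  Foo() : resource(nullptr) { }

  Resource & GetResource()
  {
    if(resource == nullptr)  // unsynchronized read: this is the broken part
    {
      scoped_lock lock(mutex);
      if(resource == nullptr)
        resource = new Resource();
    }
    return *resource;
  }
};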

This is not thread-safe, because the first check only tests whether the address stored in resource is null. The compiler or the CPU may reorder the writes, so resource can be assigned a non-null value before the Resource object it points to has been fully initialized. Another thread can then see the non-null pointer, skip the lock, and use a half-constructed object.

But with the atomics feature of C++11, you can implement double-checked locking correctly.

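#include <atomic>

class Foo
{
  Resource * resource;
  std::atomic<bool> isResourceNull;
  mutex_type mutex;  // assumed member: any mutex type that scoped_lock accepts
public:
  Foo() : resource(nullptr), isResourceNull(true) { }

  Resource & GetResource()
  {
    if(isResourceNull.load())  // atomic check, no lock on the fast path
    {
      scoped_lock lock(mutex);
      if(isResourceNull.load())  // re-check under the lock
      {
        resource = new Resource();
        isResourceNull.store(false);  // publish only after construction
      }
    }
    return *resource;
  }
};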
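
The same idea can also be written with the pointer itself as the atomic, which avoids the separate flag. The following is only a sketch under assumptions: `Resource` is a stand-in payload, `LazyResource` is a hypothetical name, and `std::mutex` with `std::lock_guard` replaces the unspecified `scoped_lock`. The acquire load on the fast path pairs with the release store that publishes the pointer.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical payload type, standing in for the cached node data.
struct Resource {
    int value = 42;
};

class LazyResource {
public:
    LazyResource() : resource_(nullptr) {}
    ~LazyResource() { delete resource_.load(std::memory_order_relaxed); }

    LazyResource(const LazyResource&) = delete;
    LazyResource& operator=(const LazyResource&) = delete;

    Resource& Get() {
        // Fast path: the acquire load pairs with the release store below,
        // so a non-null pointer implies a fully constructed object.
        Resource* p = resource_.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> lock(mutex_);
            // Second check under the lock: another thread may have won.
            p = resource_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new Resource();
                resource_.store(p, std::memory_order_release);
            }
        }
        return *p;
    }

private:
    std::atomic<Resource*> resource_;
    std::mutex mutex_;
};
```

Once the object exists, every call takes only the fast path: one atomic load and no lock.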

EDIT: Without atomics

#include <windows.h>  // declares MemoryBarrier(); winnt.h should not be included directly

class Foo
{
  Resource * volatile resource;  // the pointer itself is volatile, not the Resource
  mutex_type mutex;  // assumed member: any mutex type that scoped_lock accepts

public:
  Foo() : resource(nullptr) { }

  Resource & GetResource()
  {
    if(resource == nullptr)
    {
      scoped_lock lock(mutex);
      if(resource == nullptr)
      {
        Resource * dummy = new Resource();
        MemoryBarrier(); // To keep the code order
        resource = dummy;  // pointer assignment
      }
    }
    return *resource;
  }
};

MemoryBarrier() ensures that dummy is fully constructed before it is assigned to resource. According to this link, pointer assignments are atomic on x86 and x64 systems. And volatile ensures that the value of resource is re-read from memory on every access rather than cached in a register.
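
For toolchains where C++11 is available, the role of the barrier can be expressed portably with `std::atomic_thread_fence`. This is a sketch under assumptions, not the code from the answer: `Resource` is a stand-in payload, `FencedLazy` is a hypothetical name, and `std::mutex` replaces the unspecified `scoped_lock`. The release fence plays the part of MemoryBarrier() on the writer side, and the acquire fence is its counterpart on the reader side.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical payload type, standing in for the cached node data.
struct Resource {
    int value = 7;
};

class FencedLazy {
public:
    FencedLazy() : resource_(nullptr) {}
    ~FencedLazy() { delete resource_.load(std::memory_order_relaxed); }

    FencedLazy(const FencedLazy&) = delete;
    FencedLazy& operator=(const FencedLazy&) = delete;

    Resource& Get() {
        Resource* p = resource_.load(std::memory_order_relaxed);
        // Reader-side fence: reads of *p may not move before the load above.
        std::atomic_thread_fence(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> lock(mutex_);
            p = resource_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new Resource();
                // Writer-side fence: construction completes before the
                // pointer becomes visible to other threads.
                std::atomic_thread_fence(std::memory_order_release);
                resource_.store(p, std::memory_order_relaxed);
            }
        }
        return *p;
    }

private:
    std::atomic<Resource*> resource_;
    std::mutex mutex_;
};
```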

Yes, I know about the problem that Meyers and Alexandrescu pointed out. But there must be some way to implement it. In their paper, they talk about a memory barrier, which is platform- and compiler-dependent. I'm using MSVC 2010, so what kind of memory barrier could I use? C++11 isn't an option yet... — Ben, Nov 14 '11 at 10:47
@Ben: There is also [tbb::atomic](http://threadingbuildingblocks.org/files/documentation/a00117.html) that you can use. But its free version is GPL licensed. — ali_bahoo, Nov 14 '11 at 12:03
the MemoryBarrier() looks good, I'm going to try that! Thanks! — Ben, Nov 14 '11 at 12:57

score 1 · Answer 2 · edited Nov 11 '11 at 18:06

Are you asking how to make reading the DB or reading the Nodes thread safe?

If you're trying to do the latter and you're not writing very often, then why not make your nodes immutable, period? If you need to write something, copy the data from the existing node, modify the copy, and create another node that you then put in your database.
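
A minimal sketch of this copy-then-replace idea, under assumptions: `KeyValues` stands in for the key-value list from the question, and `NodeSlot` is a hypothetical name for the place in the tree that holds the current node data. Published data is never modified; a writer builds a new copy and swaps the pointer, while readers keep whatever snapshot they already hold.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical node payload: the key-value list from the question.
typedef std::map<std::string, std::string> KeyValues;

class NodeSlot {
public:
    NodeSlot() : data_(std::make_shared<KeyValues>()) {}

    // Readers take a snapshot; the lock is held only to copy the pointer.
    std::shared_ptr<const KeyValues> Read() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_;
    }

    // Writers copy the current data, modify the copy, then publish it.
    void Put(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::shared_ptr<KeyValues> copy = std::make_shared<KeyValues>(*data_);
        (*copy)[key] = value;
        data_ = copy;  // older snapshots stay valid until their readers drop them
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const KeyValues> data_;
};
```

A reader that called `Read()` before a `Put()` keeps seeing the old contents, so no reader ever observes a half-updated map. The cost is a full copy of the node's data per write, which fits a workload where writes are rare.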

answered Nov 11 '11 at 17:13 by Kiril · edited Nov 11 '11 at 18:06 by ildjarn

I guess my post wasn't clear enough. I'm not using a DB like MySQL or Oracle... the tree structure I'm building IS the database. A very simple one, of course. Querying the DB means visiting many nodes, retrieving information from them, merging it, and returning the result. I will edit my question to provide more information. – Ben Nov 14 '11 at 09:45
