
This is what I have:

  • a Windows Service
    • C#
    • multithreaded
    • the service uses a Read-Write-Lock (multiple reads at one time, writing blocks other reading/writing threads)
  • a simple, self-written DB
    • C++
    • small enough to fit into memory
    • big enough that I don't want to load it on startup (e.g. 10 GB)
    • read-performance is very important
    • writing is less important
    • tree structure
    • the information held in tree nodes is stored in files
    • for better performance, each file is loaded and cached the first time it is used
    • lazy initialization for faster DB startup
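The setup above (a read-mostly cache filled lazily on first access, guarded by a reader-writer lock) can be sketched in C++ as follows. This is a minimal sketch assuming C++17's `std::shared_mutex`; `NodeCache`, `NodeId` and `loadNodeFile` are hypothetical names, and `loadNodeFile` is a stub standing in for reading a node's file. Readers take the shared lock for the common cache hit; only a miss takes the exclusive lock and re-checks before loading, so two threads missing on the same node load it only once.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

using NodeId = std::uint64_t;                          // hypothetical node identifier
using NodeData = std::map<std::string, std::string>;   // one node's key-value pairs

// Hypothetical stub: a real DB would parse the node's file here.
NodeData loadNodeFile(NodeId id) {
    return NodeData{{"id", std::to_string(id)}};
}

class NodeCache {
public:
    // Returns the cached data for `id`, loading it on first use.
    // References stay valid: unordered_map never moves its elements,
    // and entries are never erased in this sketch.
    const NodeData& get(NodeId id) {
        {
            std::shared_lock<std::shared_mutex> readLock(mutex_);  // many readers at once
            auto it = cache_.find(id);
            if (it != cache_.end()) return it->second;              // fast path: cache hit
        }
        std::unique_lock<std::shared_mutex> writeLock(mutex_);      // exclusive access
        auto it = cache_.find(id);   // re-check: another thread may have loaded it meanwhile
        if (it == cache_.end())
            it = cache_.emplace(id, loadNodeFile(id)).first;
        return it->second;
    }

    std::size_t loadedCount() const {
        std::shared_lock<std::shared_mutex> readLock(mutex_);
        return cache_.size();
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<NodeId, NodeData> cache_;
};
```

This keeps things simple at the cost of blocking all readers while a file is being loaded under the exclusive lock; once the whole DB is cached, only the shared-lock path is taken.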

As the DB will access this node information very often (on the order of several thousand times a second) and I don't write very often, I'd like to use some kind of double-checked locking pattern.

I know there are many questions about the double-checked locking pattern here, but there seem to be so many different opinions that I don't know what's best for my case. What would you do with my setup?

Here's an example:

  • a tree with 1 million nodes
  • every node stores a list of key-value-pairs (stored in a file for persistence, file size magnitude: 10kB)
  • when accessing a node for the first time, the list is loaded and stored in a map (something like std::map)
  • the next time this node is accessed, I don't have to load the file again, I just get it from the map.
  • only problem: two threads are simultaneously accessing the node for the first time and want to write to the cache-map. This is very unlikely to happen, but it is not impossible. That's where I need thread-safety, which should not take too much time, as I usually don't need it (especially, once the whole DB is in memory).
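For the per-node case in this example, C++11 and later already provide a race-free "initialize exactly once" primitive: `std::call_once` with a `std::once_flag` (not available on MSVC 2010). A minimal sketch, assuming each node owns its own flag; `Node`, `readKeyValueFile` and the file path are hypothetical, and the file read is a stub. If two threads touch a new node at the same moment, one loads the file while the other waits; afterwards every access only checks the flag.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <utility>

using KeyValues = std::map<std::string, std::string>;

// Hypothetical stub standing in for parsing a ~10 kB node file.
KeyValues readKeyValueFile(const std::string& path) {
    return KeyValues{{"source", path}};
}

class Node {
public:
    explicit Node(std::string path) : path_(std::move(path)) {}

    // Loads the node's key-value list on first access; later calls reuse it.
    const KeyValues& values() const {
        std::call_once(loaded_, [this] {
            values_ = readKeyValueFile(path_);
            ++loadCount_;   // demonstration only: counts actual loads
        });
        return values_;
    }

    int loadCount() const { return loadCount_; }

private:
    std::string path_;
    mutable std::once_flag loaded_;
    mutable KeyValues values_;
    mutable int loadCount_ = 0;
};
```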
Ben
  • You already *have* all this? Then I'd just sit back, ship it and enjoy the windfall. – Kerrek SB Nov 11 '11 at 17:07
  • Can you estimate the size more precisely than "several GB"? I'd think *very* hard about whether you can fit the *whole* DB in memory instead. You might consider, for example, storing the data compressed (e.g., some LZ-based compression) to help. Saving even a *few* disk accesses can cover quite a lot of decompression time. – Jerry Coffin Nov 11 '11 at 17:50
  • @Kerrek: I'd like to sit back and enjoy, but for now I can't use the DB in a multithreaded way, because it is not completely threadsafe yet. Hence this thread ;) – Ben Nov 14 '11 at 09:36
  • @Jerry Coffin: The aim is to keep the whole DB in memory. For now, the DB is small enough to fit into one computer's RAM. By the time it grows bigger, we hope to have a distributed solution. The DB information is saved in files for persistence only. I just don't want to wait minutes for the DB to start while loading everything into memory, so I'm using a lazy cache, which makes it hard to use in a multithreaded way. – Ben Nov 14 '11 at 09:37

2 Answers


About double-checked locking:

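// 'mutex' and 'scoped_lock' stand for any mutex and its RAII lock
// (e.g. boost::mutex and boost::mutex::scoped_lock).
class Foo
{
  Resource * resource;

public:
  Foo() : resource(nullptr) { }

  Resource & GetResource()
  {
    if(resource == nullptr)          // first check, without the lock
    {
      scoped_lock lock(mutex);
      if(resource == nullptr)        // second check, under the lock
        resource = new Resource();   // the pointer may become visible before the object is built
    }
    return *resource;                // another thread may see a half-built object here
  }
};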

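It is not thread-safe, because the first check reads resource without holding the lock. The compiler or CPU may reorder the writes so that resource is assigned its non-null value before the Resource object it points to has been fully constructed. A second thread can then see the non-null pointer, skip the lock, and use a half-initialized object.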

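With the atomics added in C++11, however, you can implement double-checked locking correctly: the atomic flag is only set after the Resource has been constructed, and the atomic store/load pair guarantees that a thread seeing the flag cleared also sees the finished object.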

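#include <atomic>
#include <mutex>

class Foo
{
  Resource * resource;
  std::atomic<bool> isResourceNull;
  std::mutex mutex;
public:
  Foo() : resource(nullptr), isResourceNull(true) { }
  ~Foo() { delete resource; }

  Resource & GetResource()
  {
    if(isResourceNull.load())          // first check (sequentially consistent load)
    {
      std::lock_guard<std::mutex> lock(mutex);
      if(isResourceNull.load())        // second check, under the lock
      {
        resource = new Resource();     // construct the object first...
        isResourceNull.store(false);   // ...then publish it through the atomic flag
      }
    }
    return *resource;
  }
};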
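The flag and the pointer can also be merged into a single `std::atomic<Resource*>`, with explicit acquire/release ordering instead of the default sequentially consistent operations. A minimal sketch of that variant, assuming C++11; the `Resource` struct is a placeholder for the real type and `std::mutex` is just one choice of lock.

```cpp
#include <atomic>
#include <mutex>

struct Resource {          // placeholder for the real resource type
    int value = 42;
};

class Foo {
public:
    Foo() : resource_(nullptr) {}
    ~Foo() { delete resource_.load(std::memory_order_relaxed); }
    Foo(const Foo&) = delete;
    Foo& operator=(const Foo&) = delete;

    Resource& GetResource() {
        // First check: acquire pairs with the release store below, so a
        // non-null pointer is only seen together with the finished object.
        Resource* r = resource_.load(std::memory_order_acquire);
        if (r == nullptr) {
            std::lock_guard<std::mutex> lock(mutex_);
            // Second check: relaxed is enough here, because the mutex
            // already orders this load after the other thread's store.
            r = resource_.load(std::memory_order_relaxed);
            if (r == nullptr) {
                r = new Resource();
                resource_.store(r, std::memory_order_release);   // publish
            }
        }
        return *r;
    }

private:
    std::atomic<Resource*> resource_;
    std::mutex mutex_;
};
```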

EDIT: Without atomics

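#include <windows.h>  // declares MemoryBarrier(); winnt.h is not meant to be included directly

// 'mutex' and 'scoped_lock' stand for any mutex and its RAII lock.
class Foo
{
  Resource * volatile resource;  // the pointer itself is volatile, not the Resource

public:
  Foo() : resource(nullptr) { }

  Resource & GetResource()
  {
    if(resource == nullptr)
    {
      scoped_lock lock(mutex);
      if(resource == nullptr)
      {
        Resource * dummy = new Resource();
        MemoryBarrier();   // finish constructing *dummy before publishing the pointer
        resource = dummy;  // pointer assignment
      }
    }
    return *resource;
  }
};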

MemoryBarrier() ensures that the object dummy points to is fully constructed before the pointer is assigned to resource. According to this link, aligned pointer assignments are atomic on x86 and x64 systems. And volatile ensures that the value of resource is re-read on every check instead of being cached in a register.
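For the special case of a single, process-wide lazily created object, C++11 lets the compiler generate the double-checked locking itself: a function-local static is initialized exactly once, even if several threads reach it at the same time. A minimal sketch; `Resource`, `GetSharedResource` and the counter are placeholders. MSVC only implements these thread-safe statics from Visual Studio 2015 onward, so this does not help on MSVC 2010, and it does not replace a per-instance or per-node resource.

```cpp
#include <string>

struct Resource {                 // placeholder for the real resource type
    std::string name = "shared resource";
};

int g_constructions = 0;          // demonstration only: counts initializations

Resource& GetSharedResource() {
    // Thread-safe, exactly-once initialization guaranteed since C++11;
    // concurrent callers wait until the first one has finished.
    static Resource instance = [] {
        ++g_constructions;
        return Resource();
    }();
    return instance;
}
```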

ali_bahoo
  • Yes, I know about the problem that Meyers and Alexandrescu pointed out. But there must be some way to implement it: in their paper, they talk about a memory barrier, which is platform/compiler-dependent. I'm using MSVC 2010, so what kind of memory barrier could I use? C++11 isn't an option yet... – Ben Nov 14 '11 at 10:47
  • @Ben: There is also [tbb::atomic](http://threadingbuildingblocks.org/files/documentation/a00117.html) that you can use. But its free version is GPL licensed. – ali_bahoo Nov 14 '11 at 12:03
  • The MemoryBarrier() approach looks good; I'm going to try that! Thanks! – Ben Nov 14 '11 at 12:57

Are you asking how to make reading the DB thread-safe, or reading the nodes?

If you're trying to do the latter and you're not writing very often, then why not make your nodes immutable, period? If you need to write something, copy the data from the existing node, modify the copy, and create a new node that you then put into your database.
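A minimal sketch of the immutable-node idea, assuming each node's current version is held through a `std::shared_ptr<const NodeData>`; `NodeData` and `NodeSlot` are hypothetical names. Readers copy the pointer under a very short lock and then read a snapshot that can never change under them; a writer copies the current snapshot, modifies the copy, and swaps in the new version.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

using NodeData = std::map<std::string, std::string>;   // one node's key-value pairs

// Hypothetical holder for one node's current immutable version.
class NodeSlot {
public:
    NodeSlot() : current_(std::make_shared<NodeData>()) {}

    // Readers lock only long enough to copy the pointer; the snapshot
    // they get back is never modified afterwards.
    std::shared_ptr<const NodeData> read() const {
        std::lock_guard<std::mutex> lock(ptrMutex_);
        return current_;
    }

    // Writers copy the current version, change the copy, then publish it.
    void set(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> writer(writeMutex_);   // one writer at a time
        auto copy = std::make_shared<NodeData>(*read());
        (*copy)[key] = value;
        std::lock_guard<std::mutex> lock(ptrMutex_);
        current_ = std::move(copy);                        // swap in the new version
    }

private:
    mutable std::mutex ptrMutex_;   // guards only the pointer itself
    std::mutex writeMutex_;
    std::shared_ptr<const NodeData> current_;
};
```

Old snapshots stay alive for as long as some reader still holds them, which is what lets readers work without holding any lock while they read.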

ildjarn
Kiril
  • I guess my post wasn't clear enough. I'm not using a DB like MySQL or Oracle... the tree structure I'm building IS the database. Of course, a very simple one. Querying the DB means visiting many nodes, retrieving information from them, merging that information and returning the result. I will edit my question to provide more information. – Ben Nov 14 '11 at 09:45