3

I would like to use db4o as the backend of a custom cache implementation. Normally my program involves loading into memory some 40,000,000 objects and working on them simultaneously. Obviously this requires a lot of memory and I thought of perhaps persisting some of the objects (those not in a cache) to a db4o database. My preliminary tests show db4o to be a bit slower than I would like (about 1,000,000 objects took 17 minutes to persist). However, I was using the most basic setup.

I was doing something like this:

using (var reader = new FileUnitReader(Settings, Dictionary, m_fileNameResolver, ObjectFactory.Resolve<DataValueConverter>(), ObjectFactory.Resolve<UnitFactory>()))
using (var db = Db4oEmbedded.OpenFile(Db4oEmbedded.NewConfiguration(), path))
{
    var timer = new Stopwatch();
    timer.Start();
    IUnit unit = reader.GetNextUnit();
    while (unit != null)
    {
        db.Store(unit);
        unit = reader.GetNextUnit();
    }
    timer.Stop();
    db.Close();

    var elapsed = timer.Elapsed;
}

Can anyone offer advice on how to improve performance in this scenario?

Jeffrey Cameron
  • Are your objects of the same size, or of a maximum size? And do they have some sort of id? – Mikael Svenson Jul 11 '10 at 12:45
  • They do have an identifier. The size of the objects will not vary within a single run of the program; however, different runs would produce differently sized objects (based on the number of variables being read in). – Jeffrey Cameron Jul 11 '10 at 13:00
  • 1
  • If you can use .Net4 you might have more success creating a huge file, allocating space for each object in the file, and accessing objects directly by offset in the file (id*itemsize), using memory-mapped files for random access. This would be a different approach than db4o. A bit like I did here - http://stackoverflow.com/questions/2545882/optimal-storage-of-data-structure-for-fast-lookup-and-persistence (a sketch of this approach follows these comments) – Mikael Svenson Jul 11 '10 at 13:11
  • I have been thinking of that. There is a library here: http://mmf.codeplex.com/ that provides memory-mapped file access in .NET versions earlier than 4, but it isn't optimized for 32-bit systems, which is what I really want. We do have an MSDN Universal subscription at work, so I could try upgrading to .NET4 sometime in the near future and try out your solution. Thanks – Jeffrey Cameron Jul 11 '10 at 14:05
  • 1
  • The mmf project is my project ;) and it's not that hard to get it to work better on 32-bit. And you remind me I need to do the .Net4 version and some other optimizations as well. – Mikael Svenson Jul 11 '10 at 19:31
  • I really liked the mmf project! How can I get it to work better on 32-bit without switching to .NET 4? I'd be interested to know! – Jeffrey Cameron Jul 12 '10 at 11:53
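
For reference, here is a rough sketch of the memory-mapped-file approach Mikael describes, using the System.IO.MemoryMappedFiles API that ships with .NET 4. The UnitRecord struct and the MappedUnitStore class are hypothetical placeholders chosen for illustration, not the actual IUnit type or any existing library:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Hypothetical fixed-size record standing in for one unit's data.
[StructLayout(LayoutKind.Sequential)]
struct UnitRecord
{
    public int Id;
    public double Value;
}

class MappedUnitStore : IDisposable
{
    static readonly int RecordSize = Marshal.SizeOf(typeof(UnitRecord));
    readonly MemoryMappedFile m_file;
    readonly MemoryMappedViewAccessor m_view;

    public MappedUnitStore(string path, long capacityInRecords)
    {
        // Pre-allocate one large file and map it into the process address space.
        m_file = MemoryMappedFile.CreateFromFile(path, FileMode.OpenOrCreate,
                                                 "units", capacityInRecords * RecordSize);
        m_view = m_file.CreateViewAccessor();
    }

    // Each record lives at offset id * RecordSize, so a lookup is one multiplication.
    public void Write(int id, ref UnitRecord record)
    {
        m_view.Write((long)id * RecordSize, ref record);
    }

    public UnitRecord Read(int id)
    {
        UnitRecord record;
        m_view.Read((long)id * RecordSize, out record);
        return record;
    }

    public void Dispose()
    {
        m_view.Dispose();
        m_file.Dispose();
    }
}

Because every record has a fixed size, reads and writes reduce to a single offset calculation, without the object-database machinery in between.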

2 Answers

2

Well I think there are a few options to improve the performance in this situation.

I've also discovered that the reflection overhead in such scenarios can become quite a large part of the cost, so you should try the fast reflector for your case. Note that the FastNetReflector consumes more memory; however, in your scenario this won't really matter. You can use the fast reflector like this:

var config = Db4oEmbedded.NewConfiguration();
config.Common.ReflectWith(new FastNetReflector());

using(var container = Db4oEmbedded.OpenFile(config, fileName))
{
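    // store and query objects through 'container' here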
}

When I ran similar small 'benchmarks', I found that a larger cache size also improves performance a little, even when you only write to the database:

var config = Db4oEmbedded.NewConfiguration();
config.File.Storage = new CachingStorage(new FileStorage(), 128, 1024 * 4);

Other notes: db4o's transaction handling isn't really optimized for giant transactions. When you store 1,000,000 objects in one transaction, the commit can take ages or you can run out of memory. Therefore you may want to commit more often, for example after every 100,000 stored objects. Of course you need to check whether that really makes a difference in your scenario.
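
A minimal sketch of that, reusing the store loop from the question (the 100,000 batch size is just the figure mentioned above and would need tuning):

int stored = 0;
IUnit unit = reader.GetNextUnit();
while (unit != null)
{
    db.Store(unit);
    if (++stored % 100000 == 0)
    {
        db.Commit(); // flush the open transaction so it stays small
    }
    unit = reader.GetNextUnit();
}
db.Commit(); // commit whatever remains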

Gamlor
  • 1
  • Interesting. I tried the FastNetReflector and it halved the amount of time required. However, I'm still not quite at my goal of loading 1,000,000 records in 2 minutes; the FastNetReflector took me down to about 8-9 minutes per 1,000,000. Any other suggestions? – Jeffrey Cameron Jul 12 '10 at 15:12
  • Hmm, I don't know anything that would make it 4 times faster to reach your goal. I would need to investigate what the bottleneck is for that. Do you really need a complex object database? Since you use it for caching, there are probably more 'lightweight' solutions out there which are way faster. – Gamlor Jul 12 '10 at 16:25
  • There are some others (see the comments above) but I thought I would try the db4o solution since it offered simplicity and robustness. Thanks though! – Jeffrey Cameron Jul 12 '10 at 16:54
1

Another small improvement that you could try:

Get the extended interface by adding .Ext() to the OpenFile() call.

Purge every object after you have stored it.

using (var db = Db4oEmbedded.OpenFile(Db4oEmbedded.NewConfiguration(), path).Ext())
{
    // ....
    db.Store(unit);
    db.Purge(unit); // drop db4o's internal reference to the stored object
    // ....
}

That way you will reduce the number of references that db4o has to maintain in the current transaction.

You probably have the most potential for another big improvement if you play with the Storage configuration (that's the pluggable file system underneath db4o). The latest 8.0 builds have a better cache implementation that doesn't degrade performance for cache maintenance when you work with larger numbers of cache pages.

I suggest you try the latest 8.0 build with the cache setup that Gamlor has suggested to see if that makes a difference:

config.File.Storage = new CachingStorage(new FileStorage(), 128, 1024 * 4);

If it does, you could also try much higher numbers:

config.File.Storage = new CachingStorage(new FileStorage(), 1280, 1024 * 40);
Carl Rosenberger
  • Oddly enough, with the upgrade from 7.12 to 8.0 and the use of db.Purge (along with FastNetReflector and CachingStorage, as previously suggested) the program actually took longer ... :-/ I'm going to try it again without the Purge, see if that helps – Jeffrey Cameron Jul 12 '10 at 20:52
  • OK, tried it without the db.Purge() call. It looks like it is much better at managing memory but is still slower than 7.12. – Jeffrey Cameron Jul 12 '10 at 21:15
  • db4o 8.0 has a new IdSystem by default, which is considerably faster for fragmented databases but it may be slightly slower for raw store operations where there are no fragmentation effects. The old PointerBasedIdSystem can still be used as follows: config.IdSystem.UsePointerBasedSystem(); – Carl Rosenberger Jul 13 '10 at 16:54
  • Thanks Carl, I'll try that as well and let you know. – Jeffrey Cameron Jul 14 '10 at 00:49