
I have a project where I have to store 16 objects, each containing a list of 185,000 doubles. The overall size of the saved object should be around 20-30 MB (sizeof(double) * 16 * 185,000 ≈ 23.7 MB), but when I try to retrieve it from the database, the database allocates 200 MB to retrieve this 20-30 MB object.

My questions are:

  1. Is this expected behaviour?
  2. How can I avoid such huge allocation of memory when I just want to retrieve one document?

Here is a fully reproducible example, with profiler screenshots below:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using LiteDB;

class Program
{
    private static string _path;

    static void Main(string[] args)
    {
        _path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "testDb");

        // Comment this out after the first run to avoid inserting the same object again.
        AddData();

        var data = GetData();

        Console.ReadLine();
    }

    public static void AddData()
    {
        var items = new List<Item>();
        for (var index = 0; index < 16; index++)
        {
            var item = new Item {Values = Enumerable.Range(0, 185_000).Select(v => (double) v).ToList()};
            items.Add(item);
        }
        var testData = new TestClass { Name = "Test1", Items = items.ToList() };

        using (var db = new LiteDatabase(_path))
        {
            var collection = db.GetCollection<TestClass>();
            collection.Insert(testData);
        }
    }

    public static TestClass GetData()
    {
        using (var db = new LiteDatabase(_path))
        {
            var collection = db.GetCollection<TestClass>();
            // This line causes a huge memory allocation and triggers the garbage collector many times.
            return collection.FindOne(Query.EQ(nameof(TestClass.Name), "Test1"));
        }
    }
}

public class TestClass
{
    public int Id { get; set; }
    public string Name { get; set; }
    public IList<Item> Items { get; set; }
}

public class Item
{
    public IList<double> Values { get; set; }
}

Changing 185_000 to 1_850_000 makes my RAM usage go above 4 GB(!)

Profiler: [profiler screenshot]

FCin
  • @Caramiriel What do you mean? Are you referring to `Enumerable.Range(0, 185_000)` or the profiler screenshot? – FCin Aug 21 '18 at 12:45
  • Scratch that; now that I see BsonArray, it seems more likely the issue is in the LiteDB framework. – Caramiriel Aug 21 '18 at 12:46
  • 1
    `Is this expected behaviour?` I am no `LiteDB` expert, but it wouldn't surprise me. A large array will be on the Large Object Heap. If it is cloned as part of the reading process then that could explain some of the RAM usage. If it is converted to or from another format (e.g. BSON) that would likely explain part of it too. Fundamentally, what is your concern here? .NET is a garbage collected runtime - and there is no harm in it using more RAM than you think it needs to. Why are you worried? – mjwills Aug 21 '18 at 12:47
  • @mjwills I understand that copying increases memory usage, but going from 20 MB to >200 MB is absurd. If I change this to create 1 185 000 elements then it becomes unusable because of RAM usage. I'm not a NoSQL specialist, but allocating >4 GB for a 200 MB file doesn't seem right. – FCin Aug 21 '18 at 12:49
  • If that is the case, I suspect you need to try other database platforms and see whether they perform more appropriately for your needs. And / or stop storing enormous payloads. – mjwills Aug 21 '18 at 12:52
  • @mjwills I would do that if there were any. I don't know of any NoSQL serverless database for .NET. To be honest it looks like a bug in LiteDB, because with this simple query there is no sorting or searching involved. There is only 1 document in the whole database, so it has to be replicating it many, many times. – FCin Aug 21 '18 at 12:54
  • Perhaps raise a bug with them. And report back here what the outcome is. – mjwills Aug 21 '18 at 12:58
  • Deleted my comments as they were wrong... I think you have probably found a memory leak in LiteDB; ensure you are using the latest version. – Seabizkit Aug 21 '18 at 13:08

2 Answers


There are several reasons why LiteDB allocates much more memory than a plain List<double>.

To understand this, you need to know that your typed class is converted into a BsonDocument structure (made of BsonValues). This structure has an overhead (+1 to +5 bytes per BsonValue).

Also, to serialize this class (when you insert), LiteDB must create a single byte[] containing the whole BsonDocument (in BSON format). Afterwards, this very large byte[] is copied into many extend pages (each page holds a byte[4070]).

On top of that, LiteDB must keep a copy of the original data to store in the journal area, so this size can be doubled.

To deserialize, LiteDB must do the inverse: read all pages from disk into memory, join them into a single byte[], deserialize that into a BsonDocument, and finally map it to your class.

For small objects these operations are fine; the memory is reused for each new document read/write, so usage stays under control.
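
As a rough, back-of-the-envelope illustration of why this particular document is far from "small" (the per-value overheads below are assumptions for the sake of the example, not measured LiteDB internals), this is approximately what has to be alive at the same time while FindOne runs:

// Back-of-the-envelope tally: assumed sizes, not exact LiteDB figures.
const long count = 16L * 185_000;             // 2,960,000 doubles
long raw       = count * sizeof(double);      // ≈ 23.7 MB of raw values
long bsonBytes = raw + count * 5;             // serialized BSON, ~+5 bytes per value      ≈ 38.5 MB
long pageCache = bsonBytes;                   // extend pages read from disk               ≈ 38.5 MB
long joined    = bsonBytes;                   // the single byte[] joined from those pages ≈ 38.5 MB
long bsonDoc   = count * 32;                  // one BsonValue object per value, assuming ~32 bytes each ≈ 94.7 MB
long mapped    = raw;                         // the final List<double> in your class      ≈ 23.7 MB
Console.WriteLine((pageCache + joined + bsonDoc + mapped) / 1_000_000.0); // ≈ 195 MB

That is already in the same ballpark as the ~200 MB the profiler shows, before counting the intermediate copies the garbage collector has to clean up.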

In the upcoming v5 version this process has some optimizations, such as:

  • Deserialization no longer needs to allocate all the data into a single byte[] to read a document; this can be done using the new ChunkStream(IEnumerable<byte[]>). Serialization still needs the single byte[].
  • The journal file was changed to a WAL (write-ahead log), so the original data no longer needs to be kept.
  • ExtendPages are no longer stored in the cache.

For future versions I'm thinking of using the new Span<T> class to reuse previous memory allocations, but I need to study this more.


But storing a single document with 185,000 values is not a good solution in any NoSQL database. MongoDB limits BSON document size to 16 MB (and early versions had a ~368 KB limit)... I limited LiteDB to 1 MB in v2... but I removed this size check and kept it only as a recommendation to avoid large single documents.

Try splitting your class into two collections: one for your data and another for the values. You can also split this large array into chunks, the way LiteDB FileStorage or MongoDB GridFS do.
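
A minimal sketch of that idea (ItemChunk, the "chunks" collection name and the chunk size are placeholders for this example, not anything built into LiteDB):

// Hypothetical second collection: the values of one Item stored as fixed-size chunks.
public class ItemChunk
{
    public int Id { get; set; }
    public int ItemId { get; set; }       // which logical Item this chunk belongs to
    public int Index { get; set; }        // order of the chunk within the item
    public double[] Values { get; set; }  // e.g. 10,000 values per chunk
}

public static void SaveAndLoadChunked(string path)
{
    using (var db = new LiteDatabase(path))
    {
        var chunks = db.GetCollection<ItemChunk>("chunks");
        chunks.EnsureIndex(x => x.ItemId);

        // Insert: split one 185,000-value list into 10,000-value chunks.
        var values = Enumerable.Range(0, 185_000).Select(v => (double)v).ToArray();
        const int chunkSize = 10_000;
        for (int i = 0, index = 0; i < values.Length; i += chunkSize, index++)
        {
            chunks.Insert(new ItemChunk
            {
                ItemId = 1,
                Index = index,
                Values = values.Skip(i).Take(chunkSize).ToArray()
            });
        }

        // Read: stream the chunks back instead of materializing one huge document.
        var restored = chunks.Find(Query.EQ(nameof(ItemChunk.ItemId), 1))
                             .OrderBy(c => c.Index)
                             .SelectMany(c => c.Values)
                             .ToList();
    }
}

Each chunk then stays well below the page/document sizes discussed above, and a query only pays for the chunks it actually touches.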

mbdavid
  • Thank you very much. I will think of a way to split this huge array into many small ones. – FCin Aug 22 '18 at 05:17

First, the way you are creating the list, it will have reserved room for 262,144 elements due to its growth algorithm.
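
For example (a quick check; the exact figure assumes .NET's default doubling growth for List<T>):

var list = new List<double>();
for (var i = 0; i < 185_000; i++)
    list.Add(i);

Console.WriteLine(list.Count);    // 185000
Console.WriteLine(list.Capacity); // 262144: doubled up to the next power of two above 185,000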

You should set the number of items beforehand to avoid this (or maybe just use an array altogether):

var max = 185_000; // number of values per Item in the question's example
Values = new List<double>(max);
Values.AddRange(Enumerable.Range(0, max).Select(v => (double)v));

As far as LiteDB goes, if you don't need a database (and the potential overhead it brings), just store the data in a data structure of your own. I don't see any benefit in a database if you don't actually use it as one and only store a single item.
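
For example, a minimal sketch of that approach, assuming the data really is just flat doubles (the file name and helper method are placeholders):

// Hypothetical flat-file storage: 8 bytes per double, no serialization overhead.
public static void SaveValues(string path, IEnumerable<Item> items)
{
    using (var writer = new BinaryWriter(File.Create(path)))
    {
        foreach (var item in items)
            foreach (var value in item.Values)
                writer.Write(value);
    }
}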

nvoigt
  • Actually I do set max in my production code. My posted code is just the most basic example I could think of, so it stores 1 document. In my real application I can have many different documents and I use queries for filtering them. I didn't want to post my classes, because they aren't helpful in reproducing the issue. – FCin Aug 21 '18 at 14:56