I have 10,000,000 entries of type struct{int, int, int, int}. When I store them in a QHash or QMap, they occupy a large amount of memory. By my estimate they should take about

10,000,000 * 4 * 4 (sizeof(int)) bytes ≈ 153 MB

but when I load my data it takes about 1.2 GB with either QHash or QMap. Why does this happen, and how can I optimize it for both speed and memory (with another data structure, or with some tricks for QMap and QHash)?

abdolahS
  • `QHash` and `QMap` are associative containers. Assuming your 4-int struct is the stored value, what is the type of the keys? – rocambille May 17 '17 at 12:01
  • @wasthishelpful The keys are four other integers, which I combine using QtPrivate::QHashCombine for qHash() – abdolahS May 17 '17 at 14:28
  • It would probably be close to 153 MB if it were a sequential array, but maps have additional data structure overhead and heap allocation overhead. It still shouldn't be that much though. – dtech May 17 '17 at 16:38
  • How do you measure memory consumption? And how do you add the elements? Try to use: _yourqhash.reserve(maxsize);_ before you add elements and see what happens. – Zlatomir May 17 '17 at 18:32
  • Yes, memory reservation policies are quite "generous" as the item count in the container increases. They are proportional and do not scale down as the item count increases. – dtech May 17 '17 at 18:35

1 Answer

You said in the comments that you are using another four ints as the key. These values also have to be stored, so you are actually storing 8 ints per entry, not 4. In addition, QHash has to store the hash value to look up entries efficiently by key. The hash is an unsigned integer, so you have 9 values of 4 bytes each per entry. For 10,000,000 entries that adds up to about 360 MB (roughly 343 MiB).
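The arithmetic above can be checked with a few lines of C++. This is only a sketch of the raw payload estimate: the names `kIntsPerEntry`, `payloadBytes` and `toMiB` are made up for illustration, and a 4-byte int is assumed (modeled here with `std::int32_t`).

```cpp
#include <cstddef>
#include <cstdint>

// Raw payload of one entry as counted in the answer: 4 key ints, 4 value ints
// and 1 cached unsigned hash. Pointers, padding and allocator overhead are
// deliberately NOT included here.
constexpr std::size_t kIntsPerEntry = 4 + 4 + 1;

// Total payload in bytes for a given number of entries (assumes 4-byte ints).
constexpr std::uint64_t payloadBytes(std::uint64_t entries) {
    return entries * kIntsPerEntry * sizeof(std::int32_t);
}

// Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr double toMiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// 10,000,000 entries * 9 values * 4 bytes = 360,000,000 bytes.
static_assert(payloadBytes(10000000ULL) == 360000000ULL,
              "payload estimate for 10 million entries");
```

`toMiB(payloadBytes(10000000ULL))` comes out slightly above 343, so the payload alone is already more than twice the 153 MB estimated in the question.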

Also, internally QHash or QMap may use some padding between its elements, for example to satisfy data structure alignment requirements. Padding comes in whole bytes per element, so with 10 million elements even 4 bytes of padding per element adds about 40 MB.
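To get a feeling for how padding creeps in, here is a small sketch. The `Node` struct is a hypothetical chained-hash node invented for this example, not Qt's actual internal definition, and `Key`, `Value`, `kRawNodeBytes` and `kPaddingBytes` are likewise illustrative names.

```cpp
#include <cstddef>

// Hypothetical key and value types: four ints each, as in the question.
struct Key   { int a, b, c, d; };
struct Value { int a, b, c, d; };

// Hypothetical node of a chained hash table (an assumption for illustration,
// NOT the real QHash node layout).
struct Node {
    Node*    next;   // link to the next node in the same bucket
    unsigned hash;   // cached hash of the key
    Key      key;
    Value    value;
};

// Sum of the member sizes, i.e. what the node would take without any padding.
constexpr std::size_t kRawNodeBytes =
    sizeof(Node*) + sizeof(unsigned) + sizeof(Key) + sizeof(Value);

// Bytes the compiler added so that the node satisfies its alignment.
constexpr std::size_t kPaddingBytes = sizeof(Node) - kRawNodeBytes;

// The size of a type is always a multiple of its alignment.
static_assert(sizeof(Node) % alignof(Node) == 0, "size is a multiple of alignment");
```

On a typical 64-bit platform with 8-byte pointers and 4-byte ints the members add up to 44 bytes, and the node is likely rounded up to 48, but the exact numbers depend on the compiler and target. If each node is also allocated separately on the heap, the allocator's own bookkeeping comes on top of that.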

Besides, QHash and QMap are not just raw data - they both use additional pointers to their internal data structures etc., which is yet another reason why a single entry would take more space than you expected.

Another source of swollen data size might be the fact that for efficiency reasons, these classes may store some additional values so that they are precomputed when you call some of their methods.

Last but not least, QHash reserves more memory than its current elements need at any given moment for efficiency reasons (to avoid unnecessary copying). I would expect that the larger the container, the more memory it reserves just in case, because copying gets more expensive. You can check how much has been reserved in advance by calling the capacity() method. If you want to limit the amount of reserved memory, call the squeeze() method, which trims the memory so that it is just enough to hold the currently stored elements.
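The reserve/capacity idea can be tried out without Qt. The sketch below uses `std::vector` as a stand-in, not QHash: its `reserve()`, `capacity()` and `shrink_to_fit()` play roughly the roles of QHash's `reserve()`, `capacity()` and `squeeze()`. The `Entry` type and both function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4-int entry, as in the question.
struct Entry { int a, b, c, d; };

// Fill a vector without reserving first; the container grows on demand and
// may end up with more capacity than elements.
std::size_t capacityWithoutReserve(std::size_t n) {
    std::vector<Entry> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(Entry{0, 0, 0, 0});
    }
    return v.capacity();
}

// Reserve room for n elements up front, then fill.
std::size_t capacityWithReserve(std::size_t n) {
    std::vector<Entry> v;
    v.reserve(n);
    const std::size_t reserved = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(Entry{0, 0, 0, 0});
    }
    // std::vector guarantees no reallocation while size() <= capacity().
    assert(v.capacity() == reserved);
    return v.capacity();
}
```

Comparing the two return values for the same `n` shows how much slack on-demand growth leaves behind on a given standard library. Note that the no-reallocation guarantee after `reserve()` is specific to `std::vector`; whether QHash behaves the same way is not something this sketch can show, so checking `capacity()` on the real container is still worthwhile.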

KjMag
  • Instead of using _squeeze_ it's better to use _reserve_, because he knows the number of elements. – Zlatomir May 18 '17 at 11:55
  • If you call reserve() and then insert elements, the implementation is free to reserve more space during the process of insertion, so you might end up calling squeeze() in the end anyway just to be sure. – KjMag May 18 '17 at 12:50