
I have code that uses Roaring64NavigableMap in a Neo4j plugin to store the long IDs of nodes, obtained with getId() from the Neo4j API.

I would like to use Chronicle-Map. I see this example:

ChronicleSet<Long> ids =
    ChronicleSet.of(Long.class)
        .name("ids")
        .entries(1_000_000)
        .create();
  1. What if I don't know how many values to anticipate? Does .entries(1_000_000) limit the cache or the DB's maximum number of entries?
  2. Is there a way to handle a really large amount of data, around a billion entries?
  3. Is there a more efficient way to create a Chronicle-Map?
  4. Can I control the size of the cache it uses?
  5. Can I control the volume the DB is stored on?
0x90

1 Answer


What if I don't know how many values to anticipate? Does .entries(1_000_000) limit the cache or the DB's maximum number of entries?

From the Javadoc of the entries() method:

Configures the target number of entries that are going to be inserted into the hash containers created by this builder. If ChronicleHashBuilder.maxBloatFactor(double) is configured to 1.0 (which is the default), this number of entries is also the maximum. If you try to insert more entries than the configured maxBloatFactor multiplied by the given number of entries, an IllegalStateException might be thrown.

This configuration should represent the expected maximum number of entries in a stable state; maxBloatFactor is the maximum bloat-up coefficient during exceptional bursts.

To be more precise: try to configure entries so that the created hash container serves about 99% of requests while at or below this number of entries in size.

You shouldn't put an additional margin over the actual target number of entries. This bad practice was popularized by the HashMap.HashMap(int) and HashSet.HashSet(int) constructors, which accept a capacity that should be multiplied by the load factor to obtain the actual maximum expected number of entries. ChronicleMap and ChronicleSet don't have a notion of load factor.

So this is effectively the maximum number of entries unless you specify maxBloatFactor(2.0) (or 10.0, etc.). Currently, Chronicle Map doesn't support the case "I really don't know how many entries I will have; maybe 1, maybe 1 billion, but I want to create a Map that will grow organically to the required size". This is a known limitation.
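For illustration, a minimal sketch of combining the builder calls from the question with maxBloatFactor (the set name, the 1-billion entry count, and the 2.0 factor are illustrative assumptions, not values from the question):

import net.openhft.chronicle.set.ChronicleSet;

// Size for the expected steady-state number of node IDs and allow up to 2x
// bloat during exceptional bursts; inserting beyond entries * maxBloatFactor
// may fail with an IllegalStateException.
ChronicleSet<Long> nodeIds =
    ChronicleSet.of(Long.class)
        .name("node-ids")
        .entries(1_000_000_000L)   // expected steady-state size, not a padded capacity
        .maxBloatFactor(2.0)       // burst headroom, not a routine margin
        .create();                 // purely in-memory (off-heap)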

Is there a way to handle a really large amount of data, around a billion entries?

Yes, if you have a sufficient amount of memory. Although memory-mapped, Chronicle Map is not designed to work efficiently when the amount of data is significantly larger than the memory. Use LMDB, RocksDB, or something similar in that case.
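As for questions 4 and 5: a minimal sketch, assuming the builder's createPersistedTo(File) method, of choosing the backing file and therefore the volume the data lives on (the path is hypothetical). How much of the mapped file stays resident is governed by the OS page cache rather than an explicit cache-size setting:

import java.io.File;
import net.openhft.chronicle.set.ChronicleSet;

// The backing file determines which volume holds the data; the OS page cache
// determines how much of the memory-mapped file stays resident in RAM.
File backingFile = new File("/mnt/fast-ssd/node-ids.dat"); // hypothetical path
ChronicleSet<Long> nodeIds =
    ChronicleSet.of(Long.class)
        .name("node-ids")
        .entries(1_000_000_000L)
        .createPersistedTo(backingFile);   // throws java.io.IOException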

leventov
  • What about Redis or BoltDB? – 0x90 Jun 06 '19 at 17:28
  • Redis, like Chronicle Map, is not supposed to work when the amount of data is greater than the volume of available memory. BoltDB - perhaps, but I don't know if it can be used from Java, since it's written in Go. If you prefer a B-Tree architecture (it's best when there are a lot of reads and fewer writes, and there is only a single writer), take a look at https://github.com/lmdbjava/lmdbjava – leventov Jun 08 '19 at 15:43
  • @leventov If the number of entries is more than can fit in memory, does it not flush some pages to disk? – JavaTechnical Aug 28 '19 at 09:57
  • @JavaTechnical Chronicle Map relies on native memory mapping facility in the OS. This may lead to frequent page reads and writes. More frequent than with Tree-like storages like LMDB/RocksDB. – leventov Aug 28 '19 at 17:34
  • @leventov If it relies on the memory mapping facility of the OS, doesn't the OS do a flush when it needs to load other pages and is short of memory for some other process? – JavaTechnical Aug 29 '19 at 04:04
  • @JavaTechnical It does – leventov Aug 29 '19 at 07:23
  • @leventov Then your last statement, that Chronicle Map is not supposed to work when the amount of data is significantly larger than memory, is false, right? Because *a part of the map* (from the OS perspective, a page) may be swapped out, and when you try to access the map data belonging to that swapped-out page, you should get it (since the OS swaps it in again), and all this is transparent to the application. Isn't it? – JavaTechnical Aug 30 '19 at 05:16
  • My statement is not false. What you described will make Chronicle Map *work* (that is, it shouldn't just crash or something like that), but it will be *inefficient*, because OS may swap pages in/out on every access to the Map, while Tree storages tend to batch accesses. – leventov Aug 30 '19 at 16:10
  • By "not supposed to work" I meant "Chronicle Map wasn't designed to be used like that", not "it won't work at all". – leventov Aug 30 '19 at 16:11
  • Updated answer to "Although memory-mapped, Chronicle Map is not designed to work efficiently when the amount of data is significantly larger than the memory." – leventov Aug 30 '19 at 16:12