
We're using ChronicleMap to support off-heap persistence in a large number of different stores, but we hit a bit of a problem with the simplest use case.

First of all, here's the helper I wrote to make creation easier:

```scala
import java.io.File
import java.util.concurrent.atomic.AtomicLong

import com.madhukaraphatak.sizeof.SizeEstimator
import net.openhft.chronicle.map.{ChronicleMap, ChronicleMapBuilder}

import scala.reflect.ClassTag

object ChronicleHelper {

  def estimateSizes[Key, Value](data: Iterator[(Key, Value)], keyEstimator: AnyRef => Long = defaultEstimator, valueEstimator: AnyRef => Long = defaultEstimator): (Long, Long, Long) = {
    println("Estimating sizes...")

    val entries = new AtomicLong(1)
    val keySum = new AtomicLong(1)
    val valueSum = new AtomicLong(1)
    var i = 0

    val GroupSize = 5000

    data.grouped(GroupSize).foreach { chunk =>

      chunk.par.foreach { case (key, value) =>
        entries.incrementAndGet()
        keySum.addAndGet(keyEstimator(key.asInstanceOf[AnyRef]))
        valueSum.addAndGet(valueEstimator(value.asInstanceOf[AnyRef]))
      }

      i += 1

      println("Progress:" + i * GroupSize)
    }

    (entries.get(), keySum.get() / entries.get(), valueSum.get() / entries.get())
  }

  def defaultEstimator(v: AnyRef): Long = SizeEstimator.estimate(v)

  def createMap[Key: ClassTag, Value: ClassTag](data: => Iterator[(Key, Value)], file: File): ChronicleMap[Key, Value] = {
    val keyClass = implicitly[ClassTag[Key]].runtimeClass.asInstanceOf[Class[Key]]
    val valueClass = implicitly[ClassTag[Value]].runtimeClass.asInstanceOf[Class[Value]]

    val (entries, averageKeySize, averageValueSize) = estimateSizes(data)

    val builder = ChronicleMapBuilder.of(keyClass, valueClass)
      .entries(entries)
      .averageKeySize(averageKeySize)
      .averageValueSize(averageValueSize)
      .asInstanceOf[ChronicleMapBuilder[Key, Value]]

    val cmap = builder.createPersistedTo(file)

    val GroupSize = 5000

    println("Inserting data...")
    var i = 0
    data.grouped(GroupSize).foreach { chunk =>

      chunk.par.foreach { case (key, value) =>
        cmap.put(key, value)
      }

      i += 1

      println("Progress:" + i * GroupSize)
    }

    cmap
  }

  def empty[Key: ClassTag, Value: ClassTag]: ChronicleMap[Key, Value] = {
    val keyClass = implicitly[ClassTag[Key]].runtimeClass.asInstanceOf[Class[Key]]
    val valueClass = implicitly[ClassTag[Value]].runtimeClass.asInstanceOf[Class[Value]]


    ChronicleMapBuilder.of(keyClass, valueClass).create()
  }


  def loadMap[Key: ClassTag, Value: ClassTag](file: File): ChronicleMap[Key, Value] = {
    val keyClass = implicitly[ClassTag[Key]].runtimeClass.asInstanceOf[Class[Key]]
    val valueClass = implicitly[ClassTag[Value]].runtimeClass.asInstanceOf[Class[Value]]

    ChronicleMapBuilder.of(keyClass, valueClass).createPersistedTo(file)
  }
}
```

It uses https://github.com/phatak-dev/java-sizeof for object size estimation. Here's the kind of usage we want to support:

```scala
object TestChronicle {
  def main(args: Array[String]) {
    def dataIterator: Iterator[(String, Int)] = (1 to 5000).toIterator.zipWithIndex.map(x => x.copy(_1 = x._1.toString))

    ChronicleHelper.createMap[String, Int](dataIterator, new File("/tmp/test.map"))

  }
}
```

But it throws an exception:

```
[error] Exception in thread "main" java.lang.ClassCastException: Key must be a int but was a class java.lang.Integer
[error]     at net.openhft.chronicle.hash.impl.VanillaChronicleHash.checkKey(VanillaChronicleHash.java:661)
[error]     at net.openhft.chronicle.map.VanillaChronicleMap.queryContext(VanillaChronicleMap.java:281)
[error]     at net.openhft.chronicle.map.VanillaChronicleMap.put(VanillaChronicleMap.java:390)
[error]     at ...
```

I can see that it might have something to do with Scala's primitive Int as opposed to Java's boxed Integer, but how do I get around that?

Scala 2.11.7

Chronicle Map 3.8.0

leventov
Anton

1 Answer

  • It seems suspicious that your test uses Iterator[(String, Int)] (rather than Iterator[(Int, String)]), so the key type is String and the value type is Int, yet the error message complains about the key's type (int/Integer).
  • If the error message says Key must be a %type%, it means that you configured that type in the first ChronicleMapBuilder.of(keyType, valueType) statement. In your case it means that you configured int.class (the Class object representing the primitive int type in Java), which is not allowed, while providing java.lang.Integer instances to the map's methods (you probably provide primitive ints, but they become Integer due to boxing), which is allowed. Note that in Scala, implicitly[ClassTag[Int]].runtimeClass, which your helper passes to ChronicleMapBuilder.of, is exactly this primitive int.class. You should ensure that you provide java.lang.Integer.class (classOf[java.lang.Integer] in Scala) or another non-primitive class to the ChronicleMapBuilder.of(keyType, valueType) call.
  • I don't know what size estimates this project gives (https://github.com/phatak-dev/java-sizeof), but in any case you should specify the size in bytes that the object takes in serialized form, not its on-heap size. The serialized form depends on the default serializers Chronicle Map chooses for a specific type (which may change between Chronicle Map versions), or on custom serializers configured for a specific ChronicleMapBuilder. So configuring a Chronicle Map with key/value "sizes" obtained from anywhere other than Chronicle Map itself is fragile. You can use the following procedure to estimate sizes more reliably:

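    ```java
    import java.util.LongSummaryStatistics;

    import net.openhft.chronicle.map.ChronicleMap;
    import net.openhft.chronicle.map.MapSegmentContext;

    public final class AverageValueSizeTest {
        public static <V> double averageValueSize(Class<V> valueClass, Iterable<V> values) {
            // The key type, averageValueSize and entries don't matter:
            // not a single entry is ever written to this map.
            try (ChronicleMap<Integer, V> testMap = ChronicleMap.of(Integer.class, valueClass)
                    .averageValueSize(1)
                    .entries(1)
                    .create()) {
                LongSummaryStatistics statistics = new LongSummaryStatistics();
                for (V value : values) {
                    try (MapSegmentContext<Integer, V, ?> c = testMap.segmentContext(0)) {
                        // Serialize the value with the map's configured value
                        // serializer and record its size in bytes.
                        statistics.accept(c.wrapValueAsData(value).size());
                    }
                }
                return statistics.getAverage();
            }
        }
    }
    ```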

    You can find it in this test: https://github.com/OpenHFT/Chronicle-Map/blob/7aedfba7a814578a023f7975ef15ba88b4d435db/src/test/java/eg/AverageValueSizeTest.java

    This procedure is hackish, but there are no better options right now.
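On the Scala side of the second point: the helper builds its Class objects from ClassTag, and for Int that yields the primitive int.class, which is what ChronicleMapBuilder.of rejects. A minimal sketch of one way around it, assuming you want to keep the ClassTag-based signatures, is to replace primitive runtime classes with their java.lang wrappers before building the map. BoxedClasses and boxedClassOf are hypothetical names for illustration; they are not part of Chronicle Map or the Scala standard library.

```scala
import scala.reflect.ClassTag

// Hypothetical helper (not part of Chronicle Map): resolves a ClassTag to a
// Class object that is safe to pass to ChronicleMapBuilder.of.
object BoxedClasses {
  // ClassTag[Int].runtimeClass is java.lang.Integer.TYPE (the primitive int.class),
  // so each JVM primitive type is mapped to its java.lang wrapper class.
  private val wrappers: Map[Class[_], Class[_]] = Map(
    java.lang.Boolean.TYPE   -> classOf[java.lang.Boolean],
    java.lang.Byte.TYPE      -> classOf[java.lang.Byte],
    java.lang.Character.TYPE -> classOf[java.lang.Character],
    java.lang.Short.TYPE     -> classOf[java.lang.Short],
    java.lang.Integer.TYPE   -> classOf[java.lang.Integer],
    java.lang.Long.TYPE      -> classOf[java.lang.Long],
    java.lang.Float.TYPE     -> classOf[java.lang.Float],
    java.lang.Double.TYPE    -> classOf[java.lang.Double]
  )

  // Runtime class of T; primitive classes are replaced by their boxed wrappers,
  // every other class (String, case classes, ...) is returned unchanged.
  def boxedClassOf[T](implicit tag: ClassTag[T]): Class[T] = {
    val raw: Class[_] = tag.runtimeClass
    wrappers.getOrElse(raw, raw).asInstanceOf[Class[T]]
  }
}
```

In ChronicleHelper you would then replace each implicitly[ClassTag[...]].runtimeClass.asInstanceOf[Class[...]] line with BoxedClasses.boxedClassOf[Key] or BoxedClasses.boxedClassOf[Value]. Calls like cmap.put(key, value) keep working, because inside the generic helper a Scala Int is already passed to the Java method boxed as java.lang.Integer.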

Another recommendation:

  • If your keys or values are boxed primitives (ints, longs, doubles) or any other type whose serialized form always has the same size, you shouldn't use the averageKey/averageValue/averageKeySize/averageValueSize methods; use the constantKeySizeBySample/constantValueSizeBySample methods instead. For java.lang.Integer, Long and Double even this is not needed: Chronicle Map already knows that those types are constant-sized.
leventov