i am benchmarking a JSON transformation library called JOLT. The library looks pretty good but we wanted to make sure that it did not stress our garbage collector too much. I therefore created a small program (reproduced below) that does the following:
- precompiles JSON tranformation scriptlet
- loop for a long time in each of ten threads, doing the following:
- create randomized JSON string input (approx 300 bytes)
- parse input using JOLT and apply tranform
When we look at our gargbage collector logs we never see a full GC (even after running for half a day), and even more suspicious, the only GC's that are reported to happen occur in the first couple of minutes of launching the program, then hours and hours pass and we don't see a single additional GC. The program is performing an impressive number of transactions per second.. and according to the JOLT documentation:
"The transform process will create and discard a lot of objects, so the garbage collector will have work to do."
So, I guess I should be happy that the our gc log looks like this even after an hours long run:
Server VM (25.66-b17) for linux-amd64 JRE (1.8.0_66-b17), built on Oct 6 2015 17:28:34 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 12439584k(11065340k free), swap 12721148k(12721148k free)
CommandLine flags: -XX:InitialHeapSize=199033344 -XX:MaxHeapSize=3184533504 -XX:MaxNewSize=348966912 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:SoftRefLRUPolicyMSPerMB=1 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
2017-01-10T11:13:39.752-0800: 1.458: [GC (Allocation Failure) 2017-01-10T11:13:39.753-0800: 1.459: [ParNew: 51904K->6464K(58368K), 0.0123836 secs] 51904K->8148K(188096K), 0.0138403 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2017-01-10T11:13:40.416-0800: 2.122: [GC (Allocation Failure) 2017-01-10T11:13:40.416-0800: 2.123: [ParNew: 58368K->6463K(58368K), 0.0193642 secs] 60052K->10934K(188096K), 0.0197870 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2017-01-10T11:13:40.611-0800: 2.317: [GC (Allocation Failure) 2017-01-10T11:13:40.611-0800: 2.318: [ParNew: 58367K->4378K(58368K), 0.0098514 secs] 62838K->12472K(188096K), 0.0102622 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2017-01-10T11:13:40.762-0800: 2.468: [GC (Allocation Failure) 2017-01-10T11:13:40.762-0800: 2.468: [ParNew: 56282K->6464K(58368K), 0.0067945 secs] 64376K->14714K(188096K), 0.0071342 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2017-01-10T11:13:40.954-0800: 2.661: [GC (Allocation Failure) 2017-01-10T11:13:40.955-0800: 2.661: [ParNew: 58368K->6464K(58368K), 0.0102488 secs] 66618K->17490K(188096K), 0.0108123 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2017-01-10T11:13:41.216-0800: 2.922: [GC (Allocation Failure) 2017-01-10T11:13:41.216-0800: 2.922: [ParNew: 58368K->6464K(58368K), 0.0110529 secs] 69394K->21205K(188096K), 0.0114348 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
2017-01-10T11:13:41.412-0800: 3.119: [GC (Allocation Failure) 2017-01-10T11:13:41.413-0800: 3.119: [ParNew: 58368K->5049K(58368K), 0.0073977 secs] 73109K->22139K(188096K), 0.0077819 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
But maybe this is too good to be true? Perhaps I did not specify the correct JVM switches to catch >all< the GC's, and there really are some happening that are not logged ?
The program is continually running (and continually producing small objects) so it would seem that this would fill up eden space, and a memory allocation failure would occur, and a GC would be initiated as a result.. Even though I have (I think) a decent grasp of garbage collector concepts, I'm kind of puzzled. The only things I can think may be happening here are: 1) maybe JOLT allocates its small objects on the stack (like in a big array) instead of the heap? or 2) (more likely), maybe I did not specify my JVM switches correctly. Here is what I did w/ the switches:
export MAVEN_OPTS=" -XX:+UseConcMarkSweepGC \
-XX:SoftRefLRUPolicyMSPerMB=1 \
-XX:+PrintGCTimeStamps \
-XX:+PrintGCDateStamps \
-XX:+PrintGCDetails \
-Xloggc:/tmp/gc.log -Xmx2048m "
I then ran the program as a unit test like this:
mvn clean test
The source code for the test is given below.
(if desired i can post the full maven project for this thing on github.) Thanks in advance for the help !
import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.math.stat.descriptive.SummaryStatistics;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.jupiter.api.DisplayName;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
public class BenchmarkJoltTest {
static int numThreads = 20;
static int numSecondsToRun = 360 * 1000; // 10 hours
static boolean randomize = true; /* for non-random we will validate that the output always remains constant */
static int hashOfExpectedOutput = 0;
static SummaryStatistics summaryStatistics = new SummaryStatistics();
static String inputJson = "";
static String transformSpecJson = "";
static Random random = new Random();
static Chainr chainr = null;
static AtomicLong completedTransformsCountForInterval = new AtomicLong(0);
static AtomicLong completedTransformsCountGrandTotal = new AtomicLong(0);
//summaryStatistics
@BeforeClass
public static void initialize() throws Exception {
System.out.println("HI HI HO");
inputJson =
String.join( " ", IOUtils.readLines(ClassLoader.getSystemResourceAsStream("input.json")));
transformSpecJson =
String.join( " ", IOUtils.readLines(ClassLoader.getSystemResourceAsStream("spec.json")));
chainr = Chainr.fromSpec(JsonUtils.jsonToList( transformSpecJson ));
/* Warm up: do JIT compilation, load classes and whatever else might need to be done only during startup */
for (int i = 0; i < 100; i++) {
runOneTransform(false);
}
}
@Test
@DisplayName("Benchmark for Jolt Json transforms")
public void runBenchmark() throws Exception {
runTransformsInThreadPool(10 /* threads */);
System.out.println("OK - here");
for (int i = 0; i < numSecondsToRun; i++) {
Thread.sleep(1000); /* 1 second */
double value = completedTransformsCountForInterval.doubleValue();
System.out.println("adding to count: " + value );
summaryStatistics.addValue(value);
completedTransformsCountForInterval.set(0L); // reset counter
if (completedTransformsCountForInterval.get() > 100) {
throw new Exception("how did the counter suddenly jump so high? -- unexpected");
}
}
System.out.println(
"Summary statistics for number of transforms performed per second\n\n" +
summaryStatistics.toString() + "\ntotalCount: "+ completedTransformsCountForInterval);
System.out.println("Grand total of completed operations: " + completedTransformsCountGrandTotal);
System.exit(0);
}
private void runTransformsInThreadPool(final int threadCount) throws InterruptedException, ExecutionException {
Callable<Void> task = new Callable<Void>() {
@Override
public Void call() throws Exception {
while (true) {
runOneTransform(true /* include in official count */);
}
}
};
List<Callable<Void>> tasks = Collections.nCopies(threadCount, task);
ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
for (Callable c : tasks) {
System.out.println("submitting " + c);
executorService.submit(c);
System.out.println("finished submitting ");
}
System.out.println("INVOKED all");
}
private static void runOneTransform(boolean includeInOfficialCount /* don't include if in warm-up mode */) throws Exception {
String targetJson = randomize ? getRandomizedJson() : inputJson;
Object transformedJson = chainr.transform(JsonUtils.jsonToObject(targetJson));
String transformedJsonAsString = JsonUtils.toJsonString(transformedJson);
if (hashOfExpectedOutput == 0) {
hashOfExpectedOutput = transformedJsonAsString.hashCode();
}
if (! randomize && hashOfExpectedOutput != transformedJsonAsString.hashCode()) {
throw new Exception("hash codes of transformed strings don't match. multi-threading issue ?");
}
if (includeInOfficialCount) {
completedTransformsCountForInterval.incrementAndGet();
completedTransformsCountGrandTotal.incrementAndGet();
} else {
System.out.println(transformedJsonAsString);
}
}
private static String getRandomizedJson() {
Integer num = random.nextInt() % 10;
inputJson.replaceAll("3", num.toString());
return inputJson;
}
}