0

i am benchmarking a JSON transformation library called JOLT. The library looks pretty good but we wanted to make sure that it did not stress our garbage collector too much. I therefore created a small program (reproduced below) that does the following:

- precompiles JSON tranformation scriptlet
- loop for a long time in each of ten threads, doing the following:
    - create randomized JSON string input (approx 300 bytes)
    - parse input using JOLT and apply tranform

When we look at our gargbage collector logs we never see a full GC (even after running for half a day), and even more suspicious, the only GC's that are reported to happen occur in the first couple of minutes of launching the program, then hours and hours pass and we don't see a single additional GC. The program is performing an impressive number of transactions per second.. and according to the JOLT documentation:

"The transform process will create and 
  discard a lot of objects, so the garbage collector 
  will have work to do."

So, I guess I should be happy that the our gc log looks like this even after an hours long run:

Server VM (25.66-b17) for linux-amd64 JRE (1.8.0_66-b17), built on Oct  6 2015 17:28:34 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 12439584k(11065340k free), swap 12721148k(12721148k free)
CommandLine flags: -XX:InitialHeapSize=199033344 -XX:MaxHeapSize=3184533504 -XX:MaxNewSize=348966912 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:SoftRefLRUPolicyMSPerMB=1 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
2017-01-10T11:13:39.752-0800: 1.458: [GC (Allocation Failure) 2017-01-10T11:13:39.753-0800: 1.459: [ParNew: 51904K->6464K(58368K), 0.0123836 secs] 51904K->8148K(188096K), 0.0138403 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
2017-01-10T11:13:40.416-0800: 2.122: [GC (Allocation Failure) 2017-01-10T11:13:40.416-0800: 2.123: [ParNew: 58368K->6463K(58368K), 0.0193642 secs] 60052K->10934K(188096K), 0.0197870 secs] [Times: user=0.03 sys=0.00, real=0.02 secs] 
2017-01-10T11:13:40.611-0800: 2.317: [GC (Allocation Failure) 2017-01-10T11:13:40.611-0800: 2.318: [ParNew: 58367K->4378K(58368K), 0.0098514 secs] 62838K->12472K(188096K), 0.0102622 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
2017-01-10T11:13:40.762-0800: 2.468: [GC (Allocation Failure) 2017-01-10T11:13:40.762-0800: 2.468: [ParNew: 56282K->6464K(58368K), 0.0067945 secs] 64376K->14714K(188096K), 0.0071342 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
2017-01-10T11:13:40.954-0800: 2.661: [GC (Allocation Failure) 2017-01-10T11:13:40.955-0800: 2.661: [ParNew: 58368K->6464K(58368K), 0.0102488 secs] 66618K->17490K(188096K), 0.0108123 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
2017-01-10T11:13:41.216-0800: 2.922: [GC (Allocation Failure) 2017-01-10T11:13:41.216-0800: 2.922: [ParNew: 58368K->6464K(58368K), 0.0110529 secs] 69394K->21205K(188096K), 0.0114348 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 
2017-01-10T11:13:41.412-0800: 3.119: [GC (Allocation Failure) 2017-01-10T11:13:41.413-0800: 3.119: [ParNew: 58368K->5049K(58368K), 0.0073977 secs] 73109K->22139K(188096K), 0.0077819 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 

But maybe this is too good to be true? Perhaps I did not specify the correct JVM switches to catch >all< the GC's, and there really are some happening that are not logged ?

The program is continually running (and continually producing small objects) so it would seem that this would fill up eden space, and a memory allocation failure would occur, and a GC would be initiated as a result.. Even though I have (I think) a decent grasp of garbage collector concepts, I'm kind of puzzled. The only things I can think may be happening here are: 1) maybe JOLT allocates its small objects on the stack (like in a big array) instead of the heap? or 2) (more likely), maybe I did not specify my JVM switches correctly. Here is what I did w/ the switches:

export MAVEN_OPTS=" -XX:+UseConcMarkSweepGC \
-XX:SoftRefLRUPolicyMSPerMB=1 \
-XX:+PrintGCTimeStamps \
-XX:+PrintGCDateStamps \
-XX:+PrintGCDetails \
-Xloggc:/tmp/gc.log  -Xmx2048m "

I then ran the program as a unit test like this:

mvn clean test

The source code for the test is given below.

(if desired i can post the full maven project for this thing on github.) Thanks in advance for the help !

import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.math.stat.descriptive.SummaryStatistics;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.jupiter.api.DisplayName;

import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class BenchmarkJoltTest {
    static int numThreads = 20;
    static int numSecondsToRun = 360 * 1000; // 10 hours
    static boolean randomize = true;    /* for non-random we will validate that the output always remains constant */

    static int hashOfExpectedOutput = 0;
    static SummaryStatistics summaryStatistics  = new SummaryStatistics();

    static String inputJson = "";
    static String transformSpecJson  = "";
    static Random random = new Random();
    static Chainr chainr = null;
    static AtomicLong completedTransformsCountForInterval = new AtomicLong(0);
    static AtomicLong completedTransformsCountGrandTotal = new AtomicLong(0);

    //summaryStatistics

    @BeforeClass
    public static void initialize() throws Exception {
        System.out.println("HI HI HO");
        inputJson =
                String.join( " ", IOUtils.readLines(ClassLoader.getSystemResourceAsStream("input.json")));
        transformSpecJson =
                String.join( " ", IOUtils.readLines(ClassLoader.getSystemResourceAsStream("spec.json")));
        chainr = Chainr.fromSpec(JsonUtils.jsonToList( transformSpecJson ));

        /* Warm up: do JIT compilation, load classes and whatever else might need to be done only during startup */
        for (int i = 0; i < 100; i++) {
            runOneTransform(false);
        }
    }

    @Test
    @DisplayName("Benchmark for Jolt Json transforms")
    public void runBenchmark() throws Exception {
        runTransformsInThreadPool(10 /* threads */);
        System.out.println("OK - here");

        for (int i = 0; i < numSecondsToRun; i++) {
            Thread.sleep(1000);  /* 1 second */
            double value = completedTransformsCountForInterval.doubleValue();
            System.out.println("adding to count: " + value );
            summaryStatistics.addValue(value);
            completedTransformsCountForInterval.set(0L);           // reset counter
            if (completedTransformsCountForInterval.get() > 100) {
                throw new Exception("how did the counter suddenly jump so high? -- unexpected");
            }
        }

        System.out.println(
                "Summary statistics for number of transforms performed per second\n\n" +
                        summaryStatistics.toString() + "\ntotalCount: "+ completedTransformsCountForInterval);
        System.out.println("Grand total of completed operations: " + completedTransformsCountGrandTotal);
        System.exit(0);
    }

    private void runTransformsInThreadPool(final int threadCount) throws InterruptedException, ExecutionException {
        Callable<Void> task = new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                while (true) {
                    runOneTransform(true /* include in official count */);
                }
            }
        };
        List<Callable<Void>> tasks = Collections.nCopies(threadCount, task);
        ExecutorService executorService = Executors.newFixedThreadPool(threadCount);

        for (Callable c : tasks) {
            System.out.println("submitting " + c);
            executorService.submit(c);
            System.out.println("finished submitting ");
        }

        System.out.println("INVOKED all");
    }

    private static void runOneTransform(boolean includeInOfficialCount /* don't include if in warm-up mode */) throws Exception {
        String targetJson = randomize ?  getRandomizedJson() :  inputJson;
        Object transformedJson = chainr.transform(JsonUtils.jsonToObject(targetJson));
        String transformedJsonAsString = JsonUtils.toJsonString(transformedJson);
        if (hashOfExpectedOutput == 0) {
            hashOfExpectedOutput  = transformedJsonAsString.hashCode();
        }
        if (! randomize && hashOfExpectedOutput != transformedJsonAsString.hashCode()) {
            throw new Exception("hash codes of transformed strings don't match. multi-threading issue ?");
        }

        if (includeInOfficialCount) {
            completedTransformsCountForInterval.incrementAndGet();
            completedTransformsCountGrandTotal.incrementAndGet();
        } else {
            System.out.println(transformedJsonAsString);
        }
    }

    private static String getRandomizedJson() {
        Integer num =  random.nextInt() % 10;
        inputJson.replaceAll("3", num.toString());
        return inputJson;
    }
}
Chris Bedford
  • 2,560
  • 3
  • 28
  • 60
  • Don’t forget that modern JVMs have Escape Analysis. So if the operation “will create and discard a lot of objects”, this implies lots of purely local objects whose life time can be predicted without subsequent garbage collection or which might even get scalarized, once the optimizer kicked in. You may compare with a run with the `-XX:-DoEscapeAnalysis` option… – Holger Jan 11 '17 at 11:49
  • -XX:-DoEscapeAnalysis ... that's a good one. will try. thank you! – Chris Bedford Jan 12 '17 at 19:00

3 Answers3

1

It looks like runOneTransform is only creating very shortlived objects. Since the cost of tracing GCs depends on the size of the live object set and those objects don't live long it means the GC has almost nothing to do.

It's quite possible that in this artificial test-case you're mostly measuring the cost of the actual code executing and of allocations, not of garbage collections.

Real-world applications often have a less left-skewed lifetime histogram, requiring more tracing and copying work from the collectors.

the8472
  • 40,999
  • 5
  • 70
  • 122
  • good call. thanks.. i am going to post the answer that i got from the JOLT developers themselves and accept that. but i appreciate the pointer ;^). thnx – Chris Bedford Jan 12 '17 at 19:14
0

The logs indicate that the garbage collector >>is<< running. Each one of those lines is a run of the ParNew collector. However, if I am interpreting the log correctly, they are all collections of the "new object" space, and that is pretty fast.

If your application continued to run like this, it would eventually get to the point where the CMS collector needed to run to collect garbage that have been promoted out of the new space.


The fact that your application does not appear to be generating much load is not necessarily suspicious. The stuff you found in the Jolt documentation could be out of date or inaccurate. (For instance, they may have been generalizing incorrectly.)

If you suspect that your application is not working correctly, look for another way to validate the results.

Maybe I did not specify my JVM switches correctly.

A low GC rate is a >>good<< thing. You should only look at tweaking the switches if there is a problem.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • Thanks, Stephen. bad me ! my title said "is not running" should have been "is running far more infrequently than expected". And yes.. the logs (and also looking in gc-viewer reveal no full GC's were ever done .. even after 12 hours of running. I will correct the title. And yes.. you are correct other ways of validating the program are indeed appropriate.. you will note the benchmark does xformations/second. I got great numbers this way (~ 200k xforms/sec) – Chris Bedford Jan 11 '17 at 05:44
0

Thanks to Stephen C, @the8472, @Holger for the tips. I am going to summarize the answer that I got from the Jolt developers (specifically, Milo Simpson), and post a link to the details.

In summary: the explanation Milo gave was that EscapeAnalysis was kicking in and that the objects Jolt is creating to do the transforms are being allocated on the stack. If I wanted to force GC's (which I'd rather not do in production ;^) he recommended setting up the test so that references to the output JSON cross thread boundaries.

I don't think I'll do that because the point of our exercise was to assess Jolt's performance... and it seems to be very solid.

I learned a lot from this exercise and recommend reading Milo's answer in full. https://github.com/bazaarvoice/jolt/issues/305#issuecomment-271861868

Chris Bedford
  • 2,560
  • 3
  • 28
  • 60