I am using version 1.5.3 of Blazegraph (formerly known as Bigdata). I have a service that acts as a gateway and implements a set of persistence-layer methods, and I am now writing unit tests for those methods. I'm using the embedded setup with Jetty. My setup code is below:

    int port = 0; // random port
    String namespace = "kb";
    int queryThreadPoolSize = ConfigParams.DEFAULT_QUERY_THREAD_POOL_SIZE;
    boolean forceOverflow = false;

    String servletContextListenerClass = ConfigParams.DEFAULT_SERVLET_CONTEXT_LISTENER_CLASS;
    System.setProperty(SystemProperties.JETTY_XML, "jetty.xml");
    String propertyFile = "RWStore.properties";
    System.setProperty(SystemProperties.BIGDATA_PROPERTY_FILE, propertyFile);

    final Map<String, String> initParams = new LinkedHashMap<>();
    initParams.put("propertyFile", propertyFile);
    initParams.put("namespace", namespace);
    initParams.put("queryThreadPoolSize", Integer.toString(queryThreadPoolSize));
    initParams.put("forceOverflow", Boolean.toString(forceOverflow));
    initParams.put("servletContextListenerClass", servletContextListenerClass);

    sparqlServer = NanoSparqlServer.newInstance(port, journal, initParams);

    LOGGER.info("Waiting for NanoSparqlServer to start...");
    NanoSparqlServer.awaitServerStart(sparqlServer);
    serverUrl = sparqlServer.getURI().toString();
    LOGGER.info("NanoSparqlServer started on: " + serverUrl + '\n');

I am using the default jetty.xml configuration from the com.bigdata 1.5.3 jar:

<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
<!-- See http://www.eclipse.org/jetty/documentation/current/ -->
<!-- See http://wiki.eclipse.org/Jetty/Reference/jetty.xml_syntax -->
<Configure id="Server" class="org.eclipse.jetty.server.Server">

    <!-- =========================================================== -->
    <!-- Configure the Server Thread Pool.                           -->
    <!-- The server holds a common thread pool which is used by      -->
    <!-- default as the executor used by all connectors and servlet  -->
    <!-- dispatches.                                                 -->
    <!--                                                             -->
    <!-- Configuring a fixed thread pool is vital to controlling the -->
    <!-- maximal memory footprint of the server and is a key tuning  -->
    <!-- parameter for tuning.  In an application that rarely blocks -->
    <!-- then maximal threads may be close to the number of 5*CPUs.  -->
    <!-- In an application that frequently blocks, then maximal      -->
    <!-- threads should be set as high as possible given the memory  -->
    <!-- available.                                                  -->
    <!--                                                             -->
    <!-- Consult the javadoc of o.e.j.util.thread.QueuedThreadPool   -->
    <!-- for all configuration that may be set here.                 -->
    <!-- =========================================================== -->
    <Arg name="threadpool"><New id="threadpool" class="org.eclipse.jetty.util.thread.QueuedThreadPool"/></Arg>
    <Get name="ThreadPool">
        <Set name="minThreads" type="int"><Property name="jetty.threads.min" default="10"/></Set>
        <Set name="maxThreads" type="int"><Property name="jetty.threads.max" default="64"/></Set>
        <Set name="idleTimeout" type="int"><Property name="jetty.threads.timeout" default="60000"/></Set>
        <Set name="detailedDump">false</Set>
    </Get>

    <!-- =========================================================== -->
    <!-- Get the platform mbean server                               -->
    <!-- =========================================================== -->
    <Call id="MBeanServer" class="java.lang.management.ManagementFactory"
          name="getPlatformMBeanServer" />

    <!-- =========================================================== -->
    <!-- Initialize the Jetty MBean container                        -->
    <!-- =========================================================== -->
    <!-- Note: This breaks CI if it is enabled
    <Call name="addBean">
      <Arg>
        <New id="MBeanContainer" class="org.eclipse.jetty.jmx.MBeanContainer">
          <Arg>
            <Ref refid="MBeanServer" />
          </Arg>
        </New>
      </Arg>
    </Call>-->

    <!-- Add the static log to the MBean server.
    <Call name="addBean">
      <Arg>
        <New class="org.eclipse.jetty.util.log.Log" />
      </Arg>
    </Call>-->

    <!-- For remote MBean access (optional)
    <New id="ConnectorServer" class="org.eclipse.jetty.jmx.ConnectorServer">
      <Arg>
        <New class="javax.management.remote.JMXServiceURL">
          <Arg type="java.lang.String">rmi</Arg>
          <Arg type="java.lang.String" />
          <Arg type="java.lang.Integer"><SystemProperty name="jetty.jmxrmiport" default="1090"/></Arg>
          <Arg type="java.lang.String">/jndi/rmi://<SystemProperty name="jetty.jmxrmihost" default="localhost"/>:<SystemProperty name="jetty.jmxrmiport" default="1099"/>/jmxrmi</Arg>
        </New>
      </Arg>
      <Arg>org.eclipse.jetty.jmx:name=rmiconnectorserver</Arg>
      <Call name="start" />
    </New>-->

    <!-- =========================================================== -->
    <!-- Http Configuration.                                         -->
    <!-- This is a common configuration instance used by all         -->
    <!-- connectors that can carry HTTP semantics (HTTP, HTTPS, SPDY)-->
    <!-- It configures the non wire protocol aspects of the HTTP     -->
    <!-- semantic.                                                   -->
    <!--                                                             -->
    <!-- Consult the javadoc of o.e.j.server.HttpConfiguration       -->
    <!-- for all configuration that may be set here.                 -->
    <!-- =========================================================== -->
    <New id="httpConfig" class="org.eclipse.jetty.server.HttpConfiguration">
        <Set name="secureScheme">https</Set>
        <Set name="securePort"><Property name="jetty.secure.port" default="8443" /></Set>
        <Set name="outputBufferSize"><Property name="jetty.output.buffer.size" default="32768" /></Set>
        <Set name="requestHeaderSize"><Property name="jetty.request.header.size" default="8192" /></Set>
        <Set name="responseHeaderSize"><Property name="jetty.response.header.size" default="8192" /></Set>
        <Set name="sendServerVersion"><Property name="jetty.send.server.version" default="true" /></Set>
        <Set name="sendDateHeader"><Property name="jetty.send.date.header" default="false" /></Set>
        <Set name="headerCacheSize">512</Set>
        <!-- Uncomment to enable handling of X-Forwarded- style headers
        <Call name="addCustomizer">
          <Arg><New class="org.eclipse.jetty.server.ForwardedRequestCustomizer"/></Arg>
        </Call>
        -->
    </New>

    <!-- Configure the HTTP endpoint.                                -->
    <Call name="addConnector">
        <Arg>
            <New class="org.eclipse.jetty.server.ServerConnector">
                <Arg name="server"><Ref refid="Server" /></Arg>
                <Arg name="factories">
                    <Array type="org.eclipse.jetty.server.ConnectionFactory">
                        <Item>
                            <New class="org.eclipse.jetty.server.HttpConnectionFactory">
                                <Arg name="config"><Ref refid="httpConfig" /></Arg>
                            </New>
                        </Item>
                    </Array>
                </Arg>
                <Set name="host"><SystemProperty name="jetty.host" /></Set>
                <Set name="port"><SystemProperty name="jetty.port" default="9999" /></Set>
                <Set name="idleTimeout"><SystemProperty name="http.timeout" default="30000"/></Set>
            </New>
        </Arg>
    </Call>

    <!-- =========================================================== -->
    <!-- Set handler Collection Structure                            -->
    <!-- =========================================================== -->
    <Set name="handler">
        <New id="Handlers" class="org.eclipse.jetty.server.handler.HandlerCollection">
            <Set name="handlers">
                <Array type="org.eclipse.jetty.server.Handler">
                    <Item>
                        <New id="Contexts" class="org.eclipse.jetty.server.handler.ContextHandlerCollection">
                            <Call name="addHandler">
                                <Arg>
                                    <!-- This is the redirect from root to /bigdata -->
                                    <New id="moved" class="org.eclipse.jetty.server.handler.MovedContextHandler">
                                        <Set name="contextPath">/</Set>
                                        <Set name="newContextURL">/bigdata</Set>
                                        <Set name="permanent">true</Set>
                                        <Set name="discardPathInfo">false</Set>
                                        <Set name="discardQuery">false</Set>
                                    </New>
                                </Arg>
                            </Call>
                            <Call name="addHandler">
                                <Arg>
                                    <!-- This is the bigdata web application. -->
                                    <New id="WebAppContext" class="org.eclipse.jetty.webapp.WebAppContext">
                                        <Set name="war"><SystemProperty name="jetty.resourceBase" default="bigdata-war/src"/></Set>
                                        <Set name="contextPath">/bigdata</Set>
                                        <Set name="descriptor">WEB-INF/web.xml</Set>
                                        <Set name="parentLoaderPriority">true</Set>
                                        <Set name="extractWAR">false</Set>
                                        <Set name="overrideDescriptor"><SystemProperty name="jetty.overrideWebXml" default="bigdata-war/src/WEB-INF/override-web.xml"/></Set>
                                        <Set name="maxFormContentSize">10485760</Set>
                                    </New>
                                </Arg>
                            </Call>
                        </New>
                    </Item>
                </Array>
            </Set>
        </New>
    </Set>

    <!-- =========================================================== -->
    <!-- extra server options                                        -->
    <!-- =========================================================== -->
    <Set name="stopAtShutdown">true</Set>
    <Set name="stopTimeout">5000</Set>
    <Set name="dumpAfterStart"><Property name="jetty.dump.start" default="false"/></Set>
    <Set name="dumpBeforeStop"><Property name="jetty.dump.stop" default="false"/></Set>

</Configure>

... and I am using the default RWStore.properties from the same jar:

#
# Note: These options are applied when the journal and the triple store are
# first created.

##
## Journal options.
##

# The backing file. This contains all your data.  You want to put this someplace
# safe.  The default locator will wind up in the directory from which you start
# your servlet container.
com.bigdata.journal.AbstractJournal.createTempFile=true

# The persistence engine.  Use 'Disk' for the WORM or 'DiskRW' for the RWStore.
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW

# Setup for the RWStore recycler rather than session protection.
com.bigdata.service.AbstractTransactionService.minReleaseAge=1

# Enable group commit. See http://wiki.blazegraph.com/wiki/index.php/GroupCommit and BLZG-192.
#com.bigdata.journal.Journal.groupCommit=false

com.bigdata.btree.writeRetentionQueue.capacity=4000
com.bigdata.btree.BTree.branchingFactor=128

# 200M initial extent.
com.bigdata.journal.AbstractJournal.initialExtent=209715200
com.bigdata.journal.AbstractJournal.maximumExtent=209715200

##
## Setup for QUADS mode without the full text index.
##
com.bigdata.rdf.sail.truthMaintenance=false
com.bigdata.rdf.store.AbstractTripleStore.quads=true
com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms

# Bump up the branching factor for the lexicon indices on the default kb.
com.bigdata.namespace.kb.lex.com.bigdata.btree.BTree.branchingFactor=400

# Bump up the branching factor for the statement indices on the default kb.
com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=1024

# Uncomment to enable collection of OS level performance counters.  When
# collected they will be self-reported through the /counters servlet and
# the workbench "Performance" tab.
#
# com.bigdata.journal.Journal.collectPlatformStatistics=true

Using these configurations, the server starts up fine, and I can access the Web console, make queries, and interact via the BigdataGraphClient API in Java. Now I'm just trying to figure out how to clear out the graph to avoid data leakage between unit tests. I've tried the following:

  1. Use the BigdataGraphClient Java API to remove all edges and vertices. This leaves some of the edges and vertices in place, for reasons unknown to me.

    graph.getEdges().forEach(Edge::remove);
    graph.getVertices().forEach(Vertex::remove);

  2. Stop and destroy the server. This leaves the journal file in place.

    sparqlServer.stop();
    sparqlServer.destroy();

  3. Use a temporary journal file by setting com.bigdata.journal.AbstractJournal.createTempFile=true and commenting out com.bigdata.journal.AbstractJournal.file=bigdata.jnl. This clears the journal file, but a DatasetNotFoundException is thrown after the first test.

  4. Put the journal file in a temporary directory (/tmp/bigdata-test/bigdata.jnl) and delete and recreate that directory between tests. This has the same problem as #2.

  5. Create my own Journal object and pass it as the IndexManager parameter of the NanoSparqlServer.newInstance method. This fails due to a known issue with old Lucene dependencies. I cannot include those dependencies in my project because I rely on a newer version of Lucene that conflicts with them. The error thrown is the same as documented in the referenced Jira ticket.

Anybody know of a clean, reliable way to clear the graph between tests (in a tearDown method run after every test)?

Mack

1 Answer

It turns out I was running into another issue that made me think my first approach wasn't working. That approach (removing all edges and vertices through the BigdataGraphClient API) works just fine. I'm leaving the question up in case someone else is wondering how to do this. I'm also open to cleaner or faster ways: if the tests insert a lot of data, iterating through all of the triples/quads and deleting them one by one can be slow. I'd prefer something like unlinking the files in the Journal.
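
One possibly faster way to clear everything in a single request, sketched here under assumptions rather than verified against 1.5.3, is to POST a standard SPARQL 1.1 `DROP ALL` update to the namespace's SPARQL endpoint. The class `GraphCleaner` and its methods are hypothetical helpers, not part of Blazegraph. The endpoint layout `<base>/namespace/<namespace>/sparql` and the base URL `http://localhost:9999/bigdata` are assumptions taken from the default jetty.xml context path and the `kb` namespace used in the question, so adjust them to whatever your server actually reports.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical test helper (not part of Blazegraph) that clears a namespace via SPARQL UPDATE. */
final class GraphCleaner {

    private GraphCleaner() {}

    /**
     * Builds the SPARQL endpoint URL for a namespace.
     * Assumption: the endpoint lives at <base>/namespace/<namespace>/sparql.
     */
    static String sparqlEndpoint(String baseUrl, String namespace) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return base + "/namespace/" + namespace + "/sparql";
    }

    /** Encodes a SPARQL update as a form body, as described by the SPARQL 1.1 Protocol. */
    static String updateBody(String sparqlUpdate) {
        try {
            return "update=" + URLEncoder.encode(sparqlUpdate, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Sends DROP ALL to the given endpoint and fails on any non-2xx response. */
    static void dropAll(String endpoint) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded; charset=UTF-8");
            byte[] body = updateBody("DROP ALL").getBytes(StandardCharsets.UTF_8);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            int status = conn.getResponseCode();
            if (status < 200 || status >= 300) {
                throw new IOException("DROP ALL failed with HTTP status " + status);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

In a tearDown method this would be called as `GraphCleaner.dropAll(GraphCleaner.sparqlEndpoint("http://localhost:9999/bigdata", "kb"));`. I have not confirmed that this leaves the BigdataGraphClient view completely empty, so it is worth asserting that `graph.getVertices()` returns nothing after the call before relying on it.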

Mack