
What is the best approach to writing unit tests for code that persists data to a NoSQL data store, in our case Cassandra?

=> We are using the embedded-server approach with a utility from GitHub (https://github.com/hector-client/hector/blob/master/test/src/main/java/me/prettyprint/hector/testutils/EmbeddedServerHelper.java). However, I have been seeing some issues with it:

1) It persists data across test cases, which makes it hard to guarantee that each test case in a test class starts with its own data. I tried calling cleanup in an @After method after each test case, but that doesn't seem to clean up the data.

2) We are running out of memory as we add more tests. This could be a consequence of 1), but I am not sure yet. I currently run my build with a 1 GB heap.

=> The other approach I have been considering is to mock the Cassandra storage. But that might let schema problems slip through, as the embedded-server approach has often caught issues with the way our data is stored in Cassandra.

Please let me know your thoughts on this, and whether anyone has used EmbeddedServerHelper and is familiar with the issues I have mentioned.


Just an update: I was able to resolve 2), running out of Java heap space during builds, by changing the in_memory_compaction_limit_in_mb parameter to 32 in the cassandra.yaml used by the embedded test server. It was previously 64, and builds had started to fail consistently during compaction. This link helped me: http://www.datastax.com/docs/0.7/configuration/storage_configuration#in-memory-compaction-limit-in-mb.

Marc Carré
bobbypavan
  • I'll be very interested to hear your experience with the recent in-memory feature for testing: http://www.datastax.com/2014/02/why-we-added-in-memory-to-cassandra – Arielr Jun 03 '14 at 11:43

6 Answers


We use an embedded Cassandra server, and I think that is the best approach when testing Cassandra; mocking the Cassandra API is too error-prone.

EmbeddedServerHelper.cleanup() just removes files from the file system, but data may still exist in memory.

There is a teardown() method in EmbeddedServerHelper, but I am not sure how effective it is, as Cassandra has a lot of static singletons whose state is not cleaned up by teardown().

What we do is call truncate on each column family between tests, which removes all data.
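
A minimal sketch of the truncate-between-tests idea, assuming a small hand-written seam around the client. The names ColumnFamilyTruncator, TestDataCleaner and InMemoryStore are hypothetical and are not part of Hector or Cassandra; the in-memory store only exists so the sketch runs without a server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical seam over the Cassandra client; not a Hector or Cassandra type. */
interface ColumnFamilyTruncator {
    void truncate(String keyspace, String columnFamily);
}

/** Truncates a fixed list of column families; meant to be called from an @After method. */
class TestDataCleaner {
    private final ColumnFamilyTruncator truncator;
    private final String keyspace;
    private final List<String> columnFamilies;

    TestDataCleaner(ColumnFamilyTruncator truncator, String keyspace, List<String> columnFamilies) {
        this.truncator = truncator;
        this.keyspace = keyspace;
        this.columnFamilies = new ArrayList<>(columnFamilies);
    }

    /** Issues one truncate per column family, leaving the schema in place. */
    void truncateAll() {
        for (String columnFamily : columnFamilies) {
            truncator.truncate(keyspace, columnFamily);
        }
    }
}

/** In-memory stand-in (assumption) used only to exercise the sketch. */
class InMemoryStore implements ColumnFamilyTruncator {
    private final Map<String, List<String>> rowsByColumnFamily = new LinkedHashMap<>();

    void insert(String columnFamily, String row) {
        rowsByColumnFamily.computeIfAbsent(columnFamily, k -> new ArrayList<>()).add(row);
    }

    int rowCount(String columnFamily) {
        List<String> rows = rowsByColumnFamily.get(columnFamily);
        return rows == null ? 0 : rows.size();
    }

    @Override
    public void truncate(String keyspace, String columnFamily) {
        List<String> rows = rowsByColumnFamily.get(columnFamily);
        if (rows != null) {
            rows.clear();
        }
    }
}
```

In a real test suite the interface would be implemented by delegating to whatever truncate operation your client exposes, and truncateAll() would be called from the @After method so every test starts from empty column families.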

mkobit
sbridges

I think you can take a look at cassandra-unit: https://github.com/jsevellec/cassandra-unit/wiki

jeremy
  • Please disclose your affiliation in your answer. See the [FAQ](http://stackoverflow.com/faq#promotion) for the policy on this. – Bill the Lizard Oct 13 '11 at 20:58
  • Perfect - thanks for writing and sharing this. Cassandra-unit is working well so far. – Martin Dow Apr 23 '12 at 11:42
  • Take into account that cassandra-unit is licenced under [GPLv3](https://github.com/jsevellec/cassandra-unit/blob/master/LICENSE.txt). – Marcin Feb 27 '13 at 13:46
  • @Marcin, FYI I asked the developer to [confirm](https://github.com/jsevellec/cassandra-unit/issues/76) the license (there was some inconsistency across various project files), and they confirmed it is LGPLv3. – Ben Alex Feb 11 '14 at 21:45
  • What is the equivalent for .NET? – Gomes Apr 24 '15 at 03:58

I use the Mojo Cassandra Maven plugin.

Here's an example plugin configuration that I use to spin up a Cassandra server for use by my unit tests:

 <build>
    <plugins>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>cassandra-maven-plugin</artifactId>
            <version>1.1.0-1</version>
            <executions>
                <execution>
                    <goals>
                        <goal>start</goal>
                        <goal>flush</goal>
                        <goal>cleanup</goal>
                    </goals>
                    <phase>compile</phase>
                </execution>
            </executions>
        </plugin>
    </plugins>
 </build>

I did manage to get Hector's embedded server helper class working, which can be very useful; however, I ran into classloader conflicts due to this bug.

Hasson

You cannot restart a Cassandra instance within one VM: Cassandra has a "shutdown per kill" policy because of the singletons it uses.

You also do not need to restart Cassandra; it is enough to remove all column families (CFs). To remove a CF you first need to flush its data, then compact it, and only after that can you drop it.

This code connects to the embedded Cassandra and executes the required cleanup:

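// Imports needed by the methods below (the StorageServiceMBean package may differ between Cassandra versions).
import java.lang.management.ManagementFactory;
import java.util.List;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import org.apache.cassandra.service.StorageServiceMBean;

// Looks up Cassandra's StorageService MBean in the in-process MBean server
// and cleans every keyspace except the system keyspace.
private void cleanAndCompact() throws Exception {
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    ObjectName ssn = new ObjectName("org.apache.cassandra.db:type=StorageService");
    StorageServiceMBean ssmb = JMX.newMBeanProxy(mbs, ssn, StorageServiceMBean.class);

    List<String> keyspaces = ssmb.getKeyspaces();
    if (keyspaces == null) {
        LOG.info("No keyspaces to clean up");
        return;
    }

    for (String keyspace : keyspaces) {
        // Never touch Cassandra's own metadata.
        if (keyspace.equalsIgnoreCase("system")) {
            continue;
        }
        execCleanup(ssmb, keyspace);
    }
}

// Invalidates caches, then flushes, compacts and cleans up the keyspace.
// The empty array means "all column families in the keyspace".
private void execCleanup(StorageServiceMBean ssmb, String keyspace) throws Exception {
    LOG.info("Cleaning up keyspace: " + keyspace);

    ssmb.invalidateKeyCaches(keyspace, new String[0]);
    ssmb.invalidateRowCaches(keyspace, new String[0]);
    ssmb.forceTableFlush(keyspace, new String[0]);
    ssmb.forceTableCompaction(keyspace, new String[0]);
    ssmb.forceTableCleanup(keyspace, new String[0]);
}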

Now execute a CLI script that drops the CF:

CliMain.main(new String[] { "-host", host, "-port", Integer.toString(rpcPort), "-f", "/my/script/path/script.txt", "-username", "myUser", "-password", "123456" });

and script.txt could have:

use ExampleTestSpace;
drop column family ExampleCF;
Jiri Kremser
Maciej Miklas

In addition to what's been posted, there are cases where you want to test error handling, that is, how your app behaves when a Cassandra query fails.

There are a few libraries that can help you with this:

I'm the author of cassandra-spy and wrote it to help me test these cases.
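
Independent of any particular library, the failure case can be sketched with a hand-rolled stub. This is only an illustration under stated assumptions: UserStore, FailingUserStore and GreetingService are hypothetical names, and the sketch does not use cassandra-spy or show its API.

```java
import java.util.Optional;

/** Hypothetical data-access interface that the application code depends on. */
interface UserStore {
    String loadName(String userId);
}

/** Stub that fails every call, standing in for a Cassandra query that fails. */
class FailingUserStore implements UserStore {
    @Override
    public String loadName(String userId) {
        throw new IllegalStateException("simulated query failure");
    }
}

/** Code under test: degrades to an empty result instead of propagating the error. */
class GreetingService {
    private final UserStore store;

    GreetingService(UserStore store) {
        this.store = store;
    }

    Optional<String> greet(String userId) {
        try {
            return Optional.of("Hello, " + store.loadName(userId));
        } catch (RuntimeException e) {
            // The behaviour we want a test to pin down when the query fails.
            return Optional.empty();
        }
    }
}
```

A test then constructs GreetingService with FailingUserStore and asserts that greet() returns an empty Optional. The trade-off is that a stub at this level exercises only your own error handling, not the driver's, which is where a fault-injecting embedded server would go further.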

Andrejs

By "doesn't seem to clean up data" what exactly do you mean? That you still see your data in the database?

That problem might be because Cassandra doesn't remove deleted values immediately. It only marks them as deleted (with a tombstone), and the data is purged once gc_grace_seconds has passed (the default is 10 days).
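
The mark-then-purge behaviour described above can be illustrated with a toy model. This is not Cassandra code: TombstoneStore and its methods are hypothetical, and the sketch assumes the purge happens in an explicit compact() call once the grace period has elapsed.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of tombstone-based deletes; an illustration, not Cassandra internals. */
class TombstoneStore {
    /** A stored value plus the time it was deleted, or -1 while it is live. */
    private static final class Cell {
        final String value;
        long deletedAtSeconds = -1;

        Cell(String value) {
            this.value = value;
        }

        boolean isTombstone() {
            return deletedAtSeconds >= 0;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();
    private final long gcGraceSeconds;

    TombstoneStore(long gcGraceSeconds) {
        this.gcGraceSeconds = gcGraceSeconds;
    }

    void put(String key, String value) {
        cells.put(key, new Cell(value));
    }

    /** Marks the cell as deleted instead of removing it. */
    void delete(String key, long nowSeconds) {
        Cell cell = cells.get(key);
        if (cell != null) {
            cell.deletedAtSeconds = nowSeconds;
        }
    }

    /** Reads hide tombstoned cells even though they are still stored. */
    String get(String key) {
        Cell cell = cells.get(key);
        return (cell == null || cell.isTombstone()) ? null : cell.value;
    }

    /** Purges only tombstones that are at least gcGraceSeconds old. */
    void compact(long nowSeconds) {
        cells.values().removeIf(c -> c.isTombstone()
                && nowSeconds - c.deletedAtSeconds >= gcGraceSeconds);
    }

    /** Number of cells physically held, tombstones included. */
    int physicalSize() {
        return cells.size();
    }
}
```

With a grace period of 864000 seconds (10 days), a cell deleted at time 100 is invisible to get() straight away, still counts towards physicalSize() after an early compact(), and disappears only after a compact() at or beyond time 864100.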

Milo Casagrande
  • I misinterpreted cleanup to delete the data that was created in the test cases. But cleanup is only meant to do some housekeeping and remove all the commit logs and data directories created by the embedded cassandra process. – bobbypavan Jul 08 '11 at 17:43
  • Without cleanup you will not be able to drop your CF: the drop request will simply do nothing, and the create request will throw an exception saying that the CF already exists. – Maciej Miklas Dec 01 '11 at 12:35