
I have a Spark job written in Scala that ultimately writes out to AWS DynamoDB. I want to write some unit tests around it, but the problem is I don't have a clue how to go about mocking the part that writes to DynamoDB. I'm using their emr-dynamodb-connector library, which means I'm not using any dependency injection (otherwise this would be easy).

After I read in some RDD data using Spark, I do some simple transforms on it into a pair RDD of type (org.apache.hadoop.io.Text, org.apache.hadoop.dynamodb.DynamoDBItemWritable). So my code's only brush with Dynamo is creating DynamoDBItemWritable objects. That class doesn't contain any logic that uses the AWS SDK to save anything; it's essentially just a data object. My code then calls this:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

val conf = new Configuration()
conf.set("dynamodb.servicename", "dynamodb")
conf.set("dynamodb.input.tableName", "MyInputTable")
conf.set("dynamodb.output.tableName", "MyOutputTable")
conf.set("dynamodb.endpoint", "https://dynamodb.us-east-1.amazonaws.com")
conf.set("dynamodb.regionid", "us-east-1")
conf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")
conf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
// Writes every (Text, DynamoDBItemWritable) pair through DynamoDBOutputFormat
myTransformedRdd.saveAsHadoopDataset(new JobConf(conf))

...and the connector magically registers the right classes and makes the right calls so that it effectively saves the results to DynamoDB accordingly.
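Because DynamoDBItemWritable is just a data holder, the transform that builds the pairs can be unit-tested by inspecting the writables directly, with no Spark session or DynamoDB involved. A minimal sketch, assuming a connector version whose DynamoDBItemWritable wraps AWS SDK for Java v1 `AttributeValue`s (the 4.x line; newer releases may use SDK v2 types) and a hypothetical input record with `id` and `count` fields:

```scala
import java.util.{HashMap => JHashMap}

import com.amazonaws.services.dynamodbv2.model.AttributeValue
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import org.apache.hadoop.io.Text

// Hypothetical input record; replace with whatever the job actually reads.
final case class Record(id: String, count: Long)

object DynamoTransform {
  // Pure per-record function: testable on its own, and usable inside the
  // job as myRdd.map(DynamoTransform.toDynamoPair).
  // Assumes the SDK v1 AttributeValue type used by the 4.x connector.
  def toDynamoPair(r: Record): (Text, DynamoDBItemWritable) = {
    val item = new JHashMap[String, AttributeValue]()
    item.put("id", new AttributeValue().withS(r.id))
    // DynamoDB numbers travel as strings in the N attribute.
    item.put("count", new AttributeValue().withN(r.count.toString))

    val writable = new DynamoDBItemWritable()
    writable.setItem(item)
    // Using the record id as the key is an arbitrary, hypothetical choice.
    (new Text(r.id), writable)
  }
}
```

A unit test then only needs to call `toDynamoPair` and assert on `getItem`; the save call itself stays untested at this level.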

I can't mock SparkSession because it has a private constructor (that would be extremely messy anyway). And as far as I know, I don't have any direct way to mock the DynamoDB client. Is there some magic syntax in Scala (or ScalaTest, or ScalaMock) that lets me say: if it ever wants to instantiate a Dynamo client class, use a mocked version instead?

If not, how would I go about testing this code? I suppose theoretically, perhaps there's a way to set up a local, in-memory instance of Dynamo and then change the value of dynamodb.endpoint but that sounds horribly messy just to get a unit test working. Plus I'm not sure it's possible anyway.
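Swapping `dynamodb.endpoint` stays fairly contained if the settings are built from parameters instead of being hard-coded. A sketch with a hypothetical `DynamoJobSettings` helper that reuses the keys from the code above (only the output-side ones, since `saveAsHadoopDataset` only writes) and has no Hadoop dependency of its own:

```scala
// Hypothetical helper; the keys are the ones the question sets by hand.
object DynamoJobSettings {
  // Endpoint and region default to the production values and can be
  // overridden from a test, e.g. to point at a local DynamoDB instance.
  def settings(
      outputTable: String,
      endpoint: String = "https://dynamodb.us-east-1.amazonaws.com",
      region: String = "us-east-1"
  ): Map[String, String] =
    Map(
      "dynamodb.servicename" -> "dynamodb",
      "dynamodb.output.tableName" -> outputTable,
      "dynamodb.endpoint" -> endpoint,
      "dynamodb.regionid" -> region,
      "mapred.output.format.class" -> "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat"
    )

  // Pushes every setting through a (key, value) setter, for example
  // (k, v) => conf.set(k, v) for a Hadoop Configuration.
  def applyTo(set: (String, String) => Unit, s: Map[String, String]): Unit =
    s.foreach { case (k, v) => set(k, v) }
}
```

In the job this would be `DynamoJobSettings.applyTo((k, v) => conf.set(k, v), DynamoJobSettings.settings("MyOutputTable"))`; a test would pass `endpoint = "http://localhost:8000"`, the default port of DynamoDB Local.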

madhead
soapergem
  • [Local in-memory instance](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html) is available. As you said, it is quite a messy approach. I've used it in a project, and even with a rather simple data model the test fixtures can get quite complicated and error-prone. – kaskelotti Jun 26 '19 at 05:48

1 Answer


Take a look at LocalStack. It provides an easy-to-use test/mocking framework for developing AWS-related applications by spinning up AWS-compatible APIs on your local machine or in Docker. It supports about two dozen AWS APIs, and DynamoDB is among them. It is a great tool for functional testing without needing a separate environment in AWS.

If you need only DynamoDB there is another tool: DynamoDB Local, a Docker image with Amazon DynamoDB onboard.

Both are as simple as starting a Docker container:

docker run -p 8000:8000 amazon/dynamodb-local
docker run -P localstack/localstack
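When the tests start the container themselves, they usually have to wait until it accepts connections before running the job. A stdlib-only sketch; the `WaitForPort` helper is hypothetical (not part of either tool), and an open TCP port is a reasonable but not perfect readiness signal:

```scala
import java.net.{InetSocketAddress, Socket}
import scala.annotation.tailrec
import scala.util.Try

object WaitForPort {
  // Returns true once a TCP connection to host:port succeeds,
  // or false if that hasn't happened before timeoutMs elapses.
  def waitForPort(host: String, port: Int, timeoutMs: Long, pollMs: Long = 200): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs

    // One connection attempt; the socket is always closed afterwards.
    def reachable(): Boolean = Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), pollMs.toInt)
      finally socket.close()
    }.isSuccess

    @tailrec
    def loop(): Boolean =
      if (reachable()) true
      else if (System.currentTimeMillis() >= deadline) false
      else { Thread.sleep(pollMs); loop() }

    loop()
  }
}
```

For the DynamoDB Local command above this would be `WaitForPort.waitForPort("localhost", 8000, timeoutMs = 30000)`, for example in a ScalaTest `beforeAll`. With `-P`, Docker picks random host ports, so look them up first (for example with `docker port`).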

And if you're using JUnit 5 for the tests, I'd recommend JUnit 5 extensions for AWS, a few JUnit 5 extensions that can be useful for testing AWS-related code. They can inject AWS service clients pointed at tools like LocalStack (or at real AWS). Both AWS Java SDK v2.x and v1.x are supported.

madhead