I have a Spark job written in Scala that ultimately writes to AWS DynamoDB. I want to write some unit tests around it, but I don't have a clue how to mock the part that writes to DynamoDB. I'm using AWS's emr-dynamodb-connector library, which means I never construct the DynamoDB client myself and so I'm not using any dependency injection (otherwise this would be easy).
After I read in some RDD data using Spark, I do some simple transforms on it to produce a Pair RDD of type (org.apache.hadoop.io.Text, org.apache.hadoop.dynamodb.DynamoDBItemWritable). So my code's only contact with Dynamo is creating DynamoDBItemWritable objects. That class doesn't inherently contain any logic to use the AWS SDK to save anything; it's essentially just a data object. My code then calls this:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

val conf = new Configuration()
conf.set("dynamodb.servicename", "dynamodb")
conf.set("dynamodb.input.tableName", "MyOutputTable")
conf.set("dynamodb.output.tableName", "MyInputTable")
conf.set("dynamodb.endpoint", "https://dynamodb.us-east-1.amazonaws.com")
conf.set("dynamodb.regionid", "us-east-1")
conf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")
conf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
myTransformedRdd.saveAsHadoopDataset(new JobConf(conf))
...and the connector magically registers the right classes and makes the right calls so that the results end up saved in DynamoDB.
I can't mock SparkSession because it has a private constructor (that would be extremely messy anyway). And I don't have any direct way, as far as I know, to mock the DynamoDB client. Is there some magic syntax in Scala (or Scalatest, or Scalamock) that lets me tell it that if it ever wants to instantiate a Dynamo client class, it should use a mocked version instead?
If not, how would I go about testing this code? I suppose theoretically, perhaps there's a way to set up a local, in-memory instance of Dynamo and then change the value of dynamodb.endpoint, but that sounds horribly messy just to get a unit test working. Plus I'm not sure it's possible anyway.
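To make that last idea concrete, here is a minimal sketch of what I mean by changing the endpoint. It only builds the settings as plain data, with the endpoint passed in as a parameter. The DynamoSettings object, the outputSettings function, the table name and the local address http://localhost:8000 are all hypothetical names and values I made up for illustration, and I have not verified that the connector actually works against a local endpoint.

```scala
// Hypothetical helper: the connector settings as plain data, so that the
// endpoint can be swapped between production and a test run.
object DynamoSettings {
  // Endpoint used in the question's configuration.
  val AwsEndpoint: String = "https://dynamodb.us-east-1.amazonaws.com"

  // Assumed address of a locally running DynamoDB emulator (an assumption,
  // not something taken from the connector's documentation).
  val LocalEndpoint: String = "http://localhost:8000"

  // Returns the output-side keys from the question, with table, endpoint
  // and region supplied by the caller instead of being hard-coded.
  def outputSettings(
      table: String,
      endpoint: String,
      region: String = "us-east-1"
  ): Map[String, String] =
    Map(
      "dynamodb.servicename" -> "dynamodb",
      "dynamodb.output.tableName" -> table,
      "dynamodb.endpoint" -> endpoint,
      "dynamodb.regionid" -> region,
      "mapred.output.format.class" ->
        "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat"
    )
}
```

The job would then copy these into the Hadoop Configuration with something like settings.foreach { case (k, v) => conf.set(k, v) }, passing DynamoSettings.AwsEndpoint in production and DynamoSettings.LocalEndpoint in a test. That still leaves the question of whether a test like this is reasonable, or whether there is a cleaner way to mock the client.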