I would like to know what would be the best approach to test the below scenario in a Spring Batch job:

  • A job consisting of two steps:

1) The first step reads from a database (Apache Kudu, queried through Impala) using an ItemReader and writes the query results to a file.

  • That itemReader has a rowMapper which creates a complex object from the resultset. Its itemWriter just calls toString (which in fact produces a JSON representation) on that complex object.

2) The second step reads the file generated by step 1 and processes it. Once the whole file has been processed, everything is written to a new file.

  • The itemReader reads the file from step 1 using a jsonLineMapper, then the step processes the complex objects created by the mapper and writes them to a new file.

Then the job's listener uploads both files to S3.

I need this workflow because the first step generates the sample needed by the second step. If someday I need to test only the second step, I can reuse an old sample from the first step: the database changes over time, so without a saved sample I might not be able to regenerate the same data as an execution from two days earlier.

The first step is the hardest one to test, but I would like to test both steps in a way like the following:

1) For step 1, I need to check that the query syntax is correct, that the rowMapper generates correct objects from the database resultset, and that the content of the file written by the itemWriter is correct (that is, it matches what is expected).

2) The second step is easier to test, as I could start with a predefined file. The test should check that reading the file with the jsonLineMapper works correctly. The processing part is tested separately, but I could run one simple workflow through it and check that the final file has the expected content.

My idea for testing that scenario was:

1) In order to check that the query syntax is correct, I need a query builder (I googled and found libraries like jOOQ, but I don't want to add an external library just for building a query string). After checking that the query is correct, maybe I should mock the database, return a predefined complex object, and write it to the file. The problem is that if the query does not return one of the expected columns, the object would be incorrect and the test should fail; if I return a predefined object, I would never know what the query actually returns.

As you can see, the root of the problem is validating the query: if the query is correct, I can test the rowMapper and the final file.

2) For this step, I thought the best approach would be to have a predefined file with correct content from step 1 and just check that the final file content is what I expect. I think this step is easy to test.
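To illustrate the missing-column concern from idea 1, here is a minimal, dependency-free sketch of testing a row mapper in isolation. Everything in it is hypothetical: the `Sample` class, its `id` and `name` columns, and the `Mapper` interface (redeclared here with the same shape as Spring's `RowMapper.mapRow(ResultSet, int)` so that the sketch compiles without Spring). The stub `ResultSet` throws when a column is absent, so the mapper fails loudly instead of building a half-filled object. Note that this only checks the mapper against a column list written by hand; it does not validate the query itself.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;

public class RowMapperSketch {

    /** Hypothetical domain object; toString produces a JSON-like line. */
    public static final class Sample {
        final long id;
        final String name;

        Sample(long id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return "{\"id\":" + id + ",\"name\":\"" + name + "\"}";
        }
    }

    /** Assumption: same shape as Spring's RowMapper, redeclared to avoid the dependency. */
    public interface Mapper<T> {
        T mapRow(ResultSet rs, int rowNum) throws SQLException;
    }

    /** Hypothetical mapper under test. */
    public static final Mapper<Sample> SAMPLE_MAPPER =
            (rs, rowNum) -> new Sample(rs.getLong("id"), rs.getString("name"));

    /** Builds a one-row ResultSet stub that rejects columns missing from the map. */
    public static ResultSet singleRow(Map<String, Object> columns) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            boolean byLabel = args != null && args.length == 1 && args[0] instanceof String;
            if (byLabel && (name.equals("getLong") || name.equals("getString"))) {
                String label = (String) args[0];
                if (!columns.containsKey(label)) {
                    throw new SQLException("Column not found: " + label);
                }
                Object value = columns.get(label);
                if (name.equals("getLong")) {
                    return Long.valueOf(((Number) value).longValue());
                }
                return String.valueOf(value);
            }
            // Only the two accessors used by the mapper are stubbed.
            throw new UnsupportedOperationException(name);
        };
        return (ResultSet) Proxy.newProxyInstance(
                RowMapperSketch.class.getClassLoader(),
                new Class<?>[] {ResultSet.class},
                handler);
    }

    /** Maps the stub row and returns its JSON line, or the error message on failure. */
    public static String describe(Map<String, Object> columns) {
        try {
            return SAMPLE_MAPPER.mapRow(singleRow(columns), 0).toString();
        } catch (SQLException e) {
            return "ERROR: " + e.getMessage();
        }
    }
}
```

With both columns present, `describe` returns the JSON line; if the `name` column is left out of the map, it returns `ERROR: Column not found: name`.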

Any better way or approach for testing this scenario?

Thanks!

1 Answer

For step 1, I would recommend using an embedded database: insert some rows, run your job, and then assert that the generated file is correct. This gives you control over the test data, so you can validate both your query and the expected result in the file. You can find an example here: https://docs.spring.io/spring-batch/4.0.x/reference/html/testing.html#endToEndTesting. Spring Batch provides AssertFile.assertFileEquals to test whether two files are equal. This can help you validate the output of step 1 against an expected file.

For step 2, you can create some valid/invalid files (those can be the result of step 1) and use them as input to test step 2. The caveat, though, is that if the result of step 1 changes, those files will no longer be valid for testing step 2 (so this is a maintenance cost that you need to be aware of).
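As a rough illustration of comparing the step 1 output against an expected file, here is a plain-Java stand-in for that kind of check. It is not Spring Batch's implementation of AssertFile.assertFileEquals; the class and method names (`FileContentCheck`, `firstDifference`, `writeTemp`) are made up for this sketch. Unlike a plain equality assertion, it reports which line differs first.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class FileContentCheck {

    /**
     * Returns -1 when both files contain the same lines, otherwise the
     * 1-based number of the first line that differs (or is missing).
     */
    public static int firstDifference(Path expected, Path actual) {
        try {
            List<String> expectedLines = Files.readAllLines(expected);
            List<String> actualLines = Files.readAllLines(actual);
            int common = Math.min(expectedLines.size(), actualLines.size());
            for (int i = 0; i < common; i++) {
                if (!expectedLines.get(i).equals(actualLines.get(i))) {
                    return i + 1;
                }
            }
            // Same prefix: equal only if neither file has extra lines.
            return expectedLines.size() == actualLines.size() ? -1 : common + 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: writes the given lines to a new temporary file. */
    public static Path writeTemp(String... lines) {
        try {
            Path file = Files.createTempFile("file-check", ".txt");
            Files.write(file, Arrays.asList(lines));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real test, `expected` would be a file kept under the test resources and `actual` the file written by the step; the test passes when `firstDifference` returns -1.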
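The fixed-input idea for step 2 could look roughly like the sketch below. It assumes a hypothetical `runStep` method standing in for the real step (read each line, process it, write the result); in the actual job this would be the Spring Batch step with its jsonLineMapper and processor. The file names are placeholders as well.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class StepTwoGoldenFileSketch {

    /** Hypothetical stand-in for step 2: read each line, process it, write it out. */
    public static void runStep(Path input, Path output, UnaryOperator<String> processor) {
        try {
            List<String> processed = Files.readAllLines(input).stream()
                    .map(processor)
                    .collect(Collectors.toList());
            Files.write(output, processed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the step on fixed input lines in a temp directory and returns what it wrote. */
    public static List<String> runOnFixture(List<String> inputLines,
                                            UnaryOperator<String> processor) {
        try {
            Path dir = Files.createTempDirectory("step2-test");
            // Placeholder name for a saved sample produced by step 1.
            Path input = Files.write(dir.resolve("step1-sample.json"), inputLines);
            Path output = dir.resolve("step2-output.json");
            runStep(input, output, processor);
            return Files.readAllLines(output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would call `runOnFixture` with the saved sample lines and compare the returned list with the expected output lines. When the format produced by step 1 changes, the saved sample and the expected lines both need to be regenerated.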

Mahmoud Ben Hassine
  • Thank you for your answer! The problem is that I cannot use an embedded database, as the syntax for joining tables is not the same in PostgreSQL as in Impala, for example. I'm using Impala to query Apache Kudu, and I can't find any way to embed Impala and Kudu. – Mohamed Said Benmousa Aug 23 '18 at 14:11
  • Indeed, that's an issue when you can't use the real database in embedded mode or when the syntax differs between database vendors. You can still use the same approach to test step 1 with a real database instead of an embedded one. – Mahmoud Ben Hassine Aug 23 '18 at 14:49
  • This is what I was going to do in the first place, but I wanted to know whether there was a better way, because if the test database is down the test would fail. But I think I will try your approach, thanks! – Mohamed Said Benmousa Aug 23 '18 at 14:59
  • You can use test containers to start/stop a database container before/after your test. More details here: https://www.testcontainers.org/usage/database_containers.html#junit-rule. This allows you to avoid the scenario "if the test database is down the test would fail". – Mahmoud Ben Hassine Aug 23 '18 at 17:55