I would like to know what would be the best approach to test the below scenario in a Spring Batch job:
- A job consisting of two steps:
1) The first step reads from a database using an ItemReader (from Apache Kudu using Impala) and writes the content generated by the query into a file.
- That itemReader has a rowMapper which creates a complex object from the result set. Its itemWriter just calls toString (which is in fact a JSON representation) on that complex object.
2) The second step reads from the file generated by step 1 and processes it. After processing the whole file, everything is written into a new file.
- The itemReader reads the file from step 1 using a jsonLineMapper, then processes the new complex objects generated by the mapper and writes them to a new file.
Then the job's listener uploads both files to S3.
I need this workflow because the first step generates the sample needed for the second step. If someday I need to test only the second step, I can use an old sample from the first step: the database changes over time, so without a saved sample I might not be able to reproduce the sample from an execution two days earlier.
The first step is the hardest one to test, but I would like to test both steps in a way like the following:
1) For step 1, I need to check that the query syntax is correct. I also need to check that it generates correct objects from the database result set via the rowMapper, and that the content of the file written by the itemWriter is correct (correct meaning it matches what is expected).
2) The second step is easier to test, as I could start with a predefined file. It should test that reading from the file using the jsonLineMapper is done correctly. The processing part is tested separately, but I could follow one simple workflow and check that the final file has the expected content.
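The check I have in mind for the second step could be sketched roughly as follows. This is plain Java without any Spring Batch classes, so it only illustrates the idea of comparing the generated file against an expected one. The names SecondStepFileCheck, runStep and sameContent are hypothetical, and the upper-casing processor is just a placeholder for my real processing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SecondStepFileCheck {

    // Hypothetical stand-in for the second step: read every non-blank line,
    // apply the processor to it, and write the results to the output file.
    static void runStep(Path input, Path output, Function<String, String> processor) throws IOException {
        List<String> processed = Files.readAllLines(input, StandardCharsets.UTF_8).stream()
                .filter(line -> !line.isBlank())
                .map(processor)
                .collect(Collectors.toList());
        Files.write(output, processed, StandardCharsets.UTF_8);
    }

    // Compares the generated file with the expected file, line by line.
    static boolean sameContent(Path actual, Path expected) throws IOException {
        return Files.readAllLines(actual, StandardCharsets.UTF_8)
                .equals(Files.readAllLines(expected, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("step2-test");
        Path input = dir.resolve("step1-sample.json");
        Path expected = dir.resolve("expected.json");
        Path output = dir.resolve("step2-output.json");

        // Predefined sample, as if it had been produced by step 1 (made-up data).
        Files.write(input,
                List.of("{\"id\":1,\"name\":\"a\"}", "{\"id\":2,\"name\":\"b\"}"),
                StandardCharsets.UTF_8);
        // What the final file should contain for the placeholder processor.
        Files.write(expected,
                List.of("{\"ID\":1,\"NAME\":\"A\"}", "{\"ID\":2,\"NAME\":\"B\"}"),
                StandardCharsets.UTF_8);

        runStep(input, output, String::toUpperCase);
        System.out.println(sameContent(output, expected)); // prints "true"
    }
}
```

In the real job I suppose the step itself would be launched with something like JobLauncherTestUtils from spring-batch-test instead of runStep, with only the file comparison kept as it is.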
My idea for testing that scenario was:
1) In order to check that the query syntax is correct, I need a query builder (I googled and found libraries like jOOQ, but I don't want to add an external library just for building a query string). After checking that the query is correct, maybe I should mock the database, return a predefined complex object and write it into the file. The problem is that if the query is missing a column, the object would not be correct and the test should fail, so if I return a predefined object I would never know what the query actually returns.
As you can see, the problem lies in validating the query: if the query is correct, I can test the rowMapper and the final file.
2) For this step, I thought the best approach would be to have a predefined file with correct content from step 1 and just check that the final file content is what I expect. I think that step is easy to test.
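To make the missing-column concern from point 1 more concrete, here is a minimal sketch of the behaviour I would like the test to catch. It is plain Java and makes several assumptions: the row is modelled as a Map instead of a java.sql.ResultSet, and Sample, required and mapRow are hypothetical names, not my real classes. It does not validate the query against Impala; it only shows the mapping failing when a column is absent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowMappingCheck {

    // Hypothetical complex object; toString produces the JSON line that step 1 writes.
    static final class Sample {
        final long id;
        final String name;

        Sample(long id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return "{\"id\":" + id + ",\"name\":\"" + name + "\"}";
        }
    }

    // Reads a required column and fails loudly if the query did not return it.
    static Object required(Map<String, Object> row, String column) {
        if (!row.containsKey(column)) {
            throw new IllegalStateException("Missing column: " + column);
        }
        return row.get(column);
    }

    // Stand-in for the rowMapper: builds the complex object from one row.
    static Sample mapRow(Map<String, Object> row) {
        long id = ((Number) required(row, "id")).longValue();
        String name = (String) required(row, "name");
        return new Sample(id, name);
    }

    public static void main(String[] args) {
        Map<String, Object> complete = new LinkedHashMap<>();
        complete.put("id", 1L);
        complete.put("name", "a");
        System.out.println(mapRow(complete)); // prints {"id":1,"name":"a"}

        // Simulates a query that forgot to select the "name" column.
        Map<String, Object> incomplete = new LinkedHashMap<>();
        incomplete.put("id", 2L);
        try {
            mapRow(incomplete);
            System.out.println("no failure");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints Missing column: name
        }
    }
}
```

With a mocked database returning a predefined object, this failure would never show up, which is exactly what worries me: the rows fed to the mapper would have to come from actually running the query.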
Any better way or approach for testing this scenario?
Thanks!