I have a Camel route that reads a file from S3 and then processes the input file as follows:
- Parse each row into a POJO (Student) using Bindy
- Split the output by body()
- Aggregate by an attribute of the body (.semester) and a batch size of 2
- Invoke the persistence service to upload to DB in given batches
The problem is that with a batch size of 2 and an odd number of records, there is always one record that does not get saved.
The code provided is Kotlin, but it should not be very different from the equivalent Java code (apart from the backslash in front of "\${simple expression}" and the lack of semicolons to terminate statements).
If I set the batch size to 1 then every record is saved, otherwise the last record never gets saved.
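To make the arithmetic concrete, here is a minimal plain-Kotlin sketch (no Camel involved) of what I believe is happening: a batch is only handed over once it reaches the configured size, so a group with an odd number of records leaves one record pending. `Row` and `fullBatchesOnly` are hypothetical names used only for illustration, and the sample data is made up rather than taken from my real input file.

```kotlin
// Hypothetical stand-in for the Student POJO; only the fields needed here.
data class Row(val id: Int, val semester: String)

// Mimics size-only completion: a batch is emitted only when it is full.
// Returns the emitted batches and the records left waiting in partial batches.
fun fullBatchesOnly(rows: List<Row>, size: Int): Pair<List<List<Row>>, List<Row>> {
    val emitted = mutableListOf<List<Row>>()
    val pending = mutableListOf<Row>()
    for ((_, group) in rows.groupBy { it.semester }) {
        for (chunk in group.chunked(size)) {
            if (chunk.size == size) emitted.add(chunk) else pending.addAll(chunk)
        }
    }
    return emitted to pending
}

fun main() {
    // Made-up data: seven rows, even ids in "1st" (3 rows), odd ids in "2nd" (4 rows).
    val rows = (1..7).map { Row(it, if (it % 2 == 0) "1st" else "2nd") }

    val (emitted, pending) = fullBatchesOnly(rows, 2)
    println("size=2: emitted=${emitted.size} batches, pending ids=${pending.map { it.id }}")

    val (emitted1, pending1) = fullBatchesOnly(rows, 1)
    println("size=1: emitted=${emitted1.size} batches, pending ids=${pending1.map { it.id }}")
}
```

With a size of 2 the three-row group leaves one row pending, while a size of 1 leaves nothing pending, which matches the behaviour I am seeing.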
I have checked the documentation for message-processor a few times but it doesn't seem to cover this particular scenario.
I have also tried setting either completionTimeout or completionInterval in addition to completionSize, but it does not make any difference.
Has anyone encountered this problem before?
val csvDataFormat = BindyCsvDataFormat(Student::class.java)

from("aws-s3://$student-12-bucket?amazonS3Client=#amazonS3&delay=5000")
    .log("A new Student input file has been received in S3: '\${header.CamelAwsS3BucketName}/\${header.CamelAwsS3Key}'")
    .to("direct:move-input-s3-object-to-in-progress")
    .to("direct:process-s3-file")
    .to("direct:move-input-s3-object-to-completed")
    .end()

from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()
    .parallelProcessing()
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
    .completionSize(2)
    .bean(persistenceService)
    .end()
With an input CSV file including seven (7) records, this is the output generated (with some added debug logging):
WARN 19540 --- [student-12-move] c.a.s.s.internal.S3AbortableInputStream : Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
INFO 19540 --- [student-12-move] student-workflow-main : A new Student input file has been received in S3: 'student-12-bucket/inbox/foo.csv'
INFO 19540 --- [student-12-move] move-input-s3-object-to-in-progress : Moving S3 file 'inbox/foo.csv' to 'in-progress' folder...
INFO 19540 --- [student-12-move] student-workflow-main : Moved input S3 file 'in-progress/foo.csv' to 'in-progress' folder...
INFO 19540 --- [student-12-move] pre-process-s3-file-records : Start saving to database...
DEBUG 19540 --- [read #7 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=7, name=Student 7, semester=2nd, javaMarks=25)
DEBUG 19540 --- [read #7 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=5, name=Student 5, semester=2nd, javaMarks=81)
DEBUG 19540 --- [read #3 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=6, name=Student 6, semester=1st, javaMarks=15)
DEBUG 19540 --- [read #3 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=2, name=Student 2, semester=1st, javaMarks=62)
DEBUG 19540 --- [read #2 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=3, name=Student 3, semester=2nd, javaMarks=72)
DEBUG 19540 --- [read #2 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=1, name=Student 1, semester=2nd, javaMarks=87)
INFO 19540 --- [student-12-move] device-group-workflow-main : End pre-processing S3 CSV file records...
INFO 19540 --- [student-12-move] move-input-s3-object-to-completed : Moving S3 file 'in-progress/foo.csv' to 'completed' folder...
INFO 19540 --- [student-12-move] device-group-workflow-main : Moved S3 file 'in-progress/foo.csv' to 'completed' folder...