We need to send events to Kinesis and, because of AWS pricing, we plan to put records onto Kinesis in batches.

We read in a CSV file, use the file splitter to emit its lines, and transform each line into JSON.

After the transformation to JSON, how can we batch these lines into, say, 25 lines per batch so that our Kinesis service activator can send each batch?

Any example would be appreciated.

    <int-file:splitter id="fileLineSplitter"
                       input-channel="fileInputChannel"
                       output-channel="splitterOutputChannel"
                       markers="true" />


    <int:transformer id="csvToDataCdrTransformer"
                     ref="dataCdrLineTransformer"
                     method="transform"
                     input-channel="lineOutputChannel"
                     output-channel="dataCdrObjectInputChannel">
    </int:transformer>


    <int:object-to-json-transformer input-channel="dataCdrObjectInputChannel"
                                    output-channel="kinesisSendChannel">
        <int:poller fixed-delay="50"/>
    </int:object-to-json-transformer>

EDIT: I added the aggregator as Artem Bilan suggested, and it worked:

<int:aggregator input-channel="aggregateChannel"
                output-channel="toJsonChannel"
                release-strategy-expression="#this.size() eq 2"
                expire-groups-upon-completion="true"/>

However, I now get an error with the following setup:

  1. I am using markers="true" so that we know when the end of the file is reached and can rename the file to, say, ".done".

  2. I added a router between the splitter and the transformer. It routes to "nullChannel" when the payload is a START FileMarker and to "fileProcessedChannel" when it is an END FileMarker; otherwise, the split line goes to default-output-channel="lineOutputChannel".

    <int:router ref="fileMarkerCustomRouter" input-channel="splitterOutputChannel" default-output-channel="lineOutputChannel"/>
    

The router code looks like this:

    @Override
    protected Collection<MessageChannel> determineTargetChannels(Message<?> message) {
        Collection<MessageChannel> targetChannels = new ArrayList<MessageChannel>();

        if (isPayloadTypeFileMarker(message)) {

            FileSplitter.FileMarker payload = (FileSplitter.FileMarker) message.getPayload();

            if (isStartOfFile(payload)) {

                targetChannels.add(nullChannel);

            } else if (isEndOfFile(payload)) {

                targetChannels.add(fileProcessedChannel);
            }
        }
        return targetChannels;
    }

But I am getting this error:

Caused by: java.lang.IllegalStateException: Null correlation not allowed.  Maybe the CorrelationStrategy is failing?

Any ideas?

user2279337

1 Answer

For this you definitely need an <aggregator> with release-strategy-expression="size() == 25" and expire-groups-upon-completion="true", so that it can form a fresh group for the same correlationKey after releasing one.

I'm not sure why you need markers="true", but without it the <int-file:splitter> populates the appropriate correlation headers. So you may even consider relying on just the default splitting and the default aggregation afterwards.

In addition, you should consider converting the aggregator's result to JSON rather than each individual line. The aggregator emits a List<?>, and serializing the whole list into JSON is much more efficient. Otherwise you might need one more conversion before sending to Kinesis.

Therefore, a prototype for your config would look like this:

<int-file:splitter id="fileLineSplitter"
                   input-channel="fileInputChannel"
                   output-channel="lineOutputChannel"/>

<!-- input-channel matches the splitter's output-channel so the two are wired together -->
<int:transformer id="csvToDataCdrTransformer"
                 ref="dataCdrLineTransformer"
                 method="transform"
                 input-channel="lineOutputChannel"
                 output-channel="aggregateChannel">
</int:transformer>

<int:aggregator input-channel="aggregateChannel" 
                output-channel="toJsonChannel"
                expire-groups-upon-completion="true" />

<int:object-to-json-transformer input-channel="toJsonChannel"
                                output-channel="kinesisSendChannel"/>

This way the whole file is treated as one batch: you split it, process each line, aggregate the lines back into a list, and then convert that list to JSON before sending it to Kinesis.
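
If fixed-size batches are wanted rather than one batch per file, the aggregator can be given an explicit release strategy. The following is only a minimal sketch, assuming the splitter's default correlation headers are in place. The group-timeout value of 5000 ms is an arbitrary illustrative choice, added so that a final group of fewer than 25 lines is still sent instead of waiting forever:

```xml
<!-- Releases a batch as soon as 25 messages have been collected for a correlation key. -->
<!-- group-timeout (5000 ms, an assumed value) expires a group that has gone idle; -->
<!-- send-partial-result-on-expiry then emits the leftover lines at the end of the file. -->
<int:aggregator input-channel="aggregateChannel"
                output-channel="toJsonChannel"
                release-strategy-expression="size() == 25"
                expire-groups-upon-completion="true"
                group-timeout="5000"
                send-partial-result-on-expiry="true"/>
```

With expire-groups-upon-completion="true", each released batch clears the group, so the next 25 lines from the same file start a new one.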

From here, I would like to ask you to raise a JIRA to add an ObjectToJsonTransformer.ResultType.BYTES mode, for better efficiency with downstream components that are based on byte[], such as KinesisMessageHandler.

Artem Bilan
  • I am using markers="true" so that we know when the end of the file is reached and can rename the file to, say, ".done". – user2279337 Apr 19 '18 at 17:33
  • I applied your solution and it does work. However, I had to place a router between the splitter and the transformer, and I am now getting an error. Could you please help me with the "EDIT" I made to the question? – user2279337 Apr 19 '18 at 17:41
  • If your `transform` method returns `Message<?>`, you are responsible for copying input headers to the output message. If you return just the new payload, the framework will copy the headers - the splitter adds a `correlationId` header by default. – Gary Russell Apr 19 '18 at 18:26
  • I added a router (please see the router code in section 2 of the EDIT above) and I am getting the "Null correlation not allowed" error. The method determineTargetChannels(...) receives a `Message<?>` with a correlationId from the splitter; since the router only returns a channel, how do I pass the correlationId on with the returned channel? I am using the router so that I know when the END of the file is reached (hence markers=true) and can rename the file. Would you suggest another way of renaming a file? Also, if I remove the router and go straight from the splitter to the transformer, it all works as @Artem suggested. – user2279337 Apr 19 '18 at 19:24
  • With `markers = true` you have to turn on `correlation` on the `FileSplitter` manually: `apply-sequence="true"` – Artem Bilan Apr 19 '18 at 19:52
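
The last comment can be sketched as the following splitter definition. It assumes the rest of the flow from the question (router, transformer, aggregator) stays unchanged. As far as I know, the marker messages are then included in the sequencing as well, so they still need to be routed away before the aggregator:

```xml
<!-- markers="true" turns sequencing off by default, so no correlationId header is set. -->
<!-- apply-sequence="true" switches it back on, giving the aggregator a key to correlate on. -->
<int-file:splitter id="fileLineSplitter"
                   input-channel="fileInputChannel"
                   output-channel="splitterOutputChannel"
                   markers="true"
                   apply-sequence="true"/>
```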