I need to collect and process sets of files generated by another organization. For simplicity, say that the set consists of two files, a summary file and a detail file named like: SUM20150701.dat and DTL20150701.dat, which would constitute a set for date 20150701. The issue is, sets need to be processed in order, and the transmission of files from an outside organization can be error prone such that a file may be missing. If this occurs, this set of files should hold, as should any following sets that are found. As example, at the start of the mule process, the source folder may have in it: SUM20150701.dat, SUM20150703.dat, DTL20150703.dat. That is, the data set for 20150701 is incomplete while 20150703 is complete. I need to have both data sets hold until DTL20150701.dat arrives, then process them in order.
In this simplified form of my mule process a source folder is watched for files. When found, they are moved to an archive folder and passed to the collection-aggregator using the date as the sequence and correlation values. When a set is complete, it is moved to a destination folder. A lengthy timeout is used on the collector to make sure incomplete sets are not processed:
<file:connector name="File" autoDelete="false" streaming="false" validateConnections="true" doc:name="File">
<file:expression-filename-parser />
</file:connector>
<file:connector name="File1" autoDelete="false" outputAppend="true" streaming="false" validateConnections="true" doc:name="File" />
<vm:connector name="VM" validateConnections="true" doc:name="VM">
<receiver-threading-profile maxThreadsActive="1"></receiver-threading-profile>
</vm:connector>
<flow name="fileaggreFlow2" doc:name="fileaggreFlow2">
<file:inbound-endpoint path="G:\SourceDir" moveToDirectory="g:\SourceDir\Archive" connector-ref="File1" doc:name="get-working-files"
responseTimeout="10000" pollingFrequency="5000" fileAge="600000" >
<file:filename-regex-filter pattern="DTL(.*).dat|SUM(.*).dat" caseSensitive="false"/>
</file:inbound-endpoint>
<message-properties-transformer overwrite="true" doc:name="Message Properties">
<add-message-property key="MULE_CORRELATION_ID" value="#[message.inboundProperties.originalFilename.substring(5, message.inboundProperties.originalFilename.lastIndexOf('.'))]"/>
<add-message-property key="MULE_CORRELATION_GROUP_SIZE" value="2"/>
<add-message-property key="MULE_CORRELATION_SEQUENCE" value="#[message.inboundProperties.originalFilename.substring(5, message.inboundProperties.originalFilename.lastIndexOf('.'))]"/>
</message-properties-transformer>
<vm:outbound-endpoint exchange-pattern="one-way" path="Merge" doc:name="VM" connector-ref="VM"/>
</flow>
<flow name="fileaggreFlow1" doc:name="fileaggreFlow1" processingStrategy="synchronous">
<vm:inbound-endpoint exchange-pattern="one-way" path="Merge" doc:name="VM" connector-ref="VM"/>
<processor-chain doc:name="Processor Chain">
<collection-aggregator timeout="1000000" failOnTimeout="true" doc:name="Collection Aggregator"/>
<foreach doc:name="For Each">
<file:outbound-endpoint path="G:\DestDir1" outputPattern="#[function:datestamp:yyyyMMdd.HHmmss].#[message.inboundProperties.originalFilename]" responseTimeout="10000" connector-ref="File1" doc:name="Destination"/>
</foreach>
</processor-chain>
This correctly processes sets found in order if all sets are complete. It correctly waits for incomplete sets to fill, but does not hold following sets, that is in the above example set 20150703 will process while 20150701 is still waiting for the DTL file.
Is there a setting or another construct which will force the collection-aggregator element to wait if there is an earlier collection which is not complete?
I am using the date part of the file name for both the correlation and sequence ID’s which does control that sets process in the order I want if all sets are complete. It is not important if dates do not exist (as with 20150702 in this case), only that existing files are processed in order and that sets must be complete.