I am using Mule 4.4 Community Edition on-premises. Thanks to earlier help, I am able to read and process a large file without loading it all into memory, which works well.
Building on this, my use case is to read all .csv files in a directory and then process them one by one:
/opt/out/
students.csv
teachers.csv
colleges.csv
....
So my plan was to list the files in the directory:
<sftp:list doc:name="List" config-ref="SFTP_Config" directoryPath="/opt/out">
    <non-repeatable-iterable />
    <sftp:matcher filenamePattern="#['*.csv' ]"
                  directories="EXCLUDE" symLinks="EXCLUDE" />
</sftp:list>
I then wanted to read only the file names from the directory, not the file contents.
As per this early access article, we are advised to use <non-repeatable-iterable />. However, after the list operation described in the article, when I try to extract the attributes:
<set-payload doc:name="Set Payload" value="#[output application/json --- payload map $.attributes]"/>
No attributes are available. (My plan is to extract the file names, run a for-each loop over them, and use a choice router to pick the transformer: if the file name contains "student", use the student transformer; if it contains "teacher", use the teacher transformer; and so on.)
Because the attributes are not available, I cannot pass the file names to the for-each loop (yet to be written).
So I changed from <non-repeatable-iterable /> to <repeatable-in-memory-iterable />. Code below:
<sftp:list doc:name="List" config-ref="SFTP_Config" directoryPath="/opt/out">
    <repeatable-in-memory-iterable />
    <sftp:matcher filenamePattern="#['*.csv' ]"
                  directories="EXCLUDE" symLinks="EXCLUDE" />
</sftp:list>
With this change, I can extract the file attributes, including the file names.
I am confused about the following:
- The files to be processed in the above directory will be large (about 700 MB each). Will listing the directory with repeatable-in-memory-iterable cause any memory issues? (I do not want to read the file contents at this stage, only get the file names.)
Here is the complete flow so far (note: the for-each loop is only a placeholder for now; the per-file processing is what I will plug in):
<flow name="employee-process-flow">
    <http:listener doc:name="Listener" config-ref="HTTP_Listener_config" path="/processFiles"/>
    <set-variable value='#[now() as String { format: "ddMMuu" }]' doc:name="Set todays date as ddmmyy" doc:id="c6a91a41-65b1-46df-a720-9c13fe360b6b" variableName="today"/>
    <sftp:list doc:name="List" config-ref="SFTP_Config" directoryPath="/opt/out">
        <repeatable-in-memory-iterable />
        <sftp:matcher filenamePattern="#['*.csv' ]"
                      directories="EXCLUDE" symLinks="EXCLUDE" />
    </sftp:list>
    <set-payload doc:name="Set Payload" value="#[output application/json --- payload map $.attributes]"/>
    <foreach doc:name="For Each" >
        <logger level="INFO" doc:name="Logger" message="we are here"/>
    </foreach>
</flow>
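For reference, the loop I plan to plug in would look roughly like the sketch below. It is only an outline under a few assumptions: each item in the for-each is one file's attributes object (as produced by the Set Payload step above), those attributes expose a `path` field, and `student-transformer-flow` and `teacher-transformer-flow` are placeholder names for flows that do not exist yet and that would read and transform the file themselves.

```xml
<foreach doc:name="For Each">
    <!-- Assumption: payload here is one file's attributes object, with a path field -->
    <set-variable variableName="filePath" value="#[payload.path]" doc:name="Set filePath"/>
    <choice doc:name="Pick transformer">
        <when expression="#[vars.filePath contains 'student']">
            <!-- Placeholder flow name: would read and transform the students file -->
            <flow-ref name="student-transformer-flow" doc:name="Student transformer"/>
        </when>
        <when expression="#[vars.filePath contains 'teacher']">
            <!-- Placeholder flow name: would read and transform the teachers file -->
            <flow-ref name="teacher-transformer-flow" doc:name="Teacher transformer"/>
        </when>
        <otherwise>
            <!-- No matching transformer: log the path and move on to the next file -->
            <logger level="WARN" doc:name="Logger" message="#['No transformer for ' ++ vars.filePath]"/>
        </otherwise>
    </choice>
</foreach>
```

Only the path is carried into the loop here, so the file contents would be read later, inside whichever transformer flow is chosen.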