
I'd like to develop a route that polls a directory containing CSV files and, for every file, unmarshals each row using Bindy and queues it in ActiveMQ.

The problem is that the files can be pretty large (a million rows), so I'd prefer to queue one row at a time. What I'm getting instead is all the rows in a single java.util.ArrayList once Bindy has finished, which causes memory problems.

So far I have a small test in which unmarshalling works, so the annotation-based Bindy configuration is OK.

Here is the route:

from("file://data/inbox?noop=true&maxMessagesPerPoll=1&delay=5000")
  .unmarshal()
  .bindy(BindyType.Csv, "com.ess.myapp.core")           
  .to("jms:rawTraffic");

Environment is: Eclipse Indigo, Maven 3.0.3, Camel 2.8.0

Thank you

Taka

3 Answers


If you use the Splitter EIP then you can use streaming mode which means Camel will process the file on a row by row basis.

from("file://data/inbox?noop=true&maxMessagesPerPoll=1&delay=5000")
  .split(body().tokenize("\n")).streaming()
    .unmarshal().bindy(BindyType.Csv, "com.ess.myapp.core")           
    .to("jms:rawTraffic");
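
To make the memory argument concrete, here is a minimal plain-Java sketch of the row-by-row idea behind streaming mode. It is not Camel's implementation, only an illustration under stated assumptions: the class `StreamingLines`, the method `forEachLine` and the `handler` callback are hypothetical names, and the handler stands in for "unmarshal the row and send it to the queue".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, not part of Camel: illustrates row-by-row processing.
final class StreamingLines {

    // Reads one line at a time and passes it to the handler immediately,
    // so only the current line (plus the reader's buffer) is held in memory.
    static long forEachLine(Reader source, Consumer<String> handler) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                handler.accept(line); // stand-in for "unmarshal and queue this row"
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        long handled = forEachLine(
                new StringReader("1,alice\n2,bob\n"),
                line -> System.out.println("row: " + line));
        System.out.println(handled + " rows handled");
    }
}
```

The contrast with the original route is that nothing ever collects the rows into a list; each row is handed off before the next one is read.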
Claus Ibsen
  • Thanks, Claus, for your answer. Now I'm facing a different problem. Continuing my little exercise, I'm trying to read from the queue and write to a file with `.convertBodyTo(String.class).to("file:data/outbox?fileExist=Append")`, but only the first row gets written. Likewise, if I use `fileExist=Override`, I get only the last row. Is there a way to have all the rows from the CSV file written to the file? Thank you – Taka Nov 15 '11 at 15:46
  • You need to specify a file name: `.to("file:data/outbox?fileName=data.csv&fileExist=Append")` – Claus Ibsen Nov 16 '11 at 10:34
  • Could adding `.threads()` after `.streaming()` make it more efficient? – Pith Oct 24 '12 at 07:28
  • The Splitter EIP has built-in support for multi-threading; you can refer to an executorService. Using that would be preferable to threads, but the latter is also possible. See the Camel docs for examples. – Claus Ibsen Oct 29 '12 at 07:10
  • I see you tokenize by newline. What about multiline rows? Are they supported? – Daniil Iaitskov Dec 02 '13 at 21:34
  • Yeah read the documentation - there is a section about grouping N lines together - http://camel.apache.org/splitter – Claus Ibsen Dec 02 '13 at 21:35
  • You can also use BeanIO if you want to group records and process them using the Camel splitter: http://camel.apache.org/beanio.html – Sagar Nov 11 '16 at 08:11

For the record, and for other users who might have searched for this as long as I did: there now seems to be an easier method, which also works well with useMaps:

CsvDataFormat csv = new CsvDataFormat()
    .setLazyLoad(true)
    .setUseMaps(true);

from("file://data/inbox?noop=true&maxMessagesPerPoll=1&delay=5000")
    .unmarshal(csv)
    .split(body()).streaming()
    .to("log:mappedRow?multiline=true");
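
As a rough illustration of what lazy loading combined with map rows means, the following plain-Java sketch exposes a CSV source as an iterator that builds one header-keyed map per row on demand. It is not how `CsvDataFormat` is implemented; `LazyCsvRows` is a hypothetical class, and the naive split on commas is an assumption that ignores quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical illustration: rows are parsed only when next() is called.
final class LazyCsvRows implements Iterator<Map<String, String>> {
    private final BufferedReader reader;
    private final String[] header;
    private String nextLine;

    LazyCsvRows(Reader source) {
        this.reader = new BufferedReader(source);
        String first = readLine();
        // The first line is treated as the header; -1 keeps trailing empty cells.
        this.header = first == null ? new String[0] : first.split(",", -1);
        this.nextLine = readLine();
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public Map<String, String> next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        // Naive split: an assumption for brevity, quoted commas are not handled.
        String[] cells = nextLine.split(",", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < header.length && i < cells.length; i++) {
            row.put(header[i], cells[i]);
        }
        nextLine = readLine(); // read ahead by exactly one line
        return row;
    }

    public static void main(String[] args) {
        LazyCsvRows rows = new LazyCsvRows(new StringReader("id,name\n1,alice\n2,bob\n"));
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
    }
}
```

Because the consumer pulls rows through an iterator, a streaming splitter can forward each map without the whole file ever being materialised as a list.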
C12Z

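Combining the Splitter and Aggregator EIPs is a good strategy for processing large CSV files in Apache Camel: the splitter streams the rows and the aggregator regroups them into batches of a fixed size. Read more about it in the documentation for the Composed Message Processor pattern.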

Here is an example using Java DSL:

package com.camel;

import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.csv.CsvDataFormat;
import org.apache.camel.impl.DefaultCamelContext;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.QuoteMode;

public class FileSplitter {

    public static void main(String args[]) throws Exception {
        CamelContext context = new DefaultCamelContext();
        CsvDataFormat csvParser = new CsvDataFormat(CSVFormat.DEFAULT);
        csvParser.setSkipHeaderRecord(true);
        csvParser.setQuoteMode(QuoteMode.ALL);
        context.addRoutes(new RouteBuilder() {
            public void configure() {
                String fileName = "Hello.csv";
                int lineCount = 20;
                System.out.println("fileName = " + fileName);
                System.out.println("lineCount = " + lineCount);
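                // ArrayListAggregationStrategy is not defined or imported in this snippet;
                // it is assumed to be a user-supplied AggregationStrategy that collects row bodies into a List.
                from("file:data/inbox?noop=true&fileName=" + fileName)
                        .unmarshal(csvParser)
                        .split(body()).streaming()
                        .aggregate(constant(true), new ArrayListAggregationStrategy())
                            .completionSize(lineCount)   // emit a batch every lineCount rows
                            .completionTimeout(1500)     // flush a partial batch after 1.5 s of inactivity
                        .marshal(csvParser)
                        .to("file:data/outbox?fileName=${file:name.noext}_${header.CamelSplitIndex}.csv");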
            }
        });
        context.start();
        Thread.sleep(10000);
        context.stop();
        System.out.println("End");
    }
}
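
The batching that the aggregator performs in the route above can be sketched in plain Java as follows. This is only an analogy under stated assumptions: `LineBatcher` and `forEachBatch` are hypothetical names, the batch size plays the role of `completionSize`, and the final partial batch is flushed when the input ends, whereas the route relies on `completionTimeout` for that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of size-based batching over a stream of rows.
final class LineBatcher {

    // Groups rows into lists of at most batchSize and passes each list to the sink.
    // Returns the number of batches emitted.
    static int forEachBatch(Iterator<String> rows, int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<String> current = new ArrayList<>(batchSize);
        while (rows.hasNext()) {
            current.add(rows.next());
            if (current.size() == batchSize) {
                sink.accept(current);                 // batch is complete
                current = new ArrayList<>(batchSize); // start a fresh batch
                batches++;
            }
        }
        if (!current.isEmpty()) {
            sink.accept(current); // flush the remaining partial batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        Iterator<String> rows = Arrays.asList("a", "b", "c", "d", "e").iterator();
        int emitted = forEachBatch(rows, 2, batch -> System.out.println("batch: " + batch));
        System.out.println(emitted + " batches");
    }
}
```

Only one batch is held in memory at a time, which is what keeps the split-then-aggregate approach usable on very large files.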
Amitabha Roy