I am using Apache Camel to generate monthly reports. I have a MySQL query which, when run against my DB, returns around 5 million records (20 columns each). The query itself takes approximately 70 minutes to execute.
To speed up the process, I created 5 seda (worker) routes and used multicast().parallelProcessing();
the workers query the DB in parallel for different time ranges, and the results are then merged by an aggregator.
Now I can see the 5 million records in my exchange body (as a List<HashMap<String, Object>>).
When I try to format this with Camel Bindy to generate a CSV file from the data, I get a GC overhead limit exceeded error. I tried increasing the Java heap size, but the transformation still takes forever.
Is there any other way to convert this raw data into a well-formatted CSV file? Can Java 8 streams be useful?
Code
from("direct://logs/testLogs")
    .routeId("Test_Logs_Route")
    .setProperty("Report", simple("TestLogs-${date:now:yyyyMMddHHmm}"))
    .bean(Logs.class, "buildLogsQuery") // bean that generates the logs query
    .multicast()
    .parallelProcessing()
    .to("seda:worker1?waitForTaskToComplete=Always&timeout=0", // worker routes
        "seda:worker2?waitForTaskToComplete=Always&timeout=0",
        "seda:worker3?waitForTaskToComplete=Always&timeout=0",
        "seda:worker4?waitForTaskToComplete=Always&timeout=0",
        "seda:worker5?waitForTaskToComplete=Always&timeout=0");
All my worker routes look like this:
from("seda:worker4?waitForTaskToComplete=Always")
    .routeId("ParallelProcessingWorker4")
    .log(LoggingLevel.INFO, "Parallel Processing Worker 4 Flow Started")
    .setHeader("WorkerId", constant(4))
    .bean(Logs.class, "testBean") // appends a time clause to the query based on WorkerId
    .to("jdbc:oss-ro-ds")
    .to("seda:resultAggregator?waitForTaskToComplete=Always&timeout=0");
Aggregator
from("seda:resultAggregator?waitForTaskToComplete=Always&timeout=0")
    .routeId("Aggregator_ParallelProcessing")
    .log(LoggingLevel.INFO, "Aggregation triggered for processor ${header.WorkerId}")
    .aggregate(header("Report"), new ParallelProcessingAggregationStrategy())
        .completionSize(5)
        .to("direct://logs/processResultSet");

from("direct://logs/processResultSet")
    .routeId("Process_Result_Set")
    .bean(Test.class, "buildLogReport")
    .marshal(myLogBindy)
    .to("direct://deliver/ooma");
Method buildLogReport
public void buildLogReport(List<HashMap<String, Object>> resultEntries, Exchange exchange) throws Exception {
    Map<String, Object> headerMap = exchange.getIn().getHeaders();
    ArrayList<MyLogEntry> reportList = new ArrayList<>();
    // Loop until every row has been consumed; the list itself never becomes null,
    // so the termination check must be on emptiness, not nullness.
    while (!resultEntries.isEmpty()) {
        // Fetch and free the row in one step (note: remove(0) is O(n) on an ArrayList)
        HashMap<String, Object> resultEntry = resultEntries.remove(0);
        MyLogEntry logEntry = new MyLogEntry();
        logEntry.setA((String) resultEntry.get("A"));
        logEntry.setB((String) resultEntry.get("B"));
        logEntry.setC(((BigDecimal) resultEntry.get("C")).toString());
        if (null != resultEntry.get("D"))
            logEntry.setD(((BigInteger) resultEntry.get("D")).toString());
        logEntry.setE((String) resultEntry.get("E"));
        logEntry.setF((String) resultEntry.get("F"));
        logEntry.setG(((BigDecimal) resultEntry.get("G")).toString());
        logEntry.setH((String) resultEntry.get("H"));
        logEntry.setI(((Long) resultEntry.get("I")).toString());
        logEntry.setJ((String) resultEntry.get("J"));
        logEntry.setK(TimeUtils.convertDBToTZ((Date) resultEntry.get("K"), (String) headerMap.get("TZ")));
        logEntry.setL(((BigDecimal) resultEntry.get("L")).toString());
        logEntry.setM((String) resultEntry.get("M"));
        logEntry.setN((String) resultEntry.get("State"));
        logEntry.setO((String) resultEntry.get("Zip"));
        logEntry.setP("\"" + (String) resultEntry.get("Type") + "\"");
        logEntry.setQ((String) resultEntry.get("Gate"));
        reportList.add(logEntry);
    }
    // Transform the exchange message
    exchange.getIn().setBody(reportList);
}
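For context, this is roughly the stream-based rewrite of the row-to-entry mapping I had in mind (a minimal sketch: LogEntry is a trimmed stand-in for MyLogEntry with only two representative fields, and buildReport stands in for buildLogReport):

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StreamMapSketch {

    // Minimal stand-in for the real MyLogEntry (only two of its fields)
    public static class LogEntry {
        public String a;
        public String c;
    }

    // Maps each DB row (column name -> value) to a LogEntry,
    // replacing the while/remove loop with a stream pipeline
    public static List<LogEntry> buildReport(List<Map<String, Object>> rows) {
        return rows.stream()
                .map(row -> {
                    LogEntry e = new LogEntry();
                    e.a = (String) row.get("A");
                    Object c = row.get("C");
                    e.c = (c == null) ? null : c.toString(); // BigDecimal -> String
                    return e;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("A", "x");
        row.put("C", new BigDecimal("1.5"));
        List<LogEntry> out = buildReport(Collections.singletonList(row));
        System.out.println(out.size() + " " + out.get(0).a + " " + out.get(0).c); // prints "1 2 1.5" with one row: "1 x 1.5"
    }
}
```

My understanding is that this would not by itself cure the GC problem, since the full result list is still materialized in memory; only something that writes rows out incrementally (e.g. splitting and marshalling in a streaming fashion) would actually bound the footprint.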