I want to download and parse a large CSV using camel-csv and I can't figure out a solution that I'm satisfied with. camel-csv seems to be designed to read and process files placed on disk.

I want to download from a list of URLs over HTTP and parse each stream as it is downloaded. I can do that by bypassing camel-csv, like so:

from("mock:in").process(new DataProcessor(new DataCSVParserFactory())).to("mock:out");

public class DataProcessor implements Processor {
    private final DataCSVParserFactory csvParserFactory;

    @Inject
    public DataProcessor(DataCSVParserFactory csvParserFactory) {
        this.csvParserFactory = csvParserFactory;
    }

    @Override
    public void process(Exchange exchange) throws Exception {
        String file = (String) exchange.getIn().getBody();
        URL url = new URL(file);
        CSVParser parser = csvParserFactory.build(url);
        for (CSVRecord csvRecord : parser) {
            exchange.getIn().setBody(csvRecord);
        }    
    }
}

But would it be possible to use something like camel-ahc to download the files and pipe that into the csv unmarshalling? Something like:

from("direct:input").unmarshal().csv().to("direct:out");
template.send("ahc:uri");
Martinffx

1 Answer

Camel-csv is for marshalling and unmarshalling CSV. To download a file from a URL you need another component, such as camel-netty4-http.

A simple example:

from("netty4-http:http://localhost:8080/foo")
.unmarshal().csv()
.log("${body}");

You may need to convert the body to a String before unmarshalling.
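To make the marshal/unmarshal distinction concrete, here is a minimal plain-Java sketch that does not use Camel at all. Unmarshalling turns CSV text into rows of fields, and marshalling goes the other way. The class name `CsvShape` and both methods are made up for illustration, and the comma splitting is deliberately naive (no quoted fields or embedded commas), so treat it as a picture of the data shapes rather than of what camel-csv does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Camel: shows the direction of each operation.
public class CsvShape {

    // Unmarshal: CSV text -> rows of fields (naive: no quoting support).
    public static List<List<String>> unmarshal(String csv) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : csv.split("\\R")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // limit -1 keeps trailing empty fields such as "a,b,"
            rows.add(Arrays.asList(line.split(",", -1)));
        }
        return rows;
    }

    // Marshal: rows of fields -> CSV text, one row per line.
    public static String marshal(List<List<String>> rows) {
        return rows.stream()
                .map(row -> String.join(",", row))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<List<String>> rows = unmarshal("id,fare\n1,12.50\n");
        System.out.println(rows);
        System.out.println(marshal(rows));
    }
}
```

For parsing a downloaded file, the unmarshal direction is the one you want.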

EDIT:

OK, to download multiple files you need some way to trigger your route. The simplest is a timer, but use whatever you prefer. Then you can use toD(), which sends to an endpoint whose URI is computed at runtime, and inject your URL there. To repeat this for several URLs, split the list first and then inject each one. An untested example to help you get started:

// Create the list of URLs any way you like; this just shows the principle.
// You could also build them in a bean and set them in a Camel header.
String listOfUrls = "url1, url2, url3";

from("timer:foo?period=5000")
.setHeader("urls", constant(listOfUrls))
.split(header("urls")) // each URL becomes the message body in turn
.toD("${body}") // use the URL in the body as the endpoint URI
.log("${body}");

Note that you still need the camel-http4 component if you plan to use it to send your requests. See the splitter documentation at http://camel.apache.org/splitter.html and dynamicTo at http://camel.apache.org/message-endpoint.html
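For reference, the split-then-fetch-then-parse flow can be sketched in plain Java, without Camel, to show what "parse the stream as it is downloaded" amounts to. Everything here is an assumption for illustration: the class `UrlListCsvSketch`, the `forEachRow` method, and the `opener` function are hypothetical names, and the comma splitting ignores quoted fields. In real use the opener could wrap `new URL(location).openStream()` in an `InputStreamReader`; the demo uses an in-memory `StringReader` so it runs without a network.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch, not Camel code.
public class UrlListCsvSketch {

    // Splits a comma-separated list of locations, opens each one, and hands
    // every CSV row to the callback as soon as its line has been read, so the
    // whole file is never held in memory.
    public static void forEachRow(String locations,
                                  Function<String, Reader> opener,
                                  Consumer<String[]> onRow) {
        for (String location : locations.split(",")) {
            String trimmed = location.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas in the list
            }
            try (BufferedReader reader = new BufferedReader(opener.apply(trimmed))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.isEmpty()) {
                        onRow.accept(line.split(",", -1)); // naive split, no quoting
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in opener that fabricates one CSV line per location.
        Function<String, Reader> fakeOpener =
                location -> new StringReader("source," + location + "\n");
        List<String> seen = new ArrayList<>();
        forEachRow("url1, url2, url3", fakeOpener, row -> seen.add(row[1]));
        System.out.println(seen);
    }
}
```

The Camel route does the same three steps with the splitter, toD() and the CSV data format. The callback here plays the role of whatever endpoint consumes each record.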

Souciance Eqdam Rashti
Yes, that's cool for one URL. In my case I want to process a whole list of URLs. How would I do that using your solution? – Martinffx Jan 15 '17 at 13:26