
I am trying to write a batch job with Apache Camel that should:

  • process all files in a specific directory and, for each file:
  • validate the XML against an XSD schema
  • unmarshal different parts and process the data
  • not stop on an exception
  • if at least one error occurs, move the file to a failed directory at the end, and to a done directory otherwise

The difficulties I have run into:

  • The File component can automatically move a file to a done/failed directory, but as soon as you use a splitter/aggregator, the file is always moved to done, without even waiting for the aggregation to finish.
  • Exception handling is not intuitive.
  • Splitters and aggregators are a nightmare, and there are not enough "real world" examples in the documentation.
  • XML tokenization on a complex structure quickly becomes incomprehensible.

Of course, I run into these problems because I am new to Apache Camel :)

The idea of what I am trying to do:

  1. File component (ok)
  2. XSD validation (ok; on error, the File component moves the file to failed)
  3. Run several splits/multicasts to read the XML; when an error occurs, ignore the exception, record the error in a header, and continue (not really ok: I can read the XML but can't manage to aggregate correctly)
  4. Aggregate (in fact there is nothing to aggregate, we just want to check all the headers)
  5. On error, explicitly move the file to a failed directory

Simplified XML for this example:

<root>
    <library></library>
    <books year="2015">
        <book></book>
        <book></book>
        ... many
    </books>
    <books year="2016">
        <book></book>
        <book></book>
        ... many
    </books>

   ... many years

</root>

How would you build the batch for an XML file like this? Moreover, let's say you have to read "library" before anything else (and use a pipeline). Also, is it a good idea to record the error in a (boolean) header?

Note: Special thanks to Claus Ibsen, who responds to many Camel posts on SO, but please try not to just post a link to the Apache Camel documentation :) Really, for newbies, the Apache Camel documentation sucks.

Thanks

user2668735

1 Answer


Have you read the book "Camel in Action"?

This boils down to how big your XML files are. If the files are "small" enough (where "small" depends on how much RAM you have), you don't need to stream them.

I would start with a route like this:

  1. Validate the XML (see "How can I validate xsd using apache camel?")
  2. Unmarshal to a Java object
  3. Process the <library> value
  4. Split on each book (basically a for loop)
  5. Process each book and "remember" in a header whether an exception happened
  6. After the split, if an exception happened, raise it again so that the File component moves the file to the failed folder

Example (not tested code):

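<route>
    <!-- Pick up files from the input directory -->
    <from uri="file:yourInputDir" />
    <!-- Validate against the XSD; an invalid file fails the exchange here -->
    <to uri="validator:file:books/schema.xsd"/>
    <unmarshal>
        <jaxb contextPath="package.of.your.java.pojo" />
    </unmarshal>
    <to uri="bean:libraryProcessor" />
    <!-- The aggregation strategy has to carry the header back to the original exchange -->
    <split strategyRef="saveExceptionInHeader">
        <simple>${body.getBooksList}</simple>
        <doTry>
            <to uri="bean:processBooks" />
            <doCatch>
                <exception>java.lang.Exception</exception>
                <!-- Remember the failure instead of stopping the split -->
                <setHeader headerName="RememberLastException">
                    <simple>${exception}</simple>
                </setHeader>
            </doCatch>
        </doTry>
    </split>

    <!-- Rethrow after the split so the File component can treat the file as failed -->
    <to uri="bean:throwExceptionIfRememberLastExceptionHeaderPresent" />
</route>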

The name of each bean suggests what it does; implementing them should not be difficult.
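To make the control flow concrete, here is a minimal plain-Java sketch of what the doTry/doCatch inside the split and the last bean of the route amount to together: process every book, remember the last exception, and only rethrow once the loop is over. It deliberately uses no Camel API. The class name `RememberLastException`, the method `processAll` and the sample data are made up for illustration; in the real route the remembered exception would live in a header and, as far as I know, it only gets back to the original exchange through the split's aggregation strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class RememberLastException {

    /**
     * Runs the processor on every item and never stops on a failure.
     * Returns the last exception seen, or null if every item succeeded.
     */
    static <T> Exception processAll(List<T> items, Consumer<? super T> processor) {
        Exception last = null;
        for (T item : items) {
            try {
                processor.accept(item);
            } catch (Exception e) {
                last = e; // remember the failure and keep going
            }
        }
        return last;
    }

    public static void main(String[] args) {
        List<String> books = List.of("good-1", "bad", "good-2");
        List<String> processed = new ArrayList<>();

        Exception last = processAll(books, book -> {
            if (book.startsWith("bad")) {
                throw new IllegalStateException("cannot process " + book);
            }
            processed.add(book);
        });

        System.out.println("processed=" + processed + ", failed=" + (last != null));
        if (last != null) {
            // The route's final bean would rethrow at this point,
            // which is what marks the whole file as failed.
            System.out.println("would rethrow: " + last.getMessage());
        }
    }
}
```

The point of the design is that the decision "done or failed" is taken exactly once, after all books have been seen, instead of on the first error.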

Add a lot of log statements to get some feedback on what Camel does.

Alessandro Da Rugna
  • The idea was precisely not to load everything into memory at once, but to use SAX parsing and partial unmarshalling :) – user2668735 Sep 11 '17 at 08:22
  • @user2668765 Use a `streaming` splitter with an XML tokenizer, then unmarshal the single object. See http://www.davsclaus.com/2011/11/splitting-big-xml-files-with-apache.html – Alessandro Da Rugna Sep 11 '17 at 08:49
  • Thank you, but your link shows an example that is too basic (and too old). The actual need may include a multicast and nested splitters/aggregators. I really expected to see an almost exact answer. It is CRAZY that we can find absolutely nothing on the whole Internet about "how to parse an XML" with Apache Camel, at least for a minimally complex XML. I quickly searched in Camel in Action, and it's always the same: no real-life example. – user2668735 Sep 11 '17 at 10:27
  • @user2668735 I agree with you. There is a lack of "how to handle a huge XML file in Camel" material; I'll see if I can prepare a more complete example. – Alessandro Da Rugna Sep 11 '17 at 10:41
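As a rough illustration of the streaming idea discussed in these comments, here is a JDK-only sketch (StAX, no Camel) that walks the sample XML one element at a time, keeps track of the current `year`, hands each `<book>` to a callback, and collects failures instead of stopping. This is not the Camel streaming splitter itself, only the same pattern in plain Java. `StreamingBooksSketch`, `BookHandler` and `streamBooks` are hypothetical names, and the sketch assumes each `<book>` holds only text; a real book with nested elements would need partial unmarshalling of the sub-tree instead of `getElementText()`.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingBooksSketch {

    /** Hypothetical per-book callback; stands in for a bean that processes one book. */
    interface BookHandler {
        void handle(String year, String bookText) throws Exception;
    }

    /**
     * Streams the document and passes each <book> to the handler, one at a time.
     * Handler failures are collected rather than aborting the whole file.
     * Malformed XML still aborts with an XMLStreamException.
     */
    static List<Exception> streamBooks(Reader xml, BookHandler handler) throws XMLStreamException {
        List<Exception> errors = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        String year = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("books".equals(name)) {
                    year = reader.getAttributeValue(null, "year");
                } else if ("book".equals(name)) {
                    // Assumption: text-only <book> elements.
                    String text = reader.getElementText();
                    try {
                        handler.handle(year, text);
                    } catch (Exception e) {
                        errors.add(e); // remember the error and continue with the next book
                    }
                }
            }
        } finally {
            reader.close();
        }
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><library>lib</library>"
                + "<books year=\"2015\"><book>a</book><book>bad</book></books>"
                + "<books year=\"2016\"><book>c</book></books></root>";
        List<String> seen = new ArrayList<>();
        List<Exception> errors = streamBooks(new StringReader(xml), (year, text) -> {
            if ("bad".equals(text)) {
                throw new IllegalArgumentException("bad book in " + year);
            }
            seen.add(year + ":" + text);
        });
        // A non-empty error list is the signal to move the file to the failed directory.
        System.out.println(seen + " errors=" + errors.size());
    }
}
```

Since `<library>` comes before the `<books>` elements in the sample, a streaming reader reaches it first anyway, so it can be handled in the same loop before any book is processed.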