Batch processing and functional programming

Question

As a Java developer, I'm used to using Spring Batch for batch processing, generally with a streaming library to export large XML files, with StAX for example.

I'm now developing a Scala application, and I wonder whether there's any framework, tool or guideline for batch processing.

My Scala application uses the Cake Pattern and I'm not sure how I could integrate this with Spring Batch. Also, I'd like to follow the guidelines described in *Functional programming in Scala* and try to keep functional purity, using things like the IO monad...

I know this is kind of an open question, but I have never read anything about this...

Has anyone already achieved functional batch processing here? How did it work? Am I supposed to have a main that creates a batch processing operation in an IO monad and runs it? Is there any tool or guideline to help monitor the batch or handle restartability, the way Spring Batch does in Java? Do you use Spring Batch in Scala? How do you handle the integration part, for example waiting for a JMS/AMQP message to start the processing that produces an XML file?

Any feedback on the subject is welcome.

Is there a good intro somewhere to explain what Spring Batch is doing? It's fairly difficult for me to understand the website. — J. Abrahamson, Oct 26 '13 at 19:13
I don't think "purity" is a desirable goal (w.r.t. anything other than chemistry). Using types that are publicly immutable is of great practical value. Expunging all mutable data from your code is not. — Randall Schulz, Oct 26 '13 at 21:18
@RandallSchulz I'm clearly speaking without actually knowing what the goal of this is, but I'm happy to broadly argue that purity can be a very useful property to uphold if possible. It's not right for all computations obviously, but I find it incredibly useful to reason from "purity as default" and only add impurities as they are justifiable. This isn't a place to debate it, but I did want to provide a counter-balance to your opinion. — J. Abrahamson, Oct 26 '13 at 21:58
Functions can mutate data and still be considered pure, provided the mutation can't be observed from elsewhere. In fact the book Sebastien mentions, *Functional programming in Scala*, describes techniques for doing this in a type-safe manner. — Ben James, Oct 26 '13 at 22:04
From the Haskell side this is most clearly expressed via the State or ST monads, each of which is pure when observed from the respective `run*` function. — J. Abrahamson, Oct 26 '13 at 22:18
The Haskell `State` monad is internally pure, while `ST` does wrap non-observable mutability. — duplode, Oct 26 '13 at 22:40
It's a matter of semantics, from what perspective we're measuring purity. If you try to see `State s a` as a value of `a`, which is exactly what we're doing when operating "in the state monad" then state is impure. ST is really just State with an extra special s parameter which allows for very interesting behavior. — J. Abrahamson, Oct 27 '13 at 02:07
@duplode: also, the compiler is allowed to use mutability internally pretty much anywhere as an optimisation, as long as it can't be observed from the outside. I'm not sure to what degree GHC does this in code with `State`, but at least there are plenty of rewrite rules in the `lens` library that exploit similar possibilities. — leftaroundabout, Oct 28 '13 at 09:51

score 4 · Accepted Answer · answered Nov 04 '13 at 09:03

You don't mention what kind of app you are developing with Scala, so I'm going to take a wild guess here and suppose it is a server-side one. Guessing further, let's say you are using Akka... because you are using it, aren't you? :)

In that case, I guess what you are looking for is Akka Quartz Scheduler, the official Quartz Extension and utilities for cron-style scheduling in Akka. I haven't tried it myself, but from your requirements it seems that Akka + this module would be a good fit. Take into account that Akka already provides hooks to handle restartability of failed actors, and I don't think that it would be that difficult to add monitoring of batch processes leveraging the lifecycle callbacks built into actors.

Regarding interaction with JMS/AMQP messaging, you could use the Akka Camel module, which provides support for sending and receiving messages through a lot of protocols, including JMS. Using this module you could have a consumer actor receiving messages from some JMS endpoint and fire whatever process you want from there, probably by forwarding or sending a new message to the actor responsible for that process. Whether the process is fired by a cron-style timer or by an incoming message, you can reuse the same actor to accomplish the task.
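
To illustrate that last point without pulling in Akka, here is a minimal plain-Scala sketch of the idea: both triggers resolve to the same job, and the job is only a description until something runs it. Everything in it is an assumption made for illustration (`IO`, `Trigger`, `TimerTick`, `IncomingMessage`, `exportJob` and `handle` are hypothetical names, not Akka or Spring Batch APIs), and the XML rendering is deliberately naive. In a real application the pattern match would sit in an actor's `receive` and the XML would be streamed.

```scala
object BatchSketch {
  // Hypothetical minimal IO type: a description of an effect.
  // Nothing happens until unsafeRun() is called.
  final case class IO[A](unsafeRun: () => A) {
    def map[B](f: A => B): IO[B] = IO(() => f(unsafeRun()))
    def flatMap[B](f: A => IO[B]): IO[B] = IO(() => f(unsafeRun()).unsafeRun())
  }

  // The two ways the job can be fired: a cron-style timer or a JMS/AMQP message.
  sealed trait Trigger
  case object TimerTick extends Trigger
  final case class IncomingMessage(body: String) extends Trigger

  // Hypothetical export step: renders the records as XML, passes the
  // document to the sink and returns the number of records written.
  def exportJob(records: List[String], sink: String => Unit): IO[Int] =
    IO { () =>
      val xml = records
        .map(r => s"<record>$r</record>")
        .mkString("<export>", "", "</export>")
      sink(xml)
      records.size
    }

  // Both triggers map to the same job description.
  // A timer tick exports the pending records; a message carries
  // its own comma-separated records (an assumed payload format).
  def handle(trigger: Trigger, pending: List[String], sink: String => Unit): IO[Int] =
    trigger match {
      case TimerTick             => exportJob(pending, sink)
      case IncomingMessage(body) => exportJob(body.split(",").toList, sink)
    }
}
```

A `main` would then build the value with `BatchSketch.handle(...)` and call `unsafeRun()` once, at the edge of the program, which is roughly what "creating the batch in an IO monad and running it" amounts to.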
