
Is there any standard way of controlling concurrency and consumption in an MDB? Every suggestion I've been able to find seems to be application-server- or resource-adapter-specific (e.g. the maxSession @ActivationConfigProperty).

The Problem™

We run some relatively heavy analysis jobs on a two-node JBoss setup (EAP 6.3), and want to parallelise jobs as well as impose an upper limit on the number of concurrent jobs so we don't swamp the database server. Jobs are started from a web front-end, and should be started whenever there's a free processing slot - there are no prioritisation or ordering constraints. We use a message queue (IBM WMQ, because of politics) to distribute "start analysis" messages to the nodes.

Current progress

After a lot of fiddling following various suggestions that turned out to be Non-Working Resource-Adapter-Specific Cargo-Cult Misinformation™ :), I thought the problem was solved by defining an EJB pool for the MDB. This does solve the concurrency issue, but unfortunately a node with no free MDBs still pulls messages off the queue - which can leave one node underutilised while the other is fully loaded with a backlog.

If I understand the IBM documentation correctly, this behaviour should be controllable with the readAheadAllowed configuration option, but this doesn't seem to influence my results at all.
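
For reference, this is the rough shape of what I'm deploying at the moment. Note that maxSession is specific to the WMQ resource adapter and @Pool is JBoss-proprietary - the queue and pool names below are illustrative, not canonical:

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.Message;
    import javax.jms.MessageListener;

    @MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destination",
                                  propertyValue = "ANALYSIS.REQUEST.QUEUE"),
        // Resource-adapter-specific: caps concurrent deliveries on this node,
        // but doesn't stop the adapter from pre-fetching messages.
        @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "4")
    })
    @org.jboss.ejb3.annotation.Pool("analysis-mdb-pool") // JBoss-specific pool binding
    public class AnalysisMdb implements MessageListener {
        @Override
        public void onMessage(Message message) {
            // kick off the heavy analysis job here
        }
    }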

So, is there:

  • A Java EE-standard way to control message consumption?
  • A (working) IBM WMQ-specific configuration option?
  • A JBoss specific way to fix it?
  • Some other smart solution that I haven't thought of?

Alternatively, I could probably rework the architecture to use a pub/sub topic instead of a queue, and let each node attempt to claim the job with something along the lines of UPDATE Projects SET Status='inprogress' WHERE Id=42 AND Status='inqueue'; - but I'd rather not go there if I don't have to, mostly because of the Change Requests required for getting the queue changed :)
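
To make the idea concrete, here's a minimal sketch of the claim I have in mind, assuming a container-managed DataSource; the table and column names are from the statement above, everything else is illustrative:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    public class JobClaimer {

        private final DataSource ds; // e.g. injected with @Resource

        public JobClaimer(DataSource ds) {
            this.ds = ds;
        }

        /** Returns true only if this node won the race for the job. */
        public boolean tryClaim(long jobId) throws SQLException {
            String sql = "UPDATE Projects SET Status='inprogress' "
                       + "WHERE Id=? AND Status='inqueue'";
            try (Connection c = ds.getConnection();
                 PreparedStatement ps = c.prepareStatement(sql)) {
                ps.setLong(1, jobId);
                return ps.executeUpdate() == 1; // 0 rows = another node got there first
            }
        }
    }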

snemarch

2 Answers


MQ, and indeed JMS, is designed and optimized to get the message as far down the pipe as possible, as fast as possible. The state of each message is tracked by the queue manager but a concurrency requirement would require the QMgr to track the state of messages in relation to one another. MQ does not do that.

There is some granularity in tuning the Activation Spec connection pool and some in the transaction scoping, but these are intended to influence the dynamic behavior of the server and not specify it precisely.

Managing application state across multiple messages is exactly what ESB products such as IBM Integration Bus are designed to do. Whereas MQ is a transport that looks only at message headers to perform routing and delivery, the ESB looks at message relationships and content. Unless the JMS spec defines a concurrency management API, this division of responsibility, with MQ doing transport and the ESB doing processing, is not likely to change.

Whether it is implemented in the ESB or in JEE code, what you are describing is the Message Dispatcher pattern from the Enterprise Integration Patterns book, one of the definitive architectural references in the messaging world. There are a couple of ways to write this, depending on preferences for instrumentation.

Tokenization

  1. Application instances perform a GET with wait on a token queue, under syncpoint.
  2. On receipt of a token message, the app PUTs the same message back on the token queue, again under syncpoint.
  3. The app performs a GET with wait on the application queue under syncpoint.
  4. On receipt of an application message, the app processes it then performs a COMMIT or ROLLBACK as appropriate.

The app instances compete for the token messages before processing an app message. The result is that app messages are generally processed as fast as they arrive but under load the max concurrency equals the number of token messages.

Generally the token messages are informational, for instance containing a counter that is incremented with each iteration and possibly instance info of the last app to write the token message. This provides some diagnostic insight as to what's going on. In some cases a monitoring app also listens on the token queue to sample that information and write to a dashboard. In that case an extra token message is added to the queue to account for the activity of the monitoring app.
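
A minimal sketch of that loop in plain JMS, assuming a JMS 1.1 ConnectionFactory and illustrative queue names, with a transacted session standing in for "under syncpoint". Because the token put is not committed until the job completes, each in-flight job holds one token:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    public class TokenizedWorker {

        public void workLoop(ConnectionFactory factory, Queue tokenQueue, Queue appQueue)
                throws Exception {
            Connection conn = factory.createConnection();
            try {
                conn.start();
                // A transacted session is the JMS equivalent of "under syncpoint".
                Session session = conn.createSession(true, Session.SESSION_TRANSACTED);
                MessageConsumer tokens = session.createConsumer(tokenQueue);
                MessageProducer tokenWriter = session.createProducer(tokenQueue);
                MessageConsumer work = session.createConsumer(appQueue);

                while (true) {
                    Message token = tokens.receive();  // 1. block until a token is free
                    tokenWriter.send(token);           // 2. put it straight back (uncommitted)
                    Message job = work.receive();      // 3. wait for an app message
                    try {
                        process(job);                  //    the heavy analysis work
                        session.commit();              // 4. releases token + app message together
                    } catch (Exception e) {
                        session.rollback();            //    both messages become available again
                    }
                }
            } finally {
                conn.close();
            }
        }

        private void process(Message job) { /* analysis work goes here */ }
    }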

Standalone dispatcher

  1. The dispatcher application listens on the advertised destination queue for messages and on an ACK queue for acknowledgement messages.
  2. A counter compares messages outstanding to {max outstanding message limit} set in the dispatcher app's configuration.
  3. If outstanding messages < limit, messages are moved to the application's input queue and the outstanding message counter incremented.
  4. When the business application GETs a message, it also puts a message to the dispatcher's ACK queue, both actions under syncpoint.
  5. When the business application issues COMMIT the dispatcher receives the ACK and decrements the outstanding messages counter.

The dispatcher app can request Confirmation on Delivery messages, but these do not always correlate to the business app successfully processing the transaction. If the business app explicitly puts the ACK messages in the same unit of work as the business message that was consumed, the result is rock solid.
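
A condensed sketch of the dispatcher's core loop under the same JMS assumptions; the queue wiring and the maxOutstanding limit come from the dispatcher's configuration and are illustrative:

    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageProducer;
    import javax.jms.Session;

    public class Dispatcher {

        private final Session session;           // transacted JMS session
        private final MessageConsumer inbound;   // advertised destination queue
        private final MessageConsumer acks;      // ACK queue
        private final MessageProducer toWorkers; // application input queue
        private final int maxOutstanding;        // {max outstanding message limit}
        private int outstanding;

        public Dispatcher(Session session, MessageConsumer inbound, MessageConsumer acks,
                          MessageProducer toWorkers, int maxOutstanding) {
            this.session = session;
            this.inbound = inbound;
            this.acks = acks;
            this.toWorkers = toWorkers;
            this.maxOutstanding = maxOutstanding;
        }

        /** One pass of the dispatch loop; call repeatedly. */
        public void dispatchOnce() throws JMSException {
            // Each ACK from the business app frees one slot.
            while (acks.receiveNoWait() != null) {
                outstanding--;
            }
            // Forward work while below the configured limit.
            while (outstanding < maxOutstanding) {
                Message m = inbound.receiveNoWait();
                if (m == null) {
                    break; // nothing waiting
                }
                toWorkers.send(m);
                outstanding++;
            }
            session.commit();
        }
    }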

T.Rob
  • While "as far down the pipe as possible, as fast as possible" is fine, it doesn't seem like a super great idea to grab messages off the queue when you don't have execution resources available to do processing. – snemarch Dec 22 '16 at 18:01
  • Right, so don't do it. Read-ahead requires non-persistent messages. When the app architect chooses an at-most-once class of service, buffering the messages in memory produces a significant performance gain in a context where there is by definition no downside to doing so. You asked for a standards-based solution, not an MQ-based one. The behavior is spec-compliant and normally addressed with an architectural pattern, of which you chose the Dispatcher pattern implemented as tokenization with the token stored in the DB rather than a queue. Seems to me we ended up in the exact same place. – T.Rob Dec 22 '16 at 19:09

I ended up using my "Alternatively I could [...]" method - changing the "start analysis" message to a broadcast, and handling node selection with the Status clause in the SQL statement.

It's a simple solution that doesn't add a lot of extra complexity, and it has turned out to work very well in practice - it has been running in production for about a year now.
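
For reference, the shape it ended up taking - the destination and property names below are illustrative, and JobClaimer is the claim helper sketched in the question:

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.Message;
    import javax.jms.MessageListener;

    @MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Topic"),
        @ActivationConfigProperty(propertyName = "destination",
                                  propertyValue = "ANALYSIS.START.TOPIC")
    })
    public class StartAnalysisMdb implements MessageListener {

        // e.g. injected; wraps the UPDATE ... WHERE Status='inqueue' claim
        private JobClaimer claimer;

        @Override
        public void onMessage(Message message) {
            try {
                long jobId = message.getLongProperty("jobId"); // property name assumed
                if (claimer.tryClaim(jobId)) {
                    runAnalysis(jobId); // this node won the race
                }
                // nodes that lose the race simply ignore the broadcast
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        private void runAnalysis(long jobId) { /* the heavy lifting */ }
    }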

snemarch