This might result in biased and opinion based answers, if so I'll close the question but...
I have a rather basic requirement of improving our up-time and speed. As part of this I'm looking at the two main competing approaches, traditional pub/sub and akka.net. We don't have any issues currently or expect to have any need for concurrency control.
What we have is several basic workflows which are data analysis, manipulation and persistence of the result:
Step 1) Capture work to be done (IE what objects need to do some work)
Step 2) Execute that work load and produce a result
Step 3) Save result
Using traditional pub/sub This seems rather easy. Have micro services for each step, push a message at the end of each step with the data required (or more to the point data that might be useful) for the next step. Using any off the self message queue/topic/subscription software this provides a nice ability to:
1) geographically spread the loads around the world to where the source data is located
2) increase the number of "workers" that subscribe to increase through put
3) push to something central that can support the idea of connecting "workers" with a minimal learning curve
4) any component (or set of workers for a component) further down the workflow has/have a queue where the messages queue and wait for said component to come back online (even if the whole component disconnects)
5) adding new components that do something new and different, is as easy as registering a new subscription to a topic.
It's all pretty much out of the box easy joy... assuming sensible aggregate and bounded context patterns are adhered to here. I'm not seeking advise of how to write good distributed code, I'm looking for how deploy it, support it, debug rouge/missing/corrupt messages etc. Which is why I want to know what Akka.net offers.
I've seen there's Akka.net clustering . It may or may not be production ready yet, but best I understand what it can/could do for us.
So the main questions I have are:
1) Where are messages stored prior to arriving? So long as a publisher has access to the messaging bus/software endpoint, any such software will store and hold messages waiting for a subscriber to connect and pick up it's messages (obvious assumptions about the subscription having already been registered so the messages queue for it). How does Akka.net cluster handle all of this?
2) What tooling exists for operational support of these queues and mailboxes in Akka.net cluster? What tools give an operator insight into what is in a mailbox received but waiting to be processed and what tools exist for viewing what has been "published" and not yet "received"? Most competing Pub/Sub software has operational tools so I'm looking for some comparison here.
3) How do you debug rouge, missing or corrupt messages. We all know we should trust our software but a bad message can cause a system to spiral out of control, so how would I eject a bad message from the system? How can I modify a message so it's going to behave differently because the business needs something fixed at 3:30 am? How can I answer "where is my message" with "it IS in the system and it IS waiting to be received" or "it has been received and just in the mailbox"?
4) If a component goes down HARD (recycle, hardware failure what ever) what will restore the mailboxes, queues etc? Any message that's actually being processed has an acceptable lost tolerance, but 1000 messages in a mailbox getting lost isn't so tolerable, what persistence and tolerance is there?
5) The light review I've done appears to advocate for a supervisor pattern to be built into your software to marshal messages around (I'm guessing to manage and release concurrency locks?). Given concurrency isn't an issue here, what out of the box pub/sub mechanism do you support that isn't basic message remoting between two (or x internally defined in code) components? Again with subscriptions and topics in most pub/sub software, your first object pushes a message (it's central so it's a potential single point of failure) but that component (and neither doesn't any other code) have to be aware of what will consume that message. It's expansion nirvana compared the old school way where we manually pushed a message from one object to the next (and to the next), rebuilding or recompiling for each new class that same message had to go to. I'm keen to not have to build our own message router.
6) When all instances of a particular component go offline (say step 3 above) what remembers that there's actually something there that needs to queue and remember those messages (say the ones pushed blindly from step 2 above)? In other software, until you delete the subscription the messages keep queuing up based on what ever rules are defined for TTL etc. What is provided for this?