FabricNotReadableException started happening randomly

Question

[Resolved]

We have a few old services in our cluster and needed to update one of them so that it consumes and processes two extra messages. The new messages are built and consumed in exactly the same way as the existing ones.

Once the service was running with more than one partition, we started seeing random FabricNotReadableExceptions. We spent a long time investigating the issue.

Identifying the problem:

1: We looked at a single partition.

2: Node0 was the Primary.

3: Node0 became a Secondary and message processing was cancelled.

4: Node1 became the Primary and started consuming and processing messages.

5: For some reason Node0 was still receiving messages on the same partition and threw exceptions when trying to access Reliable State.
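To make the sequence above concrete, here is a toy Python model of it. It is only a sketch, not Service Fabric code: the `Replica` class, the `Role` enum and `NotReadableError` (a stand-in for FabricNotReadableException) are all made up for illustration, and the rule that only the Primary may read state is a simplification.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "Primary"
    SECONDARY = "Secondary"


class NotReadableError(Exception):
    """Stand-in for FabricNotReadableException in this toy model."""


class Replica:
    """Hypothetical model of one replica of a single partition on one node."""

    def __init__(self, node, role):
        self.node = node
        self.role = role
        self.state = {}  # stands in for Reliable State

    def change_role(self, new_role):
        self.role = new_role

    def handle_message(self, key):
        # Simplification: in this model only the Primary may read state.
        if self.role is not Role.PRIMARY:
            raise NotReadableError(f"{self.node} is {self.role.value}")
        return self.state.get(key)


node0 = Replica("Node0", Role.PRIMARY)
node1 = Replica("Node1", Role.SECONDARY)

node0.handle_message("msg-1")        # step 2: Node0 is Primary, read succeeds
node0.change_role(Role.SECONDARY)    # step 3: Node0 is demoted
node1.change_role(Role.PRIMARY)      # step 4: Node1 takes over

try:
    node0.handle_message("msg-2")    # step 5: a message still reaches Node0
    outcome = "ok"
except NotReadableError as exc:
    outcome = str(exc)

print(outcome)
```

The point of the model is that the exception itself is expected behaviour for a demoted replica. The real question is why messages were still being delivered to Node0 at step 5.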

We use the standard Service Fabric Remoting with custom partitioning. This has been working on multiple services so far, and we have never had an issue with it.

score 0 · Answer 1 · answered Oct 21 '18 at 12:36

We solved this by marking the service as ExclusiveProcess (the ServicePackageActivationMode setting). I still can't explain exactly why this fixed it, but something was being shared when multiple partitions were running on the same node under the same service type.
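I never found out what was actually shared, so the following is purely an illustration and not a diagnosis of our service. Under the default SharedProcess mode, replicas of different partitions of the same service type on a node are hosted in one process, so any process-wide (static) state is visible to all of them. The toy Python model below shows one hypothetical way that could produce the symptom and why one process per replica would hide it. Every name in it is invented, and a plain dict stands in for the static fields of a host process.

```python
class Replica:
    """Hypothetical replica that keeps its 'consuming' flag in process-wide state."""

    def __init__(self, name, process_statics):
        self.name = name
        # Assumption: this dict models static fields of the host process.
        self.statics = process_statics

    def become_primary(self):
        self.statics["consuming"] = True

    def become_secondary(self):
        self.statics["consuming"] = False

    def would_consume(self):
        # The replica decides whether to keep consuming from the shared flag.
        return self.statics.get("consuming", False)


def demoted_replica_still_consumes(shared_process):
    """Return True if partition A keeps consuming after being demoted."""
    statics_a = {}
    # Shared process: both replicas see the same statics.
    # Exclusive process: each replica gets its own.
    statics_b = statics_a if shared_process else {}

    a = Replica("partition-A", statics_a)
    b = Replica("partition-B", statics_b)

    a.become_primary()
    a.become_secondary()  # partition A is demoted on this node
    b.become_primary()    # partition B becomes Primary on the same node
    return a.would_consume()


print(demoted_replica_still_consumes(shared_process=True))
print(demoted_replica_still_consumes(shared_process=False))
```

If something like this is going on, ExclusiveProcess works around it rather than fixing it. The underlying fix would be to keep that state per replica instead of per process.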

Just wanted to raise this and let others know of a possible solution.
