Can anyone explain why 2PC blocks when the coordinator fails? Is it because the cohorts don't use timeouts in 2PC?

Good reference: Analysis and Verification of Two-Phase Commit & Three-Phase Commit Protocols, by Muhammad Atif.

KGhatak

2 Answers

Two-phase commit is a blocking protocol because once the participants enter the prepared phase, they have to wait for the coordinator to decide the next step. If the coordinator fails, they have to wait until it is brought back up. It is not possible to start another coordinator to reach a result. Participants are not permitted to change their state until they are told to do so.
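
As a rough illustration of the paragraph above, here is a minimal sketch of a participant's state machine. The class and method names (`Participant`, `on_prepare`, `on_decision`, `on_timeout`) are made up for this example and do not come from any real transaction manager.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    PREPARED = auto()
    COMMITTED = auto()
    ABORTED = auto()


class Participant:
    """Toy 2PC participant; all names here are hypothetical."""

    def __init__(self):
        self.state = State.INIT

    def on_prepare(self):
        # Voting "yes" gives up the right to decide unilaterally.
        self.state = State.PREPARED
        return "yes"

    def on_decision(self, decision):
        # Only the coordinator's decision moves a prepared participant on.
        if self.state is State.PREPARED:
            self.state = State.COMMITTED if decision == "commit" else State.ABORTED

    def on_timeout(self):
        # Before voting, a unilateral abort is safe.
        if self.state is State.INIT:
            self.state = State.ABORTED
        # In PREPARED there is no safe move: the participant stays blocked.
        return self.state
```

A participant that times out before voting can simply abort, while one that has already voted "yes" stays in `PREPARED` until `on_decision` is called, which is the blocking behaviour described above.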

I assume you are comparing 3PC with 2PC. 3PC (as I understand it) is a family of protocols with several variants. 3PC addresses the blocking nature of 2PC. The main point is to finish the transaction consistently (commit or rollback) using only knowledge of "the environment". A new (backup) coordinator is expected to be started (probably selected from among the participants) so that the transaction can be finished. There is a way to include timeouts so that a participant aborts after some time. Even then, the newly started coordinator should be able to finish the whole transaction consistently (probably by rollback in such a case).

chalda
  • @chalda: In the case of 2PC, why can't the cohorts abort on timeout? That way, they wouldn't be blocked. – KGhatak Jun 22 '17 at 10:47
  • @KGhatak Yes, they can, and in practice this is used (the X/Open XA spec allows for it), but if a prepared cohort aborts on timeout, the protocol can no longer guarantee an atomic result. XA uses the well-known heuristic state, which declares that human intervention is needed because the commit ended in a non-deterministic way. The 3PC protocol expects timeouts to occur and should be able to handle them automatically. – chalda Jun 26 '17 at 11:33
  • @chalda: What I gather now is that, in theory, a cohort cannot commit or abort on timeout because that doesn't guarantee consistency (as you mentioned in your response), and therefore when the coordinator fails, the cohorts are stuck until it recovers. This is why 2PC is called blocking. However, I can imagine that in practice there may be heuristic or manual intervention to recover. – KGhatak Jun 26 '17 at 15:27
  • @KGhatak Yes, that's what I'm talking about. I think https://www.microsoft.com/en-us/research/publication/consensus-on-transaction-commit is good further reading. It's not directly about 3PC, but it covers the relevant context. – chalda Jun 26 '17 at 20:37
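
The point made in the comments above, that a prepared cohort aborting on timeout can break atomicity, can be sketched with a toy scenario. The function names `simulate` and `is_atomic` and the two cohorts "A" and "B" are assumptions made for illustration only.

```python
def simulate(abort_on_timeout: bool) -> dict:
    """Coordinator decides commit, reaches cohort A, then crashes before B."""
    cohorts = {"A": "prepared", "B": "prepared"}  # both voted yes
    cohorts["A"] = "committed"  # commit message delivered to A only
    # The coordinator is now down and B's timer fires.
    if abort_on_timeout:
        cohorts["B"] = "aborted"  # unilateral (heuristic) decision
    # Otherwise B stays prepared: blocked, but not contradicting A.
    return cohorts


def is_atomic(cohorts: dict) -> bool:
    # Atomicity is violated when one cohort committed and another aborted.
    outcomes = set(cohorts.values())
    return not {"committed", "aborted"} <= outcomes
```

With `simulate(True)`, A ends up committed and B aborted, so `is_atomic` returns `False`. With `simulate(False)`, B is merely stuck in the prepared state and no contradictory outcome exists yet.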

2PC doesn't always block when the coordinator fails. A system using 2PC blocks only if, while the coordinator is down, someone reads a prepared (in-doubt) resource.

If the commit message (of phase 2) to a participant is lost, the participant's resource stays in the prepared state, and the participant must ask the coordinator what the actual state of the resource is. A participant cannot determine the exact state of a prepared resource by itself.
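
A minimal sketch of this idea, assuming a hypothetical `Coordinator` that remembers decisions per transaction id and a `Resource` that resolves its in-doubt state lazily on read (none of these names come from a real library):

```python
class CoordinatorDown(Exception):
    """Raised when the coordinator cannot be reached."""


class Coordinator:
    def __init__(self):
        self.up = True
        self.decisions = {}  # txid -> "commit" or "abort"

    def outcome(self, txid):
        if not self.up:
            raise CoordinatorDown(txid)
        return self.decisions[txid]


class Resource:
    def __init__(self, coordinator, value):
        self.coordinator = coordinator
        self.value = value
        self.pending = None  # (txid, new_value) while prepared

    def prepare(self, txid, new_value):
        self.pending = (txid, new_value)

    def read(self):
        if self.pending is not None:
            txid, new_value = self.pending
            # In doubt: only the coordinator knows the outcome.
            # This call raises if the coordinator is down.
            if self.coordinator.outcome(txid) == "commit":
                self.value = new_value
            self.pending = None
        return self.value
```

In this sketch, a resource with no pending transaction can still be read while the coordinator is down. Only a read of a prepared resource has to wait (here modelled as an exception) until the coordinator is reachable again.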

ideawu