I'm currently testing failure scenarios using three CockroachDB nodes.

Using this scenario:

  1. Insert records in a loop
  2. Shut down 2 of the 3 nodes (to simulate loss of quorum)
  3. Wait long enough for the Postgres JDBC driver to throw an IOException
  4. Restart one node to restore quorum
  5. Retry the previously failed statement

I then hit the following exception:

Cause: org.postgresql.util.PSQLException: ERROR: duplicate key value (messageid)=('71100358-aeae-41ac-a397-b79788097f74') violates unique constraint "primary"

This means that the insert from the first attempt (the one that got the IOException) succeeded once quorum became available again. The problem is that my application isn't aware of it.

I cannot assume that a "duplicate key value" exception is caused by an application logic issue. Are there any parameters I can tune so that the underlying statement is rolled back before the IOException is thrown? Or is there a better approach?

Tests were conducted using:

  • CockroachDB v1.1.5 (3 nodes)
  • MyBatis 3.4.0
  • PostgreSQL driver 42.2.1
  • Java 8
sbrisson

1 Answer

There are a couple of things that could be happening here.

First, if one of the nodes you're killing is the gateway node (the one your Java process is connecting to), it could simply be that the data is committed, but the node dies before it can send the confirmation back to the client. In that case, there's not much that CockroachDB, or any other database, can do.
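Since the client cannot learn the outcome once the gateway is gone, one common client-side mitigation (an assumption here, not something this answer prescribes) is to make the retry itself safe to repeat. Below is a minimal Java sketch under stated assumptions: the `messageid` primary key is generated once on the client and reused on every attempt, the driver reports a dropped connection with an SQLSTATE in class `08` (connection exception), and `InsertAttempt` is a hypothetical hook that runs one INSERT, for example through a MyBatis mapper on a fresh connection. A `23505` (unique violation) on a retry that followed such a failure is reported as "probably written by the earlier attempt" instead of being treated as an application error.

```java
import java.sql.SQLException;

// Hypothetical sketch: retry an INSERT keyed by a client-generated id.
// Assumptions: the same messageid is reused on every attempt, and a lost
// connection surfaces as an SQLException whose SQLSTATE starts with "08".
class IdempotentInsert {

    /** Hypothetical hook: one INSERT attempt, e.g. on a fresh connection. */
    @FunctionalInterface
    interface InsertAttempt {
        void run() throws SQLException;
    }

    enum Outcome {
        INSERTED,        // this call's attempt succeeded
        LIKELY_INSERTED  // duplicate key after an attempt with unknown outcome
    }

    static boolean isUniqueViolation(SQLException e) {
        return "23505".equals(e.getSQLState()); // PostgreSQL unique_violation
    }

    static boolean isConnectionFailure(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("08"); // connection exception class
    }

    static Outcome insertWithRetry(InsertAttempt attempt, int maxAttempts)
            throws SQLException {
        boolean outcomeUnknown = false; // set once an attempt may have committed
        for (int i = 1; ; i++) {
            try {
                attempt.run();
                return Outcome.INSERTED;
            } catch (SQLException e) {
                if (outcomeUnknown && isUniqueViolation(e)) {
                    // The row already exists: most likely the earlier attempt
                    // committed after the connection dropped.
                    return Outcome.LIKELY_INSERTED;
                }
                if (isConnectionFailure(e) && i < maxAttempts) {
                    outcomeUnknown = true; // the write may or may not have landed
                    continue;
                }
                throw e; // first-attempt duplicates and other errors surface as-is
            }
        }
    }
}
```

A duplicate key on the very first attempt is still rethrown, so genuine application bugs stay visible. Because a duplicate on a retry could in principle come from another writer, a cautious caller might read the row back and compare it before treating `LIKELY_INSERTED` as done; a real version would also back off between attempts.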

The more subtle case is when the nodes you're killing are not the gateway node, that is, when the node you were talking to sends you back an error even though the data was committed successfully. The problem here is that the data is committed as soon as it's written to Raft, but if the other nodes have died (and could come back up later), the gateway node has no way of knowing whether they have committed the data it asked them to write. In situations like this, CockroachDB returns an "ambiguous result" error. I'm not sure how JDBC exposes the specifics of errors returned to the client in cases like this, but if you inspect the error itself, it should say something to that effect.
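One way to inspect the error from Java is the standard `SQLException.getSQLState()` method, which the driver's `PSQLException` inherits. The following is a hypothetical helper, a sketch rather than an established API, that sorts failures into "may have committed", "safe to retry", and "failed". It assumes CockroachDB signals an ambiguous result with SQLSTATE `40003` (`statement_completion_unknown`), as its current error documentation describes, and it also checks for the message text "result is ambiguous" in case an older server version reports the condition differently. A lost connection is grouped with ambiguous results because its outcome is equally unknown, and `40001` is the serialization-failure code CockroachDB uses for transactions that must be retried.

```java
import java.sql.SQLException;

// Hypothetical helper: classifies a failed statement by SQLSTATE.
// Assumption: CockroachDB reports ambiguous results as SQLSTATE 40003 and/or with
// the message text "result is ambiguous"; class 08 means the connection was lost.
class CommitStatus {

    enum Status {
        MAYBE_COMMITTED, // outcome unknown: check the data before retrying blindly
        RETRYABLE,       // transaction was aborted, so retrying cannot duplicate it
        FAILED           // statement was rejected; retrying will not help
    }

    static Status classify(SQLException e) {
        String state = e.getSQLState();
        String message = e.getMessage() == null ? "" : e.getMessage();

        if ("40003".equals(state) || message.contains("result is ambiguous")) {
            return Status.MAYBE_COMMITTED; // CockroachDB ambiguous result error
        }
        if (state == null || state.startsWith("08")) {
            return Status.MAYBE_COMMITTED; // lost connection, or no SQLSTATE at all
        }
        if ("40001".equals(state)) {
            return Status.RETRYABLE;       // serialization failure / transaction restart
        }
        return Status.FAILED;              // e.g. 23505 unique_violation
    }
}
```

In the question's scenario, the first failure would land in `MAYBE_COMMITTED`, which is the cue to check whether a row with that `messageid` already exists before retrying, rather than discovering it later through a duplicate key error.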

Ambiguous results in CockroachDB are briefly discussed in its Jepsen analysis; see also this page in the CockroachDB docs for information on the kinds of errors that can be returned.