We are running a 2-node CrateDB cluster (v2.3.4). It ran fine for more than a month without any issues. Recently we discovered that one node had gone down without any external interference, and we have been unable to find the root cause of this incident.
The relevant logs from the failed node are below. Any help would be appreciated.
Apr 12 23:47:04 STATS-DB-M crate[162556]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
Apr 12 23:47:04 STATS-DB-M crate[162556]: at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Apr 12 23:47:04 STATS-DB-M crate[162556]: [2018-04-12T23:47:04,027][WARN ][o.e.c.a.s.ShardStateAction] [crate3] [online_dlr_report_cache_20180412][7] received shard failed for shard id [[online_dlr_report_cache_20180412][7]], allocation id [NahsM0yfRPaHA5waOpu5OA], primary term [2], message [mark copy as stale]
Apr 12 23:47:04 STATS-DB-M crate[162556]: [2018-04-12T23:47:04,027][WARN ][o.e.c.a.s.ShardStateAction] [crate3] [online_dlr_report_cache_20180412][1] received shard failed for shard id [[online_dlr_report_cache_20180412][1]], allocation id [haMsWkQGTe-yTIfGSkLbHw], primary term [2], message [mark copy as stale]
Apr 12 23:47:04 STATS-DB-M crate[162556]: [2018-04-12T23:47:04,026][WARN ][o.e.c.a.s.ShardStateAction] [crate3] [online_dlr_report_cache_20180412][1] received shard failed for shard id [[online_dlr_report_cache_20180412][1]], allocation id [ZfHGc1DiTZmJ2JQ3YoA_Yg], primary term [1], message [failed to perform indices:crate/data/write/upsert on replica [online_dlr_report_cache_20180412][1], node[1RRQy42EQ8meT7S40loaEw], [R], s[STARTED], a[id=ZfHGc1DiTZmJ2JQ3YoA_Yg]], failure [RemoteTransportException[[crate3][192.168.1.50:4300][indices:crate/data/write/upsert[r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before relocation hand off [online_dlr_report_cache_20180412][1], node[1RRQy42EQ8meT7S40loaEw], [P], s[STARTED], a[id=ZfHGc1DiTZmJ2JQ3YoA_Yg], state is [STARTED]]; ]
Apr 12 23:47:04 STATS-DB-M crate[162556]: org.elasticsearch.transport.RemoteTransportException: [crate3][192.168.1.50:4300][indices:crate/data/write/upsert[r]]
Apr 12 23:47:04 STATS-DB-M crate[162556]: Caused by: java.lang.IllegalStateException: active primary shard cannot be a replication target before relocation hand off [online_dlr_report_cache_20180412][1], node[1RRQy42EQ8meT7S40loaEw], [P], s[STARTED], a[id=ZfHGc1DiTZmJ2JQ3YoA_Yg], state is [STARTED]
Apr 12 23:47:10 STATS-DB-M systemd[1]: crate.service: main process exited, code=exited, status=126/n/a
Apr 12 23:47:10 STATS-DB-M systemd[1]: Unit crate.service entered failed state.
Apr 12 23:47:10 STATS-DB-M systemd[1]: crate.service failed.