We have a cluster with 7 nodes and we use the DataStax Java driver to connect to it. The problem is that I am constantly getting NoHostAvailableException errors like this:

Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /172.31.7.243:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)), /172.31.7.245:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)), /172.31.7.246:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)), /172.31.7.247:9042, /172.31.7.232:9042, /172.31.7.233:9042, /172.31.7.244:9042 [only showing errors of first 3 hosts, use getErrors() for more details])

All the nodes are up:

UN  172.31.7.244  152.21 GB  256     14.5%  58abea69-e7ba-4e57-9609-24f3673a7e58  RAC1
UN  172.31.7.245  168.4 GB   256     14.5%  bc11b4f0-cf96-4ca5-9a3e-33cc2b92a752  RAC1
UN  172.31.7.246  177.71 GB  256     13.7%  8dc7bb3d-38f7-49b9-b8db-a622cc80346c  RAC1
UN  172.31.7.247  158.57 GB  256     14.1%  94022081-a563-4042-81ab-75ffe4d13194  RAC1
UN  172.31.7.243  176.83 GB  256     14.6%  0dda3410-db58-42f2-9351-068bdf68f530  RAC1
UN  172.31.7.233  159 GB     256     13.6%  01e013fb-2f57-44fb-b3c5-fd89d705bfdd  RAC1
UN  172.31.7.232  166.05 GB  256     15.0%  4d009603-faa9-4add-b3a2-fe24ec16a7c1  RAC1

but two of them have a high CPU load, especially 172.31.7.232, because I am running a lot of deletes through cqlsh on that node.

I know that deletes generate tombstones, but with 7 nodes in the cluster I do not think it is normal that none of the hosts are accessible.

Our configuration for the Java connection is:

com.datastax.driver.core.Cluster cluster = null;
        // Get contact points
        String[] contactPoints = this.environment.getRequiredProperty(CASSANDRA_CLUSTER_URL).split(",");
        cluster = com.datastax.driver.core.Cluster.builder()
            .addContactPoints(contactPoints)
            .withCredentials(this.environment.getRequiredProperty(CASSANDRA_CLUSTER_USERNAME),
                this.environment.getRequiredProperty(CASSANDRA_CLUSTER_PASSWORD))
            .withQueryOptions(new QueryOptions()
                .setConsistencyLevel(ConsistencyLevel.QUORUM))
            .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
            .withRetryPolicy(new LoggingRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE))
            .withPort(Integer.parseInt(this.environment.getRequiredProperty(CASSANDRA_CLUSTER_PORT)))
            .build();

        Metadata metadata = cluster.getMetadata();
        for (Host host : metadata.getAllHosts()) {
            LOG.info("Datacenter: " + host.getDatacenter() + "; Host: " + host.getAddress() + "\n");
        }

and the contact points are:

172.31.7.244,172.31.7.243,172.31.7.245,172.31.7.246,172.31.7.247
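The driver's message suggests increasing the number of per-host connections. As a minimal sketch (the pool sizes and the read timeout below are illustrative assumptions, not values we have validated), the connection pool and socket timeout could be tuned on the same builder:

PoolingOptions poolingOptions = new PoolingOptions()
        .setCoreConnectionsPerHost(HostDistance.LOCAL, 2)   // connections opened up front per host
        .setMaxConnectionsPerHost(HostDistance.LOCAL, 8);   // upper bound under load

SocketOptions socketOptions = new SocketOptions()
        .setReadTimeoutMillis(20000);                        // driver default is 12000 ms

cluster = com.datastax.driver.core.Cluster.builder()
        .addContactPoints(contactPoints)
        .withPoolingOptions(poolingOptions)
        .withSocketOptions(socketOptions)
        // ... same credentials, query options, load balancing and retry policy as above ...
        .build();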

Does anyone know how I can solve this problem? Or does anyone at least have a hint about how to deal with this situation?

Update: If I get the error messages with e.getErrors() I obtain:

/172.31.7.243:9042=com.datastax.driver.core.OperationTimedOutException: [/172.31.7.243:9042] Operation timed out, /172.31.7.244:9042=com.datastax.driver.core.OperationTimedOutException: [/172.31.7.244:9042] Operation timed out, /172.31.7.245:9042=com.datastax.driver.core.OperationTimedOutException: [/172.31.7.245:9042] Operation timed out, /172.31.7.246:9042=com.datastax.driver.core.OperationTimedOutException: [/172.31.7.246:9042] Operation timed out, /172.31.7.247:9042=com.datastax.driver.core.OperationTimedOutException: [/172.31.7.247:9042] Operation timed out}

UPDATE:

  • The replication factor of the keyspace is 3.
  • For the deletes, I am running them using different files containing the CQL queries:

    cqlsh ip_node_1 -f script-1.duplicates
    cqlsh ip_node_1 -f script-2.duplicates
    cqlsh ip_node_1 -f script-3.duplicates
    ...

  • I am not specifying any consistency level, so it is using the default one, which is ONE (see the sketch after this list for how it could be set explicitly).

  • Each of the previous files contain deletes like this:

DELETE FROM keyspace_name.search WHERE idline1 = 837 and idline2 = 841 and partid = 8558 and id = 18c04c20-8a3a-11e5-9e20-0025905a2ab2;

  • And the column family is:

CREATE TABLE search (
    idline1 bigint, idline2 bigint, partid int, id uuid,
    field3 int, field4 int, field5 int, field6 int, field7 int, field8 int,
    field9 double, field10 bigint, field11 bigint, field12 bigint,
    field13 boolean, field14 boolean, field15 int, field16 bigint,
    field17 int, field18 int, field19 int, field20 int,
    field21 uuid, field22 boolean,
    PRIMARY KEY ((idline1, idline2, partid), id)
) WITH bloom_filter_fp_chance=0.010000
  AND caching='KEYS_ONLY'
  AND comment='Table with the snp between lines'
  AND dclocal_read_repair_chance=0.000000
  AND gc_grace_seconds=0
  AND index_interval=128
  AND read_repair_chance=0.100000
  AND replicate_on_write='true'
  AND populate_io_cache_on_flush='false'
  AND default_time_to_live=0
  AND speculative_retry='99.0PERCENTILE'
  AND memtable_flush_period_in_ms=0
  AND compaction={'class': 'SizeTieredCompactionStrategy'}
  AND compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX search_partid ON search (partid);

CREATE INDEX search_field8 ON search (field8);
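Since cqlsh defaults to ONE, as mentioned above, here is a minimal sketch of how the same delete could instead be issued through the Java driver with an explicit consistency level. It assumes the open session from the configuration above and is only an illustration, not what we currently run:

PreparedStatement deleteStmt = session.prepare(
        "DELETE FROM keyspace_name.search WHERE idline1 = ? AND idline2 = ? AND partid = ? AND id = ?");

BoundStatement bound = deleteStmt.bind(837L, 841L, 8558,
        java.util.UUID.fromString("18c04c20-8a3a-11e5-9e20-0025905a2ab2"));
bound.setConsistencyLevel(ConsistencyLevel.QUORUM); // instead of cqlsh's default ONE

session.execute(bound);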

UPDATE (18-03-2016):

After the deletes start to be executed, I found that the CPU of some of the nodes increases a lot:

(screenshot of the nodes' CPU usage omitted)

I checked the processes on those nodes and only Cassandra is running, but it is consuming a lot of CPU. The rest of the nodes are barely using any CPU.

UPDATE (04-04-2016): I do not know if it is related. I checked the nodes with a lot of CPU usage (near 96%) and the GC activity remains at 1.6% (the JVM is using only 3 GB of the 10 GB assigned).

Checking the thread pool stats:

nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0       20042001         0                 0
RequestResponseStage              0         0      149365845         0                 0
MutationStage                    32    117720      181498576         0                 0
ReadRepairStage                   0         0         799373         0                 0
ReplicateOnWriteStage             0         0       13624173         0                 0
GossipStage                       0         0        5580503         0                 0
CacheCleanupExecutor              0         0              0         0                 0
AntiEntropyStage                  0         0          32173         0                 0
MigrationStage                    0         0              9         0                 0
MemtablePostFlusher               0         0          45044         0                 0
MemoryMeter                       0         0           9553         0                 0
FlushWriter                       0         0           9425         0                18
ValidationExecutor                0         0          15980         0                 0
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              7         0                 0
CompactionExecutor                0         0        1293147         0                 0
commitlog_archiver                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0            273         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0

I realize that the pending MutationStage tasks keep growing while the active count stays the same. Could this be the problem?
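If the backed-up MutationStage is the bottleneck, one idea would be to pace the deletes from the client instead of feeding whole cqlsh scripts to a single node. This is only a sketch under assumptions: it presumes an open Session named session and the delete statements loaded into a List<String> named deleteCql, and the in-flight limit of 32 is an arbitrary illustrative value:

import java.util.List;
import java.util.concurrent.Semaphore;

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

public class ThrottledDeletes {

    // Issue the deletes asynchronously, but never more than 32 at a time,
    // so a single coordinator is not flooded with mutations.
    static void runDeletes(Session session, List<String> deleteCql) throws InterruptedException {
        final Semaphore inFlight = new Semaphore(32); // illustrative limit, tune to the cluster
        for (String cql : deleteCql) {
            inFlight.acquire(); // blocks once 32 deletes are outstanding
            Statement stmt = new SimpleStatement(cql)
                    .setConsistencyLevel(ConsistencyLevel.QUORUM);
            ResultSetFuture future = session.executeAsync(stmt);
            Futures.addCallback(future, new FutureCallback<ResultSet>() {
                @Override public void onSuccess(ResultSet rs) { inFlight.release(); }
                @Override public void onFailure(Throwable t) { inFlight.release(); }
            });
        }
        inFlight.acquire(32); // wait for the remaining in-flight deletes to finish
    }
}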

ftrujillo
  • Can you show the keyspace you perform the deletes on? What replication factor are you using? – HashtagMarkus Mar 15 '16 at 09:41
  • An example of your delete statement might help too. – phact Mar 15 '16 at 13:49
  • What is the consistency level of your delete operation? – Rahul Mar 15 '16 at 15:22
  • I have updated the description of the question with all the requested information in the comments – ftrujillo Mar 16 '16 at 09:57
  • How many of those deletions do you perform per partition? So, for example how many deletes do you perform for elements on "idline1 = 837 and idline2 = 841 and partid = 8558"? How many rows do you store per partition? Also, is the index you created on another table (snpsearch?) or is this a copy/paste mistake and it belongs to search? – HashtagMarkus Mar 16 '16 at 14:12
  • The index belongs to the same column family (it is a copy/paste error). The number of deletes per partition can vary because I am not deleting the complete partition, only part of it. The maximum number of rows per partition is 10000. – ftrujillo Mar 16 '16 at 16:35

1 Answer


I see two problems with your data model.

  • You use two secondary indexes. One is on a column that is part of the partition key. I don't know how Cassandra behaves in this case. The worst case is that, even if you use the complete partition key (as you do in your example delete), Cassandra does a lookup in the secondary index. In that case this would mean a full cluster scan, because secondary indexes are stored per partition. Since only part of the partition key is indexed, Cassandra does not know on which partition the index information lies. This behavior would at least explain the timeouts.

  • You said you delete a lot of rows in a specific partition. That is also a problem. For each deletion Cassandra creates a tombstone. The more tombstones there are, the slower reads become. This will sooner or later lead to timeouts or exceptions (by default Cassandra logs a warning when a read encounters 1,000 tombstones and aborts the read at 100,000, the tombstone failure threshold). By the way, these tombstones are also created in the secondary index. By default, Cassandra removes tombstones after gc_grace_seconds (10 days by default) when a compaction is performed. You can change this property per table (a sketch follows below). More information on these table properties can be found here: Table Properties

I believe the first point could be the reason for the timeouts.
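If it helps, gc_grace_seconds can indeed be changed per table from the application side; a minimal sketch, assuming an open Session named session and using the keyspace and table names from the question:

// 864000 seconds = 10 days, Cassandra's default gc_grace_seconds
session.execute("ALTER TABLE keyspace_name.search WITH gc_grace_seconds = 864000;");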

HashtagMarkus
  • Regarding the index, I will delete it and check if that improves things. It was an error in the model: we first thought of using a secondary index and finally decided to include the column in the partition key. Regarding the tombstones, we have configured gc_grace_seconds for this column family to zero to avoid generating tombstones during the deletes. – ftrujillo Mar 17 '16 at 10:32
  • @ftrujillo Just a side note: the tombstones will still be generated, but they will be deleted with each compaction. – HashtagMarkus Mar 17 '16 at 12:16
  • @HashtagMarkus Thanks for the clarification. Anyway, I have also tried running compactions after a batch of deletes has been executed, precisely to remove the tombstones before continuing with the deletes. – ftrujillo Mar 17 '16 at 13:36
  • I have deleted the index and started executing the deletes again, and after a few minutes I have the same problem. Updated my question. – ftrujillo Mar 18 '16 at 14:41