
I'm trying to insert a single row (a few columns, about 500 MB of data) into a Cassandra cluster, and I'm getting the error below.

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: [/10.138.90.207:9042, /10.138.90.208:9042, /10.138.90.191:9042, /10.138.90.240:9042, /10.138.90.232:9042, /10.138.90.205:9042, /10.138.90.236:9042, /10.138.90.246:9042] - use getErrors() for details)
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at com.tcs.asml.cassandra.Crud.Insert(Crud.java:44)
at com.tcs.asml.factory.PartToolInsert.main(PartToolInsert.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: [/10.138.90.207:9042, /10.138.90.208:9042, /10.138.90.191:9042, /10.138.90.240:9042, /10.138.90.232:9042, /10.138.90.205:9042, /10.138.90.236:9042, /10.138.90.246:9042] - use getErrors() for details)
at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

When I print getErrors() from the exception, it shows a "Timeout during read" error for every node in the cluster.

getErrors() output:
{/10.138.90.207:9042=com.datastax.driver.core.exceptions.DriverException: Timeout during read, /10.138.90.191:9042=com.datastax.driver.core.exceptions.DriverException: Timeout during read, /10.138.90.208:9042=com.datastax.driver.core.exceptions.DriverException: Timeout during read, /10.138.90.240:9042=com.datastax.driver.core.exceptions.DriverException: Timeout during read, /10.138.90.232:9042=com.datastax.driver.core.exceptions.DriverException: Timeout during read, /10.138.90.205:9042=com.datastax.driver.core.exceptions.DriverException: Timeout during read, /10.138.90.236:9042=com.datastax.driver.core.exceptions.DriverException: Timeout during read, /10.138.90.246:9042=com.datastax.driver.core.exceptions.DriverException: Timeout during read}
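For reference, the failing insert looks roughly like the sketch below. The keyspace, table, and column names are placeholders, and the byte array just stands in for the real payload; the actual code lives in Crud.Insert, which is not shown here.

// Placeholder schema: CREATE TABLE parts.blobs (id text PRIMARY KEY, payload blob);
import java.nio.ByteBuffer;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class PartToolInsertSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.138.90.207")   // any node of the cluster
                .build();
        Session session = cluster.connect();

        // Roughly 500 MB of data for the row, bound to a blob column.
        byte[] data = new byte[500 * 1024 * 1024];
        PreparedStatement ps = session.prepare(
                "INSERT INTO parts.blobs (id, payload) VALUES (?, ?)");
        session.execute(ps.bind("part-001", ByteBuffer.wrap(data)));

        cluster.close();
    }
}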

Cluster details:

  • One datacenter with 8 nodes, each with 16 GB RAM.
  • A single hard disk in every node.
  • All nodes connected with 10 Mbps bandwidth and default latency.

I tried to increase the read timeout using the call below.

cluster.getConfiguration().getSocketOptions().setReadTimeoutMillis(60000);
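The same timeout can also be set up front when the Cluster is built, so it applies from the very first connection the driver opens. A minimal sketch (class name and contact point are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SocketOptions;

public class ClusterWithLongerTimeout {
    public static void main(String[] args) {
        // Raise the client-side read timeout from the driver's default before connecting.
        SocketOptions socketOptions = new SocketOptions()
                .setReadTimeoutMillis(60000);

        Cluster cluster = Cluster.builder()
                .addContactPoint("10.138.90.207")   // any node of the cluster
                .withSocketOptions(socketOptions)
                .build();
        Session session = cluster.connect();
        // ... run the insert as before ...
        cluster.close();
    }
}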

Below is the cassandra.yaml configuration I am using now.

  • memtable total space: 4 GB
  • commit log segment size: 512 MB
  • read_request_timeout_in_ms: 10000
  • request_timeout_in_ms: 10000
  • concurrent reads: 32
  • concurrent writes: 32

I faced the same issue while trying to insert a 250 MB row, and setting the read timeout to 30 seconds fixed it:

cluster.getConfiguration().getSocketOptions().setReadTimeoutMillis(30000);

But for the 500 MB row it is not working.

Can anyone please give me some ideas on how to tune Cassandra to insert a single row with this much data?

Thanks.

Naveen

1 Answer


Question: why do you need to store 500 MB or 200 MB of data in a single row in Cassandra? The sweet spot for partition sizes in Cassandra is up to about 100 MB, maybe a few hundred. Cassandra is a data store for fast storage and fast querying; 500 MB of data in one row won't give you either. So why use Cassandra for this?

ashic
  • Ashic, I agree with your point. I'm trying to find out how much time Cassandra takes to write depending on the row size and the bandwidth of the client, coordinator, and target nodes. – Naveen Aug 21 '14 at 15:01
  • If the size of your rows follows a Poisson distribution, you're bound to have a few very large rows. Responding by asking why they need to store that much data in a row misses the point. – Flavien Dec 06 '14 at 11:40