
We are using Cassandra 1.2.9 + BAM 2.5 for API analysis. We have scheduled a job to purge Cassandra data. This purge job is divided into three steps:

1. Query the original column family and insert the rows into a temporary columnFamily_purge.
2. Delete from the original column family (which adds tombstones), and insert the data from columnFamily_purge back into the original column family.
3. Drop the temporary columnFamily_purge.
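Restated as pseudocode (the names are illustrative, not the actual BAM implementation; I am assuming the temporary column family holds the rows to be retained):

```
# Step 1: copy the rows that should survive into a temporary column family
for row in original_cf:
    if keep(row): insert(row, columnFamily_purge)

# Step 2: delete rows from the original column family (writes tombstones),
#         then copy the surviving rows back from the temporary one
for row in original_cf: delete(row)        # runs as Hadoop map tasks; crashes here
for row in columnFamily_purge: insert(row, original_cf)

# Step 3: drop the temporary column family
drop(columnFamily_purge)
```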

The 1st step works well, but the 2nd step frequently crashes the Cassandra servers during the Hadoop map tasks, which makes Cassandra unavailable. The exception stack trace is as follows:

2016-08-23 10:27:43,718 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoop for UID 47338 from the native implementation
2016-08-23 10:27:43,720 WARN org.apache.hadoop.mapred.Child: Error running child
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:390)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:244)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.deleteRow(AbstractColumnFamilyTemplate.java:173)
at org.wso2.carbon.bam.cassandra.data.archive.mapred.CassandraMapReduceRowDeletion$RowKeyMapper.map(CassandraMapReduceRowDeletion.java:246)
at org.wso2.carbon.bam.cassandra.data.archive.mapred.CassandraMapReduceRowDeletion$RowKeyMapper.map(CassandraMapReduceRowDeletion.java:139)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

Could someone help with what may be leading to this problem? Thanks!

Tom

2 Answers


This can happen for 3 reasons:

1) The Cassandra servers are down. I don't think this is the case in your setup.

2) Network issues.

3) The load is higher than what the cluster can handle.

How do you delete the data? Using a Hive script?
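If it is indeed reason 3, one common mitigation is to issue the row deletions in small, throttled batches instead of letting every map task hammer the cluster at once. A stdlib-only sketch of that idea (the batching helper and the sleep interval are hypothetical; the actual Hector mutator calls are omitted):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchedDeletion {

    // Split row keys into fixed-size batches so each mutation stays small.
    static List<List<String>> batches(List<String> keys, int batchSize) {
        List<List<String>> out = new ArrayList<List<String>>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            out.add(keys.subList(i, Math.min(i + batchSize, keys.size())));
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> rowKeys = Arrays.asList("k1", "k2", "k3", "k4", "k5");
        for (List<String> batch : batches(rowKeys, 2)) {
            // Here you would call mutator.addDeletion(key, cf) for each key
            // in the batch, then mutator.execute() once per batch (Hector).
            System.out.println("deleting batch: " + batch);
            Thread.sleep(100); // throttle between batches to reduce cluster load
        }
    }
}
```

The batch size and pause would need tuning against your cluster's write capacity; the point is simply to bound how much tombstone-writing work is in flight at any moment.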

Bee
  • Yes, the first two reasons are unlikely. On the BAM UI, we can schedule a Cassandra data purge job. Internally it uses org.wso2.carbon.bam.cassandra.data.archive.mapred.CassandraMapReduceRowDeletion, which is a MapReduce job that deletes the data. Actually, the data to be deleted is not very big. – Tom Aug 24 '16 at 08:12
  • Hi @Bhathiya, I am kind of new to Cassandra, especially since we are using 1.2.9, an older version. Do you have any suggestions on how to tune Cassandra? Since you WSO2 folks adopted Cassandra as the backend NoSQL DB, do you have any documents on performance tests with different configuration parameters? – Tom Aug 24 '16 at 11:21
  • I will check in the code how the purging feature deletes data. A Cassandra tuning guide is available for WSO2 MB; you can try that. https://docs.wso2.com/display/MB211/Cassandra+Tuned+Up+Configurations – Bee Aug 24 '16 at 14:34

After I increased the number of open files and the maximum number of threads, the problem was gone.
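For reference, these limits are typically raised in /etc/security/limits.conf for the OS user that runs Cassandra. The values below follow the DataStax recommended production settings for Cassandra 1.x; the user name and exact numbers depend on your environment:

```
# /etc/security/limits.conf -- settings for the user running Cassandra
# (nofile = max open file descriptors, nproc = max threads/processes)
cassandra - memlock unlimited
cassandra - nofile 100000
cassandra - nproc 32768
cassandra - as unlimited
```

The Cassandra process must be restarted (and the user re-logged-in) for the new limits to take effect.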

Tom