0

How can I use the ByteOrderedPartitioner (BOP) to force specific key values to be partitioned according to a custom requirement? I want to force Cassandra to partition and replicate data according to custom requirements. Without introducing a custom partitioner, how far can I control this behavior, and how?

Overall: I want data whose key starts with a particular ID to live on a predefined node, because I know that data will be accessed heavily from that node. I would also like the data to be replicated to nearby nodes.

Krv Perera

2 Answers

1

I want my data starting with particular ID to be at a predefined node because I know data will be accessed from that node heavily.

It looks like you are talking about the data locality problem, which is really important in big-data computations (Spark, Hadoop, etc.). The general approach there isn't to pin data to a specific node, but to move the computation to the data.

Pinning data to a specific node may cause problems like:

  • what should you do if your node goes down?
  • how evenly will the data be distributed across the cluster? Will there be any hotspots/bottlenecks because of node over(under)-utilization?
  • how can you scale your cluster in the future?

Moving the computation to the data avoids these problems, but the approach you are planning to take does not.
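As a rough illustration of "moving computation to the data", here is a toy Python sketch. It is not Spark or Hadoop code: `node_data`, `run_local` and `distributed_sum` are hypothetical names, and `run_local` only stands in for a scheduler that would ship the function to the node holding the rows. The point is that each node works on its own local rows and only the small partial results travel over the network.

```python
# Hypothetical layout: which node stores which rows (made up for illustration).
node_data = {
    "node_a": {"a:1": 10, "a:2": 20},
    "node_b": {"b:1": 5},
}


def run_local(node, func):
    """Stand-in for submitting func to run on `node`.

    The function only ever sees the rows stored on that node.
    """
    return func(node_data[node].values())


def distributed_sum():
    # Each node computes a partial sum over its local rows...
    partials = [run_local(node, sum) for node in node_data]
    # ...and only those small partial results are combined centrally.
    return sum(partials)
```

With this shape nothing depends on a particular row living on a particular node, so nodes can be added or lost without changing the job itself.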

shutty
  • Can you give me an example of moving the whole computation logic? It seems like a good solution. In my current model I replicate the data to nearby nodes to handle a node going down, and I am deliberately creating **hotspots** by pinning data so that the NoSQL database does not have to look anywhere else (it guarantees where the data is read from). **Scalability**: yes, that is going to be a problem. Correct me if I am wrong or if a better model is possible. – Krv Perera Nov 26 '15 at 12:29
0

Found the answer here... http://www.mail-archive.com/user%40cassandra.apache.org/msg14997.html

By changing the "initial_token" setting in each node's cassandra.yaml file, we can divide the ring into key ranges of our choosing. The partitioner then picks the node that stores the first replica of a row, and the strategy class SimpleStrategy places the remaining replicas on the nodes that follow it on the ring. So by arranging the node tokens the way you want, you can exploit the replication strategy to control where the data and its replicas end up.
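To make the placement rule concrete, here is a small Python simulation of the ring. It is only a sketch under assumptions: the node names and token values are made up, and `build_ring` and `replicas_for` are hypothetical helpers, not Cassandra APIs. It assumes that with the BOP the token is the raw key bytes, that a node owns the range from the previous node's token (exclusive) up to its own token (inclusive), and that SimpleStrategy takes the following nodes clockwise.

```python
from bisect import bisect_left


def build_ring(initial_tokens):
    """Sort (token, node) pairs; tokens are compared as raw bytes, as with BOP."""
    return sorted((token, node) for node, token in initial_tokens.items())


def replicas_for(key, ring, replication_factor):
    """Return the nodes that would hold `key`, first replica first."""
    tokens = [token for token, _ in ring]
    # The first replica goes to the first node whose token is >= the key;
    # keys beyond the largest token wrap around to the start of the ring.
    start = bisect_left(tokens, key) % len(ring)
    count = min(replication_factor, len(ring))
    # SimpleStrategy-style placement: keep walking clockwise for the rest.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical initial_token values: each node ends the range for one key prefix.
ring = build_ring({
    "node_a": b"a\xff",
    "node_b": b"b\xff",
    "node_c": b"c\xff",
})
```

With these made-up tokens, `replicas_for(b"a:user42", ring, 2)` gives `["node_a", "node_b"]`: keys prefixed with `a` land on `node_a` first, and the second replica goes to its ring neighbour.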

Krv Perera