
select distribution ratio: The ratio of rows each partition should insert as a proportion of the total possible rows for the partition (as defined by the clustering distribution columns). default FIXED(1)/1

Can someone explain what this means, and why it is called the select distribution ratio when it sits under the insert distribution?

http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema

user1870400

1 Answer


In Cassandra, data is assigned to a node by its partition key, and then stored on disk sorted by the clustering key within each partition.

The 'distribution ratio' allows you to define:

1) How many rows the stress tool will create in each partition,

2) How many rows the stress tool will read from each partition (they'll be ordered, so it's fairly fast to grab more than one)

With FIXED(), every partition gets the same fixed number of rows; with one of the other distributions, the number of rows varies from partition to partition.

Edit to explain multiple rows per partition:

For example, if you had a data model where you gathered weather information from different cities:

CREATE TABLE sensor_readings (
    station_id text,
    weather_time timestamp,
    temperature int,
    humidity int,
    PRIMARY KEY (station_id, weather_time)
);

In this case, you have multiple rows (one for each weather_time) in each partition (station_id). You can query for all sensor readings in a given station_id, or you can query for only one specific weather_time. The distribution ratio controls how many weather_times you have per station_id.
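To make the "proportion of the total possible rows" wording from the quoted definition concrete, here is a rough sketch of the arithmetic in Python. The function name and the figure of 1000 weather_time rows per station_id are made up for illustration, and the sketch assumes the ratio is applied exactly, whereas the stress tool itself samples from the configured distributions, so real counts can differ.

```python
from fractions import Fraction


def rows_per_insert(rows_in_partition: int, ratio: Fraction) -> int:
    """Expected rows written to one partition by a single insert operation.

    Simplifying assumption: the select ratio is applied exactly. The real
    tool samples rows, so actual counts can vary.
    """
    return int(rows_in_partition * ratio)


# Hypothetical numbers: 1000 weather_time rows per station_id.
rows = 1000
full = rows_per_insert(rows, Fraction(1, 1))       # select: FIXED(1)/1
single = rows_per_insert(rows, Fraction(1, 1000))  # select: FIXED(1)/1000
print(full, single)
```

Under these assumptions, the default FIXED(1)/1 writes every row of the partition in one operation, while FIXED(1)/1000 writes about one row per operation.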

Jeff Jirsa
  • What do you mean by how many rows per partition? I was under the assumption that each partition means just one row. Can you give me an example? – user1870400 Jan 21 '16 at 10:14