
Amazon DynamoDB allows the customer to provision the throughput of reads and writes independently. I have read the Amazon Dynamo paper about the system that preceded DynamoDB and read about how Cassandra and Riak implemented these ideas.

I understand how it is possible to increase the throughput of these systems by adding nodes to the cluster, which divides the hash keyspace of tables across more nodes and thereby allows greater throughput as long as access is relatively random across hash keys. But in systems like Cassandra and Riak, this adds throughput to both reads and writes at the same time.
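
To make that intuition concrete, here is a toy sketch (my own illustration, not actual Cassandra or Riak code) of how hashing keys across more nodes spreads load, so both reads and writes scale together:

```python
# Toy illustration: hashing keys across N nodes.
# With roughly random access across keys, doubling the node count roughly
# halves the load on each node, so read AND write throughput scale together.
import hashlib
from collections import Counter

def node_for(key: str, num_nodes: int) -> int:
    """Map a key onto one of num_nodes slices of the hash keyspace."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"user-{i}" for i in range(100_000)]
for num_nodes in (4, 8):
    load = Counter(node_for(k, num_nodes) for k in keys)
    print(num_nodes, "nodes -> max keys on any one node:", max(load.values()))
```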

How is DynamoDB architected differently such that it is able to scale reads and writes independently? Or is it not, and Amazon simply charges for them independently even though it essentially has to allocate enough nodes to cover the greater of the two?

Jeff Walker Code Ranger

1 Answer


You are correct that adding nodes to a cluster should increase the amount of available throughput, but that is on a cluster basis, not a table basis. The DynamoDB cluster is a shared resource across many tables and many accounts. It's like an EC2 instance: you are paying for a virtual machine, but that virtual machine is hosted on a physical machine shared among several other EC2 virtual machines, and depending on the instance type you get a certain amount of memory, CPU, network I/O, and so on.

What you are paying for when you pay for throughput is I/O, and reads and writes can be throttled independently. Paying for more throughput does not cause Amazon to partition your table across more nodes. The only thing that causes a table to be partitioned further is the size of your table growing to the point where more partitions are needed to store its data. The maximum size of a partition, from what I have gathered talking to DynamoDB engineers, is based on the size of the SSDs of the nodes in the cluster.
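
As a rough sketch of that claim (the per-partition size limit is my own placeholder for illustration, not a number from DynamoDB), the partition count in this model depends only on data size:

```python
# Simplified model of the claim above: partition count is driven by data
# size alone, not by provisioned throughput. The 10 GB per-partition limit
# here is an assumed figure for illustration only.
import math

def estimated_partitions(table_size_gb: float, max_partition_gb: float = 10.0) -> int:
    return max(1, math.ceil(table_size_gb / max_partition_gb))

print(estimated_partitions(4))    # small table  -> 1 partition
print(estimated_partitions(250))  # larger table -> 25 partitions
```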

The trick with provisioned throughput is that it is divided among the partitions. So if you have a hot partition, you can get throttling and ProvisionedThroughputExceededExceptions even if your total request rate does not exceed the total read or write throughput you provisioned. This is contrary to what your question assumes: you would expect that dividing your table among more partitions/nodes gives you more throughput, but in reality it is the opposite unless you scale your provisioned throughput with the size of your table.
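
Here is a minimal worked example of that effect, assuming an even split of provisioned capacity across partitions (a simplification; the real service's behavior may differ):

```python
# Why a hot partition throttles: provisioned throughput is divided evenly
# across partitions in this simplified model.
provisioned_rcu = 1000                              # total read capacity on the table
partitions = 10                                     # partitions the table happens to have
per_partition_rcu = provisioned_rcu / partitions    # 100 RCU per partition

hot_partition_demand = 300   # reads/sec landing on one hot hash-key range
total_demand = 600           # well under the 1000 RCU provisioned

if hot_partition_demand > per_partition_rcu:
    print("ProvisionedThroughputExceededException on the hot partition,")
    print("even though total demand (600) is below provisioned throughput (1000).")
```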

  • I do understand how provisioning works, and it is not contrary to what I said. I understand that the provisioned throughput is divided among the partitions. Do you have a source for your statement that "Paying for more throughput does not cause Amazon to partition your table on more nodes"? That doesn't make sense, because I could have a table of 1000 rows but provision so many reads that no single partition could handle that load. I also understand that in DynamoDB the cluster is a shared resource; that has nothing to do with my point about how many nodes a table is partitioned across. – Jeff Walker Code Ranger Aug 06 '14 at 17:25
  • You are absolutely correct. You could have a table of 1000 rows, provision the maximum throughput allowed for that table, and still be limited by the network and SSD throughput of the physical node that stores it. The source for my information is working with DynamoDB to understand performance problems we have with a very large DynamoDB table (80+ billion rows, 10+ TB of data). – Chris Parrinello Aug 06 '14 at 17:39