How to make QLDB partitioning effective?

Question

I have to store transaction data for certain accounts in QLDB. Is there any way to partition the data so that QLDB stores everything for one account on the same server, making my chunking/querying faster?

Hi Shital. What do you mean by partitioning? QLDB in late 2019 does not support partitioning. — Marc, Dec 06 '19 at 06:47
Hey @Marc, I mean: when QLDB stores data in a ledger, is there any mechanism I can use to make my select queries more efficient? For instance, if I have transaction data for multiple accounts, can I ensure that the data for one account resides on the same server? — Shital, Dec 07 '19 at 07:09

Matthew Pope · Accepted Answer · 2019-12-09T19:12:47.530

score 1

QLDB is serverless. AWS intentionally abstracts away the servers, and even if you know the implementation details today, they could change at any time, without warning.

When QLDB eventually supports multiple partitions/shards/strands, even then, you should not be trying to think about this in terms of servers because there’s no guarantee that one partition "key" will exist on exactly one partition/server.

To help explain what I mean, I'll compare to DynamoDB. (Even though I know they do not have the same design, I think the comparison still makes the point.) When using DynamoDB, your data is partitioned based on the hash key, but there is no guarantee that all the data for a single hash key value exists in one partition. QLDB could do something similar (or it might not), but one should not make any assumptions about the underlying implementation when designing a table.

That being said, you can improve the query performance when getting data for one account if you create an index on your accountId field.
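
As a rough sketch of what that could look like, assuming a hypothetical table named Transactions and the accountId field mentioned above: the two helpers below only build and send PartiQL statements through any object that exposes execute_statement(statement, *parameters), which is the shape of the transaction executor that the pyqldb driver passes to QldbDriver.execute_lambda. The table name and the helper names are made up for illustration.

```python
from typing import Any, Protocol


class StatementExecutor(Protocol):
    """Anything exposing execute_statement(statement, *parameters).

    The transaction executor that pyqldb hands to the callback of
    QldbDriver.execute_lambda has a method of this shape.
    """

    def execute_statement(self, statement: str, *parameters: Any) -> Any: ...


TABLE = "Transactions"  # hypothetical table name, not from the question
FIELD = "accountId"     # the field the answer suggests indexing


def create_account_index(txn: StatementExecutor) -> Any:
    """Create an index on the account field of the table."""
    return txn.execute_statement(f"CREATE INDEX ON {TABLE} ({FIELD})")


def transactions_for_account(txn: StatementExecutor, account_id: str) -> Any:
    """Select one account's documents with an equality predicate on the
    indexed field; the account id is passed as a `?` parameter."""
    return txn.execute_statement(
        f"SELECT * FROM {TABLE} AS t WHERE t.{FIELD} = ?", account_id
    )
```

With pyqldb installed, this would be used roughly as driver = QldbDriver(ledger_name="my-ledger"), then driver.execute_lambda(create_account_index), then driver.execute_lambda(lambda txn: list(transactions_for_account(txn, "acct-123"))). The ledger name and account id here are placeholders. The query filters with an equality comparison on accountId because that is the kind of predicate an index on that field can serve.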

edited Dec 09 '19 at 19:12

answered Dec 07 '19 at 20:00

Matthew Pope


I think this is misleading, but that could just be me reading it wrong. QLDB is serverless when it comes to processing transactions. Your actual state (the journal and indexed storage) is never "deprovisioned". It exists, always, in multiple copies on physical hosts. And, all your data in your ledger, no matter what table/index, lives on the same hosts because QLDB in late 2019 does not support partitioning. I hope that clears it up. – Marc Dec 09 '19 at 18:59
I know that serverless products still run on servers, but the point of a serverless offering is to abstract away the concerns of running servers. Right now, all the data may be on the same host, but there is no guarantee that will always be true, so people shouldn't try to design their tables based on that assumption. I'll try to make my answer a little clearer, but feel free to also edit it with corrections and/or clarifications. – Matthew Pope Dec 09 '19 at 19:03
I think I understand your point, thank you for clarifying. The QLDB architecture makes a hard split on reads vs writes. In a traditional DB, colocating data for reads is highly desirable. In QLDB, it is almost always going to be better to have tables and indexes on different nodes because, quite simply, there are more IOs available. I don't foresee performance regressions if different read partitionings were applied transparently. Writes go to the Journal, which is partitioned in a different way (by strands). – Marc Dec 09 '19 at 21:53

score 0 · Answer 2 · answered Dec 10 '19 at 01:12

The 2019 release of QLDB has no form of partitioning. All your data is written to a single "strand" (partition) of the Journal. There are multiple copies of your data in indexed storage, but each of these nodes has all the data (for all your tables and indexes).
