
I am working on a ledger with a Transactions table. Each entry has a transaction_id, an account_id, a timestamp, and other metadata. I need to query all Transactions for a given account_id with a BETWEEN operator on timestamp.

My planned approach was to build indexes on account_id, transaction_id, and timestamp. However, I have noted a limitation on inequalities and indexes in the AWS documentation, which would apply to the timestamp index I had planned:

> Query performance is improved only when you use an equality predicate; for example, `fieldName = 123456789`.
>
> QLDB does not currently honor inequalities in query predicates. One implication of this is that range filtered scans are not implemented.
>
> ...
>
> **Warning**
>
> QLDB requires an index to efficiently look up a document. Without an indexed field in the WHERE predicate clause, QLDB needs to do a table scan when reading documents. This can cause more query latency and can also lead to more concurrency conflicts.

Transactions would be generated continuously and grow indefinitely over time, and I would need to be able to query a week's worth of transactions at a time.

Current Query:

SELECT * 
FROM Transactions 
WHERE "account_id" = 'test_account' and "timestamp" BETWEEN `2020-07-05T00:00Z` AND `2020-07-12T00:00Z`

I know it is possible to stream the data to a database better suited to this query, such as DynamoDB, but I would like to know whether my performance concerns about the above query are valid, and if they are, what indexes and query are recommended so that this scales and does not result in a scan across all transactions for the given account_id.

Imtiaz

2 Answers


Thanks for your question (well written and researched)!

QLDB, at the time of writing, does not support range indexes. So, the short answer is "you can't."

I'd be interested to know what the intention behind your query is. For example, is getting a list of transactions between two dates something you need to do to form a new transaction, or is it something you need for reporting purposes (e.g. displaying a user statement)?

Nearly every use-case I've encountered thus far is the latter (reporting), and is much better served by replicating data to something like ElasticSearch or Redshift. Typically this can be done with a couple of lines of code in a Lambda function and the cost is extremely low.
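As a minimal sketch of that replication pattern, here is what such a Lambda could look like, assuming a QLDB stream delivering to Kinesis with aggregation disabled and DynamoDB as the downstream store (the question mentions DynamoDB; Elasticsearch or Redshift would follow the same shape). The target table name and key schema are placeholders:

```python
# Minimal sketch of a Lambda that replicates QLDB stream revisions into DynamoDB.
# Assumptions: the QLDB stream delivers to Kinesis with aggregation disabled,
# and "TransactionsByAccount" is a placeholder table keyed on account_id + timestamp.
import base64

import boto3
import amazon.ion.simpleion as ion

target = boto3.resource("dynamodb").Table("TransactionsByAccount")  # placeholder

def handler(event, context):
    for record in event["Records"]:
        # QLDB stream records arrive as binary Ion inside the Kinesis payload.
        stream_record = ion.loads(base64.b64decode(record["kinesis"]["data"]))

        # Only document revisions matter here; skip CONTROL / BLOCK_SUMMARY records.
        if stream_record["recordType"] != "REVISION_DETAILS":
            continue

        revision = stream_record["payload"]["revision"]
        data = revision.get("data")
        if data is None:  # deleted documents have no data
            continue

        # Stringify Ion values for the sketch; a real pipeline would map types properly.
        target.put_item(Item={
            "account_id": str(data["account_id"]),
            "timestamp": str(data["timestamp"]),
            "transaction_id": str(data["transaction_id"]),
        })
```

The week-at-a-time statement query from the question would then run against the downstream store rather than QLDB.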

Marc
  • Thank you for clarifying my understanding. The transaction history is **not** required to form a new transaction, and the use-case is for creating statements. – Imtiaz Jul 13 '20 at 05:11
  • I would like to see how far I can push QLDB so that I understand it better and can make more informed decisions. I have come up with two design patterns that I would like input on. 1. Include a week number that is indexed and query by week. 2. Have a **TransactionLog** table that has a single entry per account. On every transaction update, replace this row with the new transaction. Thereafter use History to query the log by document ID and date range. The oldest row from history might need to be filtered out if the transaction time is out of the date range. Thoughts on these patterns? – Imtiaz Jul 13 '20 at 05:17
  • I think you're not going to "have legs" on that solution. First, history queries aren't simply SQL BETWEEN queries. It's interval intersection. Second, history uses QLDB's time, not yours. Third, you can't leverage indexes in history. That's just off the top of my head. In general, I would say history() is really meant to be used as an auditing tool (to figure out how data changed) and is not meant to be part of your reporting system. Doing replication with streams is very straightforward and will lead to an unsurprising experience. – Marc Jul 15 '20 at 18:51

QLDB has a history() function that works like a charm to generate statements, since you can pass one or two dates as arguments for start and/or end dates.

You see, this is where QLDB gets tricky: when you think of it as a relational database.

The caveat here is that you would have to change your transactions to be updates to documents in the Accounts table rather than new inserts into a separate table. This is because, by design, QLDB keeps the full revision history of every document, meaning you can later retrieve all versions of that record and filter them as well.
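A minimal sketch of what that data model could look like with the Python driver (pyqldb), assuming each transaction overwrites a couple of fields on the account's document (the field names here are placeholders), which is what makes every transaction show up later as a revision in history():

```python
# Sketch of the "transactions as updates" model: each new transaction overwrites
# placeholder fields on the account's single document, producing one revision per
# transaction in QLDB's history.
from datetime import datetime, timezone
from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="my-ledger")  # placeholder ledger name

def record_transaction(txn, account_id, amount):
    txn.execute_statement(
        "UPDATE Accounts AS a SET a.last_amount = ?, a.last_timestamp = ? "
        "WHERE a.account_id = ?",
        amount, datetime.now(timezone.utc), account_id,
    )

driver.execute_lambda(lambda txn: record_transaction(txn, 1234, 50))
```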

Here's an example of what a history query would look like in an Accounts table:

SELECT ha.data.* FROM Accounts a BY Accounts_id
JOIN history(Accounts, `2022-04-10T00:00:00.000Z`, `2022-04-13T23:59:59.999Z`) ha
ON ha.metadata.id = Accounts_id
WHERE a.account_id = 1234

The BY Accounts_id clause exposes the document ID of each row in Accounts, which is what lets you join the table against its history() results; the WHERE clause then filters on the indexed account_id column so the lookup on the base table stays efficient.

Vic Wolk