I would like to filter the results of a Query operation before the limit is applied, so that pagination works on the filtered results. Is there any way to get correct pagination on filtered results?

I would like to implement a DynamoDB Scan or Query with the following logic:

Scanning -> Filtering (boolean true or false) -> Limiting (for pagination)

However, I have only been able to implement a Scan or Query with this logic:

Scanning -> Limiting (for pagination) -> Filtering (boolean true or false)

Note: I have already tried a Global Secondary Index, but it did not work in my case because I have 5 different attributes to filter and limit on.

Taha Farooqui

1 Answer

Unfortunately, DynamoDB cannot do this. When you run a Query against one of your indexes, it reads every item that matches your partition and sort key conditions; any filter is applied only after those items have been read.

Let's look at your example: you have a boolean attribute and an index on that field. Say 50% of the items are false and 50% are true. When you query that index, you read through 50% of all the items in the table (so it is almost like a Scan). If you set a limit, it reads only that number of items and then stops. You cannot combine limit with skip/page/offset as in other databases.

There is some level of pagination (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.Pagination.html), but it does not let you jump to, e.g., page 10; it only lets you go through the pages one by one. Also, I am not sure how it is priced; maybe AWS internally goes through all the items before preparing the results for you, so you would pay for reading 50% of the whole table even if you stop iterating before you reach the end.
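
The page-by-page iteration described above can be combined with filtering by continuing to read until enough matching items have been collected. Below is a minimal Python sketch of that loop, not a definitive implementation. `fetch_filtered_page`, `query_page`, `key_of`, and `make_fake_table` are hypothetical names introduced for illustration. `query_page` is assumed to behave like a DynamoDB Query call: it accepts a limit and an exclusive start key and returns a response with `Items` and, when more data remains, `LastEvaluatedKey`. An in-memory fake stands in for the real table so the sketch runs without AWS access.

```python
from typing import Any, Callable, Optional

Item = dict[str, Any]
Key = dict[str, Any]


def fetch_filtered_page(
    query_page: Callable[[int, Optional[Key]], dict],
    predicate: Callable[[Item], bool],
    key_of: Callable[[Item], Key],
    page_size: int,
    start_key: Optional[Key] = None,
    chunk_size: int = 100,
) -> tuple[list[Item], Optional[Key]]:
    """Collect up to page_size items that pass predicate.

    Returns (items, next_key). next_key is None once the data is exhausted;
    otherwise pass it back as start_key to fetch the following page.
    """
    matched: list[Item] = []
    cursor = start_key
    while True:
        response = query_page(chunk_size, cursor)
        for item in response["Items"]:
            if predicate(item):
                matched.append(item)
                if len(matched) == page_size:
                    # Resume right after the last item actually used,
                    # not after the end of the chunk that was read.
                    return matched, key_of(item)
        cursor = response.get("LastEvaluatedKey")
        if cursor is None:
            return matched, None


def make_fake_table(items: list[Item]) -> Callable[[int, Optional[Key]], dict]:
    """In-memory stand-in for a table (assumption: each id equals its position)."""
    def query_page(limit: int, exclusive_start_key: Optional[Key]) -> dict:
        start = 0 if exclusive_start_key is None else exclusive_start_key["id"] + 1
        chunk = items[start:start + limit]
        response: dict = {"Items": chunk}
        if start + limit < len(items):
            response["LastEvaluatedKey"] = {"id": chunk[-1]["id"]}
        return response
    return query_page


# Hypothetical data: ten items, every second one has active=True.
table = make_fake_table([{"id": i, "active": i % 2 == 0} for i in range(10)])
is_active = lambda item: item["active"]
item_key = lambda item: {"id": item["id"]}

page, next_key = fetch_filtered_page(table, is_active, item_key, page_size=2, chunk_size=3)
print([item["id"] for item in page], next_key)
```

The loop still has to read every non-matching item along the way, and it still cannot jump to an arbitrary page; it only makes each returned page hold `page_size` filtered items. If the last matching item happens to fill a page exactly, the returned key is not None and the following call simply returns an empty page.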

There is also the limitation that an index key can have at most 2 fields (partition key and sort key).


EXAMPLE

You wrote that you have 5 parameters you want to query by. The usual workaround for these limitations is to create and maintain extra fields that hold combinations of the parameters you want to query. Say you have a table of users with gender, age, name, surname and position. Say it is a huge database, so you have to think about the amount of data you can load. If you want to use DynamoDB, you have to think through all the queries you want to run.

You most likely want to search by name and surname, so you create an index with surname as the partition key and name as the sort key (in that case you can search by surname, or by both surname and name). It can work for a lot of names, but you find out that some name combinations are too common and you need to filter by position as well. In that case, you create a new field (column) called, e.g., name-surname, and whenever you create or update an item, your app has to maintain this field so that it contains both values, e.g. will-smith. Then you can create another index that has name-surname as the partition key and position as the sort key. Now you can use it for such searches.

However, you find out that some name-surname-position combinations return too many results, you do not want to handle that at the application level, and you want to limit the results by age as well. Then you can create an index with name-surname-position as the partition key and age as the sort key. At this point you may also realize that your old name-surname field and index can be removed, as they serve no purpose anymore (name and surname are handled by another index, and for searching by just name-surname-position you can use this index).
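
Maintaining such a combined field in the application could look like the following Python sketch. The helper name `with_composite_key`, the lowercase normalization, and the `-` separator are assumptions for illustration (the separator follows the will-smith example above); values that themselves contain `-` would need a different separator.

```python
def with_composite_key(user: dict) -> dict:
    """Return a copy of user with the combined index attribute added."""
    parts = (user["name"], user["surname"], user["position"])
    item = dict(user)
    # Call this on every create and update so the attribute never goes stale.
    item["name-surname-position"] = "-".join(p.strip().lower() for p in parts)
    return item


# Hypothetical user record.
user = {"name": "Will", "surname": "Smith", "position": "Actor", "age": 51, "gender": "male"}
item = with_composite_key(user)
print(item["name-surname-position"])  # will-smith-actor
```

The resulting attribute would serve as the partition key of the index, with age as the sort key.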

Do you sometimes want to query by gender as well? It is probably better to handle that at the application level (or as an extra filter in the DB query) rather than creating a new index that must be maintained and paid for. There are only two types of gender (OK, let's say there are more, but 99% of people will be just male or female), so it is probably cheaper to load all of them and hide a few records at the application level when someone wants to see only male/female/transgender... An extra index would cost you on every single insert, while this filter will be used only from time to time. Also, when someone already searches by name, surname and position, you do not expect that many results anyway, so whether you get 20 results (all genders) or just 10 (male only) does not make much difference.


This ^^ was just an example of how you can think about and work with DynamoDB. How exactly you use it depends on your business logic.

Very important note: DynamoDB is a very simple database that can only do very simple queries. It has a little more functionality than Redis but a lot less than traditional databases. A valid outcome of thinking through your business model and use cases is that maybe you should NOT use DynamoDB at all, because it simply cannot satisfy your needs and queries.

Some basic thinking can look like this:

  • Is key-value persistent storage enough? Use DynamoDB
  • Is key-value persistent storage, where one item can have multiple keys and I can search and filter by a maximum of 2 fields, enough? Use DynamoDB
  • Is persistent storage, where I want to search a single table/collection by many keys with a lot of options, enough? Use MongoDB
  • Do I need to search through multiple tables, do complex joins, or use transactions? Use a traditional SQL database
libik
  • Interesting!! It means I should go for Amazon Aurora or a MySQL database. Basically, I had expected that millions of records could easily be managed using DynamoDB. – Taha Farooqui Apr 10 '20 at 22:41
  • @TahaFarooqui - DynamoDB can easily manage millions of items, and with on-demand pricing you do not have to think about the load: it scales automatically and you just pay for the number of reads and writes (which is a huge difference from other databases, where you usually cannot scale up/down easily and have to overprovision all the time). But you need a business model/design/architecture that can work within the query limitations of DynamoDB. – libik Apr 10 '20 at 22:48
  • What do you think about Amazon Aurora? Will it work fine when millions of users use the web application at the same time? There are a lot of filters in my web app. – Taha Farooqui Apr 10 '20 at 23:10
  • @TahaFarooqui - I have not used that one, so I am not sure about the specifics. You should google a comparison like this one https://stackoverflow.com/questions/46401830/amazon-rds-aurora-vs-rds-mysql-vs-mysql-on-ec2 (note that the accepted answer is 3 years old) and some benchmarks. In general, any widely used database where you can use indexes that cover everything you are querying for is "good enough". Then you can check the specifics and comparisons and select the one that suits you best. – libik Apr 11 '20 at 00:04
  • @TahaFarooqui - I can see from Stack Overflow that Aurora is not much used, because almost no one is asking about it: https://stackoverflow.com/questions/tagged/aws-aurora https://stackoverflow.com/questions/tagged/amazon-rds-aurora - not even 800 questions. If you are new to this and not sure what to pick, pick something that is widely used; then you will find a lot of tutorials and anyone can help you on Stack Overflow (e.g. mysql, postgresql...) – libik Apr 11 '20 at 00:09