
We have a requirement to show a lot of data in multiple grids and also provide sorting on the UI side. There are two approaches:

  1. Load everything into the UI and do pagination and sorting client-side.
  2. Load server-side paginated data into the UI; if the user sorts by another column, call the API again to re-sort the data on that column and return the results in paginated form.

The general feeling is that with approach 1, the UI would be unnecessarily loaded with extreme volumes of data (around 10k records across grids, totalling 1-2 MB), which might cause UI performance issues - not to mention the servers serving those requests for a user base close to a million. With approach 2, every time the user clicks a sort, there is an API call, and server resources are wasted re-sorting the huge dataset (when the user only cares to see a few tens of records).

  • What is the best way to handle this kind of scenario?
  • Is there any industry standard practices where we can refer?
  • How do we quantify the UI performance?
Karthik

1 Answer


There's a third approach:

The server has a different index for each possible sort order. When new data is added, it's inserted in the right place in each index. The UI for each user asks the server for "entries N*K to (N+1)*K" of the index that corresponds to whichever sort order the user selected. There is no sorting at request time. There is no need to load everything into each UI.
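A minimal sketch of this approach (names are illustrative, not from the original post): each index keeps its records permanently sorted, new records are inserted at their sorted position, and a page request is just a slice - no sort ever runs at request time.

```python
import bisect

# Hypothetical sketch: one pre-sorted index per sort order.
class SortedIndex:
    def __init__(self, key):
        self.key = key      # function extracting the sort key from a record
        self.keys = []      # sort keys, kept in ascending order
        self.records = []   # records, parallel to self.keys

    def insert(self, record):
        # Insert the new record at its sorted position (no re-sort needed).
        k = self.key(record)
        pos = bisect.bisect_left(self.keys, k)
        self.keys.insert(pos, k)
        self.records.insert(pos, record)

    def page(self, n, k):
        # Return entries n*k to (n+1)*k of the pre-sorted data.
        return self.records[n * k:(n + 1) * k]

by_name = SortedIndex(key=lambda r: r["name"])
for r in [{"name": "carol"}, {"name": "alice"}, {"name": "bob"}]:
    by_name.insert(r)
print(by_name.page(0, 2))  # → [{'name': 'alice'}, {'name': 'bob'}]
```

Serving a page is O(K); the cost has moved to insertion time, which is exactly the trade-off this approach makes.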

Note 1: You can probably cheat a little - e.g. if you have an index for "sorted alphabetically in ascending order" then you can use the same index for "sorted alphabetically in descending order". In this way you might only need 4 indexes for 8 possible sort orders.
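The ascending/descending trick above amounts to reading slices from the other end of the same sorted list. A small sketch (function name is illustrative):

```python
# Reuse an ascending index to serve a *descending* sort order:
# page n of the descending view is a reversed slice from the tail
# of the ascending list.
def page_desc(records, n, k):
    total = len(records)
    start = max(total - (n + 1) * k, 0)
    end = total - n * k
    return list(reversed(records[start:end]))

asc = ["alice", "bob", "carol", "dave", "erin"]
print(page_desc(asc, 0, 2))  # → ['erin', 'dave']
print(page_desc(asc, 2, 2))  # → ['alice']
```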

Note 2: You can probably cheat more. Rather than having one index for each sort order, you can split the data into "buckets" and have an index per bucket per sort order. E.g. instead of one index for "sorted alphabetically in ascending order" you could have one index for "starts with A", another index for "starts with B", ... In the same way, instead of one index for "sorted chronologically" you could have one index for this year, one index for last year, ... This helps reduce insertion costs (when new data is added), and could allow you to improve the UI (e.g. little "skip to bucket" buttons users can use).
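The bucketing idea can be sketched like this (a toy, assuming first-letter buckets): each insert touches only one small bucket, while a page request walks the buckets in order as if they were one big sorted list.

```python
import bisect
from collections import defaultdict
from itertools import islice

# Hypothetical bucketed index: one small sorted list per first letter.
buckets = defaultdict(list)

def insert(name):
    # Insertion only touches the (small) bucket for this name's first letter.
    bisect.insort(buckets[name[0].upper()], name)

def page(n, k):
    # Walk the buckets in key order and take the requested slice, as if
    # all buckets together were one big sorted list.
    chained = (name for letter in sorted(buckets) for name in buckets[letter])
    return list(islice(chained, n * k, (n + 1) * k))

for name in ["carol", "bob", "alice", "ben"]:
    insert(name)
print(page(0, 3))  # → ['alice', 'ben', 'bob']
```

A "skip to bucket" button then maps directly to "start paging from bucket X" with no scanning of earlier buckets.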

Is there any industry standard practices where we can refer?

The industry standard practice depends on which industry. Far too many things are shifting to "web apps", where the industry standard practice is to get incompetent developers working for below minimum wage to slap together a piece of trash using extremely inefficient frameworks.

How do we quantify the UI performance?

I'd use response times: the time it takes for the app to start and show the user data, the time it takes to show data after scrolling/moving to a different page, the time between the user clicking a different "sort order" button and the screen showing the data in the new sort order, etc.
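One way to turn those response times into numbers is to sample an operation repeatedly and report percentiles rather than a single average (a minimal sketch; the function and field names are illustrative):

```python
import time

# Minimal sketch of quantifying responsiveness: time an operation many
# times and report median and tail latency in milliseconds.
def measure(op, runs=50):
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000)  # ms
    samples.sort()
    return {
        "p50_ms": samples[len(samples) // 2],
        "p95_ms": samples[int(len(samples) * 0.95) - 1],
    }

# Example: time a stand-in for "re-render grid with 10k rows".
stats = measure(lambda: sorted(range(10_000)))
print(stats)
```

Percentiles matter here because a fast median can hide an occasional multi-second stall, which is exactly what users notice.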

Brendan
  • Thanks for the response, Brendan. Are you suggesting that we keep all the sorted indexes on the server before the user even asks for them (considering the data is already huge)? Also, we don't expect concurrent inserts, as this is more of a reporting application. We do have an export-data option, but we also need to support UI display. The issue is that it is all in a micro-service, and it is proving costly whenever a new sorting column comes to the server, since this is a huge dataset and more users keep being added to the application over time. – Karthik Feb 19 '19 at 10:43
  • @Karthik: I'm mostly suggesting that there are more than 2 options. The best solution depends on the specifics of the problem, including things like the ratio of reads vs. modifications, how powerful the server and clients are, how many requests per second the server has to deal with, and about 100 other things. I wouldn't even assume one server is enough to satisfy requirements (maybe you need to distribute load across a pool of servers dynamically, so the service continues working when one of the servers is down). – Brendan Feb 19 '19 at 20:44