
I have an unknown number of ordered lists that I need to page through. For example, the pages for these three lists should look like this when the page size is 6:

  • List1: 01,02,03,04,05,06,07,08,09,10
  • List2: 11,12,13,14,15
  • List3: 16,17,18,19,20,21,22,23,24,25,26,27,28

Result Pages:

  • Page1: 01,11,16,02,12,17
  • Page2: 03,13,18,04,14,19
  • Page3: 05,15,20,06,21,07
  • Page4: 22,08,23,09,24,10
  • Page5: 25,26,27,28

Given a page number, what is the most efficient way to determine which items to take from each list (start index and number of items)?

Bear in mind that each list can contain a few hundred thousand items, so iterating through all of them is not efficient.
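For reference, the desired ordering can be produced naively by interleaving the lists column by column and then slicing out the page. This iterates every item, which is exactly what the question wants to avoid for large lists, but it makes a handy correctness baseline (the method name is illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PagingBaseline
{
    // O(total items) baseline: take the i-th item of each list in list order,
    // concatenate the columns, then slice out the requested page.
    public static List<T> GetPageItemsNaive<T>(List<List<T>> itemLists, int pageSize, int pageIndex)
    {
        var interleaved = new List<T>();
        int maxCount = itemLists.Max(list => list.Count);
        for (int i = 0; i < maxCount; i++)
            foreach (var list in itemLists)
                if (i < list.Count)
                    interleaved.Add(list[i]);
        return interleaved.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }
}
```

With the three example lists and a page size of 6, page index 2 produces 05, 15, 20, 06, 21, 07, matching Page3 above.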

Bear.S
  • Thanks @PetSerAl for pointing that out, I corrected it and made it more readable. – Bear.S Oct 17 '15 at 20:06
  • I'll get it right eventually :) – Bear.S Oct 17 '15 at 20:27
  • Could you clarify what you mean by an **unknown** number of **list**s, in other words, what is the type of the input for the function in the question? For instance, is `IReadOnlyList<IReadOnlyList<object>>` ok? If not, what should it be? – Ivan Stoev Oct 18 '15 at 22:36
  • @IvanStoev, the function input should be something like this: GetPageItems(List<List<object>> itemLists, int pageSize, int page), pretty much like Kennnnnnnn's solution. – Bear.S Oct 19 '15 at 07:07
  • Ok, basically the same, just list of `object` instead of `int`. – Ivan Stoev Oct 19 '15 at 07:16

3 Answers


I can't say if it's the most efficient way or not, but here is an algorithm with O(M*log2(M)) time complexity, where M is the number of lists. It works as follows: the list counts are collected and sorted in ascending order, then iterated, skipping whole ranges, until the effective start index falls within the current range. This is possible because at every step the current count is the minimum, hence all the remaining lists still have items in that range. Once the start position is found, the page items are emitted from the remaining lists.

Here is the function:

static IEnumerable<T> GetPageItems<T>(List<List<T>> itemLists, int pageSize, int pageIndex)
{
    int start = pageIndex * pageSize;
    // Collect the list sizes and sort them ascending, so the smallest ranges go first.
    var counts = new int[itemLists.Count];
    for (int i = 0; i < counts.Length; i++)
        counts[i] = itemLists[i].Count;
    Array.Sort(counts);
    int listCount = counts.Length;
    int itemIndex = 0;
    // Skip phase: subtract whole horizontal ranges until the page start falls inside one.
    for (int i = 0; i < counts.Length; i++)
    {
        int itemCount = counts[i];
        if (itemIndex < itemCount)
        {
            int rangeLength = listCount * (itemCount - itemIndex);
            if (start < rangeLength) break;
            start -= rangeLength;
            itemIndex = itemCount;
        }
        listCount--;
    }
    if (listCount > 0)
    {
        // Emit phase: round-robin over the lists that still have an item at itemIndex.
        var listQueue = new List<T>[listCount];
        listCount = 0;
        foreach (var list in itemLists)
            if (itemIndex < list.Count) listQueue[listCount++] = list;
        itemIndex += start / listCount;
        int listIndex = 0;
        int skipCount = start % listCount;
        int nextCount = 0;
        int yieldCount = 0;
        while (true)
        {
            var list = listQueue[listIndex];
            if (skipCount > 0)
                skipCount--;
            else
            {
                yield return list[itemIndex];
                if (++yieldCount >= pageSize) break;
            }
            if (itemIndex + 1 < list.Count)
            {
                if (nextCount != listIndex)
                    listQueue[nextCount] = list;
                nextCount++;
            }
            if (++listIndex < listCount) continue;
            if (nextCount == 0) break;
            itemIndex++;
            listIndex = 0;
            listCount = nextCount;
            nextCount = 0;
        }
    }
}
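To see the skip phase on the question's example data (list sizes 10, 5 and 13, i.e. sorted counts {5, 10, 13}), take pageSize 6 and pageIndex 3, which is Page4. The following standalone trace reproduces just the arithmetic of the loop above:

```csharp
using System;

int[] counts = { 5, 10, 13 };  // list sizes, sorted ascending
int start = 3 * 6;             // pageIndex * pageSize = 18
int listCount = counts.Length; // 3
int itemIndex = 0;

foreach (int itemCount in counts)
{
    if (itemIndex < itemCount)
    {
        // First pass: range = 3 * (5 - 0) = 15; 18 >= 15, so the whole range is skipped.
        // Second pass: range = 2 * (10 - 5) = 10; 3 < 10, so the page starts in this range.
        int rangeLength = listCount * (itemCount - itemIndex);
        if (start < rangeLength) break;
        start -= rangeLength;
        itemIndex = itemCount;
    }
    listCount--;
}

// Result: itemIndex = 5, start = 3, listCount = 2 (the lists of size 10 and 13 remain).
// The emit phase then starts at column 5 + 3 / 2 = 6, skipping 3 % 2 = 1 list,
// which lands on item 22, the first item of Page4.
Console.WriteLine("itemIndex={0}, start={1}, listCount={2}", itemIndex, start, listCount);
```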

and test:

static void Main(string[] args)
{
    var data = new List<List<int>>
    {
        new List<int> { 01, 02, 03, 04, 05, 06, 07, 08, 09, 10 },
        new List<int> { 11, 12, 13, 14, 15 },
        new List<int> { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 },
    };
    int totalCount = data.Sum(list => list.Count);
    int pageSize = 6;
    int pageCount = 1 + (totalCount - 1) / pageSize;
    for (int pageIndex = 0; pageIndex < pageCount; pageIndex++)
        Console.WriteLine("Page #{0}: {1}", pageIndex + 1, string.Join(", ", GetPageItems(data, pageSize, pageIndex)));
    Console.ReadLine();
}
Ivan Stoev
  • Thanks Ivan Stoev, your solution is very fast but after comparing it to @Kennnnnnnn's solution it is slower. Thank you for your time helping me to find a solution. – Bear.S Oct 20 '15 at 20:33
  • @Bear.S It depends on the number of lists and their sizes. If you test with a few lists with many items, almost equally sized, his implementation will be faster, although it does a lot of unnecessary work. I was considering a similar implementation but rejected it due to bad worst-case performance (many lists with different sizes). I chose the implementation above because it has guaranteed performance. But that's similar to Quick vs Merge sort, so you know your use case best. – Ivan Stoev Oct 20 '15 at 20:44
  • @Bear.S Still, it shouldn't be this way. I've checked the implementation, and of course the bottleneck was the LINQ call (I know, I should totally avoid LINQ in performance scenarios). I replaced the LINQ part with a simple sorted array and now everything is as it should be: my updated function always outperforms Kennnnnnnn's, and as the number of lists increases it starts getting **times** faster. Check it out. – Ivan Stoev Oct 20 '15 at 22:02
  • I think you forgot to update your solution, I can still see the old version with the Linq call. – Bear.S Oct 21 '15 at 06:16
  • @Bear.S Ha, that's right - it's been a long day :-) See now. – Ivan Stoev Oct 21 '15 at 06:29
  • After testing your solution again, it indeed performs better! – Bear.S Oct 21 '15 at 07:20

I think it could be done nicely in two steps:

  1. Flatten your lists to a single list (ordered in the way you describe).
  2. Take items from that list for the desired page.

To accomplish step 1, I'd do something like what was suggested here: Merging multiple lists

So, (assuming your page items are ints, as in your example), here's a nice method that finds exactly the ones you want:

    static IEnumerable<int> GetPageItems(IEnumerable<List<int>> itemLists, int pageSize, int page)
    {
        var mergedOrderedItems = itemLists.SelectMany(x => x.Select((s, index) => new { s, index }))
                                          .GroupBy(x => x.index)
                                          .SelectMany(x => x.Select(y => y.s));

        // assuming that the first page is page 1, not page 0:
        var startingIndex = pageSize * (page - 1);

        var pageItems = mergedOrderedItems.Skip(startingIndex)
                                          .Take(pageSize);
        return pageItems;            
    }

Note: you don't have to worry about passing in a page number that exceeds the total number of pages for the given item count. Thanks to the magic of LINQ, this method simply returns an empty IEnumerable. Likewise, if Take(pageSize) finds fewer than pageSize items, it simply returns the items it did find.
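A minimal sketch of that out-of-range behavior (the list and page offsets here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var items = new List<int> { 1, 2, 3, 4, 5 };

// Skipping past the end yields an empty sequence rather than throwing:
var emptyPage = items.Skip(100).Take(6).ToList();  // 0 items

// And a short final page simply returns whatever is left:
var lastPage = items.Skip(3).Take(6).ToList();     // { 4, 5 }

Console.WriteLine("{0} / {1}", emptyPage.Count, lastPage.Count);
```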

Kennnnnnnn
  • Hi @Kennnnnnnn, thank you for the solution, but I think that the first step of building one big single list using LINQ will result in iterating through all the items of all lists. I'm looking for a solution that avoids that. – Bear.S Oct 17 '15 at 20:10
  • This also re-computes the entire thing when getting each page, meaning that getting all of the pages is doing n^2 more work than is actually required. – Servy Oct 19 '15 at 14:49
  • @Servy - You shouldn't be getting "all" the pages - that's the whole point of paging :) It _should_ recompute each time a page is visited, so that if the backing data changes, a page hit will immediately reflect those changes. The main problem with this one is that all the items are iterated -- so really this solution could only be practical for small lists / few lists. Taking another stab at it -- I've just submitted a second, lower-level solution that addresses that issue. – Kennnnnnnn Oct 20 '15 at 05:53

I'll submit another implementation, based on Bear.S' feedback on my first answer. This one's pretty low-level and very performant. There are two major parts to it:

  1. Figure out which item should appear first on the page (specifically what is the index of the list that contains it, and what is the item's index within that list).

  2. Take items from all lists, in the correct order, as needed (until we have all that we need or run out of items).

This implementation doesn't iterate the individual lists during step 1. It does use the List<T>.Count property, but that is an O(1) operation.

Since we're going for performance here, the code isn't necessarily as self-descriptive as I'd like, so I put in some comments to help explain the logic:

    static IEnumerable<T> GetPageItems<T>(List<List<T>> itemLists, int pageSize, int page)
    {
        if (page < 1)
        {
            return new List<T>();
        }

        // a simple copy so that we don't change the original (the individual Lists inside are untouched):
        var lists = itemLists.ToList();

        // Let's find the starting indexes for the first item on this page:
        var currItemIndex = 0;
        var currListIndex = 0;
        var itemsToSkipCount = pageSize * (page - 1); // <-- assuming that the first page is page 1, not page 0

        // I'll just break out of this loop manually, because I think this configuration actually makes
        // the logic below a little easier to understand.  Feel free to change it however you see fit :)
        while (true)
        {
            var listsCount = lists.Count;
            if (listsCount == 0)
            {
                return new List<T>();
            }

            // Let's consider a horizontal section of items taken evenly from all lists (based on the length of
            // the shortest list).  We don't need to iterate any items in the lists;  Rather, we'll just count 
            // the total number of items we could get from this horizontal portion, and set our indexes accordingly...
            var shortestListCount = lists.Min(x => x.Count);
            var itemsWeAreConsideringCount = listsCount * (shortestListCount - currItemIndex);

            // Does this horizontal section contain at least as many items as we must skip?

            if (itemsWeAreConsideringCount >= itemsToSkipCount) 
            {   // Yes: So mathematically find the indexes of the first page item, and we're done.
                currItemIndex += itemsToSkipCount / listsCount;
                currListIndex = itemsToSkipCount % listsCount;
                break; 
            }
            else
            {   // No: So we need to keep going.  Let's increase currItemIndex to the end of this horizontal 
                // section, remove the shortest list(s), and the loop will continue with the remaining lists:
                currItemIndex = shortestListCount;
                lists.RemoveAll(x => x.Count == shortestListCount);
                itemsToSkipCount -= itemsWeAreConsideringCount;
            }
        }

        // Ok, we've got our starting indexes, and the remaining lists that still have items in the index range.
        // Let's get our items from those lists:
        var pageItems = new List<T>();
        var largestListCount = lists.Max(x => x.Count);

        // Loop until we have enough items to fill the page, or we run out of items:
        while (pageItems.Count < pageSize && currItemIndex < largestListCount)
        {
            // Taking from one list at a time:
            var currList = lists[currListIndex];

            // If the list has an element at this index, get it:
            if (currItemIndex < currList.Count)
            {
                pageItems.Add(currList[currItemIndex]);                    
            }
            // else... this list has no more elements.
            // We could throw away this list, since it's pointless to iterate over it any more, but that might 
            // change the indices of other lists...  for simplicity, I'm just gonna let it be... since the above 
            // logic simply ignores an empty list.

            currListIndex++;
            if (currListIndex == lists.Count)
            {
                currListIndex = 0;
                currItemIndex++;
            }
        }

        return pageItems;
    }

Here's some test code, using three lists. I can grab 6 items off of page 1,000,000 in just a few milliseconds :)

        var list1 = Enumerable.Range(0, 10000000).ToList();
        var list2 = Enumerable.Range(10000000, 10000000).ToList();
        var list3 = Enumerable.Range(20000000, 10000000).ToList();
        var lists = new List<List<int>> { list1, list2, list3 };

        var timer = new Stopwatch();            
        timer.Start();

        var items = GetPageItems(lists, 6, 1000000).ToList();
        var count = items.Count;

        timer.Stop();
        Console.WriteLine("Got {0} items in {1} ms", count, timer.ElapsedMilliseconds);
Kennnnnnnn
  • Thank you Kennnnnnnn, this is exactly what I was looking for, super fast solution with a very descriptive explanation. Your solution is very close to @Ivan Stoev's solution but after testing both, looks like yours is about 40%-50% faster, have any idea why? – Bear.S Oct 20 '15 at 20:26
  • Ok, see @Ivan Stoev improved solution - seems like the reason was the Linq group call. – Bear.S Oct 21 '15 at 07:24