
I have a dataset of multiple local store rankings that I'm looking to aggregate (combine) into one national ranking programmatically. I know that the local rankings are by sales volume, but I am not given the sales volumes, so I must use the relative rankings to create as accurate a national ranking as possible.

As a short example, say we have 3 local ranking lists, ordered from best (1st) to worst (last), that represent different geographic areas, which can overlap with one another.

ranking_1 = ['J','A','Z','B','C']
ranking_2 = ['A','H','K','B']
ranking_3 = ['Q','O','A','N','K']

We know that J or Q is the highest-ranked store: they are first in ranking_1 and ranking_3, respectively, and both appear above A, which is first in ranking_2. We know that O is next, as it's above A in ranking_3. A comes next, and so on...

If I did this correctly on paper, the output of this short example would be:

global_ranking = [('J',1.5),('Q',1.5),('O',3),('A',4),('H',6),('N',6),('Z',6),('K',8),('B',9),('C',10)]

Note that when we don't have enough data to determine which of two stores is ranked higher, we consider it a tie (i.e. we know that one of J or Q is the highest-ranked store, but not which one, so both get rank 1.5, the average of positions 1 and 2). In the actual dataset, there are 100+ lists with 1000+ items each.
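The tie rule described here is the usual "average rank" (fractional ranking) convention: a group of tied stores occupying positions p through q all receive (p + q) / 2. A minimal sketch, assuming the stores have already been grouped into ordered tie groups as in the example; the function name `average_ranks` is hypothetical:

```python
from typing import Iterable, List, Tuple


def average_ranks(levels: Iterable[Iterable[str]]) -> List[Tuple[str, float]]:
    """Turn an ordered list of tie groups into (store, rank) pairs.

    Hypothetical helper: stores in the same group share the mean of the
    positions they occupy, e.g. a two-way tie for 1st/2nd gives 1.5 each.
    """
    result = []
    next_position = 1  # first position not yet assigned
    for group in levels:
        members = sorted(group)  # sorted only for deterministic output
        first = next_position
        last = next_position + len(members) - 1
        shared_rank = (first + last) / 2
        result.extend((store, shared_rank) for store in members)
        next_position = last + 1
    return result


levels = [['J', 'Q'], ['O'], ['A'], ['H', 'N', 'Z'], ['K'], ['B'], ['C']]
print(average_ranks(levels))
```

For these groups it reproduces `global_ranking` from the example, with floats such as 3.0 in place of 3.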

I've had fun trying to figure out this problem and am curious if anyone has any smart approaches to it.

iOSBeginner
  • what if there are cycles -- ranking_1=[J, … , A], ranking_2=[A, … , J] – xavierz Jan 28 '20 at 00:18
  • @xavierz if we assume the local rankings are accurate and correspond to a single global ranking solution, then there should be no cycles right? – iOSBeginner Jan 28 '20 at 02:12
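Under that assumption, the lists are consistent exactly when the "ranked above" constraints contain no cycle, which can be checked before aggregating. A minimal sketch using the standard library's `graphlib` (Python 3.9+); the helper name `rankings_are_consistent` is hypothetical:

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import List


def rankings_are_consistent(rankings: List[List[str]]) -> bool:
    """Hypothetical helper: True if one global order agrees with every list."""
    sorter = TopologicalSorter()
    for ranking in rankings:
        # Each adjacent pair says "higher" outranks "lower";
        # non-adjacent pairs then follow by transitivity.
        for higher, lower in zip(ranking, ranking[1:]):
            sorter.add(lower, higher)  # lower has higher as a predecessor
    try:
        sorter.prepare()  # raises CycleError if the constraints contradict
    except CycleError:
        return False
    return True


print(rankings_are_consistent([['J', 'A', 'Z', 'B', 'C'], ['A', 'H', 'K', 'B'], ['Q', 'O', 'A', 'N', 'K']]))  # True
print(rankings_are_consistent([['J', 'X', 'A'], ['A', 'Y', 'J']]))  # False
```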

2 Answers


A modified merge sort algorithm will help here. The modification should take incomparable stores into account and thus build groups of incomparable elements that you are willing to consider equal (like Q and J).
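Whatever sort is used, the modified comparison has to decide when one store is implied to outrank another, possibly through a chain of lists (Q is above O and O is above A in ranking_3, so Q is above A). A minimal sketch of such a three-way comparison, assuming "incomparable" is reported as 0 (a tie); the names `build_successors`, `known_above`, and `compare` are hypothetical, and this is not the answerer's implementation:

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_successors(rankings: List[List[str]]) -> Dict[str, Set[str]]:
    """Hypothetical helper: map each store to the stores directly below it."""
    successors: Dict[str, Set[str]] = defaultdict(set)
    for ranking in rankings:
        for higher, lower in zip(ranking, ranking[1:]):
            successors[higher].add(lower)
    return successors


def known_above(successors: Dict[str, Set[str]], a: str, b: str) -> bool:
    """True if the lists imply a outranks b, i.e. b is reachable from a."""
    stack, seen = [a], {a}
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def compare(successors: Dict[str, Set[str]], a: str, b: str) -> int:
    """-1 if a is known to rank higher, 1 if b is, 0 if incomparable."""
    if known_above(successors, a, b):
        return -1
    if known_above(successors, b, a):
        return 1
    return 0


rankings = [['J', 'A', 'Z', 'B', 'C'], ['A', 'H', 'K', 'B'], ['Q', 'O', 'A', 'N', 'K']]
succ = build_successors(rankings)
print(compare(succ, 'Q', 'A'))  # -1: Q > O > A in ranking_3
print(compare(succ, 'J', 'Q'))  # 0: no list relates J and Q
```

One caution on the "modified" part: treating 0 as "equal" is not transitive here. `compare(succ, 'Z', 'H')` and `compare(succ, 'Z', 'K')` are both 0, yet H is known to be above K, so feeding this comparator unchanged into `sorted(..., key=functools.cmp_to_key(...))` can give an order that depends on the input order.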

Dmitry

This method looks at every store at the front of a ranking list. If a front-runner does not appear below first position in any other list, it belongs at the current level and is added to that level's set. These stores are then removed from the front of every list, which exposes new front-runners. The process repeats until no stores are left.

def rank_stores(rankings):
    """
    Rank stores with rankings by volume sales with over lap between lists. 
    :param rankings: list of rankings of stores also in lists.
    :return: Ordered list with sets of items at same rankings.
    """

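    # Work on copies so the caller's lists are not emptied by pop() below.
    rankings = [list(rank) for rank in rankings]
    rank_global = []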

    # Evaluate all stores in the number one position: if they are not below
    # number one anywhere else, then they belong at this level.
    # Then remove them from the front of the lists, and repeat.
    while any(rankings):
        tops = []

        # Find out which of the number one stores are not in a lower position 
        # somewhere else.
        for rank in rankings:
            if not rank:
                continue
            top = rank[0]
            if not any(top in rank_test[1:] for rank_test in rankings):
                tops.append(top)

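        # If no front-runner qualifies, the lists contradict each other
        # (a cycle) and the loop would otherwise never terminate.
        if not tops:
            raise ValueError("rankings are inconsistent (cycle detected)")

        # Add this level of tied stores to the global ranking.
        rank_global.append(set(tops))
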
        # Remove the stores that just made it to the top.
        for rank in rankings:
            if rank and rank[0] in tops:
                rank.pop(0)

    return rank_global

For the rankings provided:

ranking_1 = ['J','A','Z','B','C']
ranking_2 = ['A','H','K','B']
ranking_3 = ['Q','O','A','N','K']
rankings = [ranking_1, ranking_2, ranking_3]

Then calling the function:

rank_stores(rankings)

Results in:

[{'J', 'Q'}, {'O'}, {'A'}, {'H', 'N', 'Z'}, {'K'}, {'B'}, {'C'}]
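The levels built by `rank_stores` are the layers of a topological sort of the "directly above" graph (one edge per adjacent pair in a list): a store joins a level as soon as every store directly above it has been placed. They can therefore also be computed with Kahn's algorithm, which handles each edge once instead of rescanning every list on every pass, which matters at 100+ lists of 1000+ items. A minimal sketch under that reading; `rank_stores_kahn` is a hypothetical name, and it raises `ValueError` on cyclic input:

```python
from collections import defaultdict
from typing import Dict, List, Set


def rank_stores_kahn(rankings: List[List[str]]) -> List[Set[str]]:
    """Hypothetical variant: group stores into levels via Kahn's algorithm.

    The first level holds stores with nothing ranked above them in any list;
    each later level holds stores whose direct predecessors are all placed.
    """
    successors: Dict[str, Set[str]] = defaultdict(set)
    in_degree: Dict[str, int] = {}
    for ranking in rankings:
        for store in ranking:
            in_degree.setdefault(store, 0)
        for higher, lower in zip(ranking, ranking[1:]):
            if lower not in successors[higher]:  # count each edge once
                successors[higher].add(lower)
                in_degree[lower] += 1

    level = {store for store, deg in in_degree.items() if deg == 0}
    levels: List[Set[str]] = []
    placed = 0
    while level:
        levels.append(level)
        placed += len(level)
        next_level: Set[str] = set()
        for store in level:
            for lower in successors[store]:
                in_degree[lower] -= 1
                if in_degree[lower] == 0:  # last predecessor just placed
                    next_level.add(lower)
        level = next_level

    if placed != len(in_degree):  # leftover stores sit on a cycle
        raise ValueError("rankings are inconsistent (cycle detected)")
    return levels


rankings = [['J', 'A', 'Z', 'B', 'C'], ['A', 'H', 'K', 'B'], ['Q', 'O', 'A', 'N', 'K']]
print(rank_stores_kahn(rankings))
```

It should print the same seven levels as above (the display order inside each set may vary), and it leaves the input lists unchanged.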

In some circumstances there may not be enough information to determine definite rankings. For example, take this true order:

['Z', 'A', 'B', 'J', 'K', 'F', 'L', 'E', 'W', 'X', 'Y', 'R', 'C']

From it, we can derive the following local rankings, each consistent with that order:

a = ['Z', 'A', 'B', 'F', 'E', 'Y']
b = ['Z', 'J', 'K', 'L', 'X', 'R']
c = ['F', 'E', 'W', 'Y', 'C']
d = ['J', 'K', 'E', 'W', 'X']
e = ['K', 'F', 'W', 'R', 'C']
f = ['X', 'Y', 'R', 'C']
g = ['Z', 'F', 'W', 'X', 'Y', 'R', 'C']
h = ['Z', 'A', 'E', 'W', 'C']
i = ['L', 'E', 'Y', 'R', 'C']
j = ['L', 'E', 'W', 'R']
k = ['Z', 'B', 'K', 'L', 'W', 'Y', 'R']
rankings = [a, b, c, d, e, f, g, h, i, j, k]

Calling the function:

rank_stores(rankings)

results in:

[{'Z'},
 {'A', 'J'},
 {'B'},
 {'K'},
 {'F', 'L'},
 {'E'},
 {'W'},
 {'X'},
 {'Y'},
 {'R'},
 {'C'}]

In this scenario there is not enough information to determine where 'J' should be relative to 'A' and 'B', only that it falls somewhere between 'Z' and 'K'.
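That range can be computed directly. In any global order consistent with the lists, a store's position is at least 1 plus the number of stores known to be above it, and at most n minus the number of stores known to be below it, where n is the number of distinct stores; both bounds can be attained. A minimal sketch, assuming the lists contain no cycle; the names `reachable` and `position_ranges` are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Hypothetical helper: stores reachable from start along graph edges."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def position_ranges(rankings: List[List[str]]) -> Dict[str, Tuple[int, int]]:
    """Earliest and latest 1-based position of each store in any global
    order consistent with every list. Assumes the lists contain no cycle."""
    below: Dict[str, Set[str]] = defaultdict(set)  # store -> stores directly below
    above: Dict[str, Set[str]] = defaultdict(set)  # store -> stores directly above
    stores: Set[str] = set()
    for ranking in rankings:
        stores.update(ranking)
        for higher, lower in zip(ranking, ranking[1:]):
            below[higher].add(lower)
            above[lower].add(higher)
    n = len(stores)
    # One graph search per store in each direction.
    return {
        store: (1 + len(reachable(above, store)), n - len(reachable(below, store)))
        for store in stores
    }


a = ['Z', 'A', 'B', 'F', 'E', 'Y']
b = ['Z', 'J', 'K', 'L', 'X', 'R']
c = ['F', 'E', 'W', 'Y', 'C']
d = ['J', 'K', 'E', 'W', 'X']
e = ['K', 'F', 'W', 'R', 'C']
f = ['X', 'Y', 'R', 'C']
g = ['Z', 'F', 'W', 'X', 'Y', 'R', 'C']
h = ['Z', 'A', 'E', 'W', 'C']
i = ['L', 'E', 'Y', 'R', 'C']
j = ['L', 'E', 'W', 'R']
k = ['Z', 'B', 'K', 'L', 'W', 'Y', 'R']
ranges = position_ranges([a, b, c, d, e, f, g, h, i, j, k])
print(ranges['J'], ranges['Z'], ranges['K'])  # (2, 4) (1, 1) (5, 5)
```

J can sit anywhere from 2nd to 4th, which is exactly the gap between 'Z' (fixed at 1st) and 'K' (fixed at 5th). On the question's first example it gives Z the range (5, 8), since those three lists never compare Z with H, N or K. Because it runs one search per store, it is quadratic in the worst case.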

With hundreds of rankings and thousands of stores, gaps like this multiply, so some stores will not be ranked correctly on an absolute sales-volume basis.

run-out