
I'm wondering if anyone knows of a simple algorithm for shuffling a list with a weighted bias, so that each item gradually works its way toward the top of the list over time.

I am working on a site with business listings in paginated directories, and the listings need to display fairly so that one business is not always above or below another. A pure shuffle of the directories is not really sufficient: its random nature may leave a given business in a similar location within the list for an extended period. I'd therefore like to apply some weighting so that each listing is slowly nudged up the list, giving every listing a reasonably equal opportunity to appear on the first page of the directory over time.

EDIT:

With thanks to Kevin, I'm attempting to formalise these rules:

1) for n listings, each listing must display in position one once in every n "quasi-shuffles"

2) (fuzzy) the average position of a listing should improve (move toward position 1) over time

3) for any two businesses (A and B), over n iterations of the shuffle, A must not be above B more than 50% of the time

I should also add that I work for a business that has an extremely complex and convoluted "Shuffler", which is necessary to pacify a large number of paying clients who insist on being fairly distributed across their respective categories within our directories. Complaints from customers are a "real" problem: given that users typically pick items from the first couple of paginated pages, it is not fair to order clients alphabetically (the default), and given that users read from top to bottom, it is not fair that one business always sits above another.

I'm interested to know whether anyone has a tidy solution to this problem that they may have implemented previously.

EDIT:

One thought I've had: since these items are stored in the database, I could add a column holding the sum of each listing's positions over time, and order by it (descending). When an item reaches the first position in the list, I would reset its sum to 0, which would mean that every item in the list eventually makes it to the top. The problem is that for a large number of listings, over time, this number could become rather large...
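A minimal sketch of this position-sum idea (Python; names and data shapes are illustrative, not an actual schema):

```python
# Sketch of the "position sum" idea: order listings by their accumulated
# score, highest first, then add each listing's new position to its score.
# Resetting the score to 0 when a listing reaches position 1 also keeps
# the stored numbers bounded.

def nightly_reorder(listings):
    """listings: dicts with a 'score' key; returns tonight's display order."""
    ordered = sorted(listings, key=lambda l: l["score"], reverse=True)
    for position, listing in enumerate(ordered, start=1):
        if position == 1:
            listing["score"] = 0          # reached the top: start over
        else:
            listing["score"] += position  # lower positions accumulate faster
    return ordered
```

Because lower positions add more to the score each night, neglected listings catch up, and every listing eventually cycles through position 1.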

EDIT:

I don't want to slam the database, and I need consistency while a user is browsing, so I will only perform the "pseudo-shuffle" on a nightly basis (once a day), not on every display of the directories.

Rob
  • Could you give a formal definition of your problem? What exactly is the weight supposed to do? – Fred Foo Sep 17 '12 at 11:41
  • @larsmans slowly push the items up the list - "I'd like to provide some weighting so that each listing is slowly nudged up the list so that they get a reasonably equal opportunity to display on the first page of the directory over time." – Rob Sep 17 '12 at 11:47
    How did it come to your attention that purely random shuffling was resulting in unfair results for your site? Do you have any measurable criteria we can use to evaluate our solutions? Or is it a situation of your boss saying "Customer X complained that his listing isn't high enough, fix it"? – Kevin Sep 17 '12 at 11:56
  • @Rob: That's not a formal problem statement. – Fred Foo Sep 17 '12 at 11:59
  • @Kevin - the latter, but there is a shared sentiment that ideally clients should see a steady shift to the front of the queue over time rather than randomly jumping all over the place... – Rob Sep 17 '12 at 12:17
  • @larsmans sorry mate, I am not sure what that is, if I did maybe I wouldn't be asking this question... – Rob Sep 17 '12 at 12:17
    I think larsmans wants quantifiable rules that can be used to accept or reject any particular algorithm as valid. For example, rules like these: 1) for any two entries, one of the entries must not continuously appear above the other for more than `X` shuffles. 2) An entry must move at least `Y` rows total over the course of `Z` shuffles. 3) Over the course of `A` shuffles, every entry is guaranteed to appear among the first `B` rows (the front page). – Kevin Sep 17 '12 at 12:44
  • @Kevin - thanks for clarifying, really I guess that the formal rules would be 1) for n listings each listing must display in position one once in n "quasi shuffles") 2) (fuzzy) the average (?) position of a listing should increase over time until it reaches position 1 – Rob Sep 17 '12 at 12:52
  • How about your "company A must not always be above company B" rule? If the formal definition is as I've written it one sentence back, then you would be fine with putting company A above B 99% of the time. Do you have a more stringent requirement? – Kevin Sep 17 '12 at 12:58
  • @Kevin - Sorry, yes you're right, but is this feasible to track? What if we said that for any two businesses (A and B), over n iterations of the shuffle, A must not be above B more than 50% of the time? – Rob Sep 17 '12 at 13:06
  • If you enforce exactly 50%, then the one and only legal shuffle will be to reverse the whole list. If you relax it a little bit (say, to 60%), then the answer is "it depends on how many companies you have". If you have X companies and are tracking them over N shuffles, it will take about N*X*X bits of memory to store their relationships. – Kevin Sep 17 '12 at 13:14
  • @Kevin - lol you are far better at working this stuff out than I am, but I really appreciate your input, I had a feeling 50% would be a problem, ideally I guess I was hoping that the "shuffle" part would take care of this... – Rob Sep 17 '12 at 13:17
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/16780/discussion-between-rob-and-kevin) – Rob Sep 17 '12 at 13:32

2 Answers


The simplest answer that comes to me is to use a Least-Recently-Used (LRU) approach.

Update a timestamp for each element that gets displayed on the top page, and render all the elements sorted from "least recently used/displayed" first to "most recently used/displayed" last.

That should do the trick, and it only involves updating the timestamps of the elements displayed on the top page.

As new elements are added and old elements are removed, this approach keeps items circulating gracefully.

You could fine-tune this by allowing items to stick to the front page for a few iterations before being sent back to the bottom of the pile. The right number of iterations would depend on the number of items in your database and the rates at which elements are added and removed.
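A rough sketch of this timestamp approach (Python; the page size and field names are assumptions, and in practice the sort and stamping would be database queries):

```python
from datetime import datetime

PAGE_SIZE = 10  # assumed number of listings per front page

def nightly_rotate(listings):
    """listings: dicts with a 'last_shown' datetime; returns the new order."""
    # Least recently shown first...
    ordered = sorted(listings, key=lambda l: l["last_shown"])
    # ...then stamp tonight's front page so it rotates to the back next time.
    for listing in ordered[:PAGE_SIZE]:
        listing["last_shown"] = datetime.now()
    return ordered
```

Run once per nightly batch, this matches Kevin's suggestion above: "update timestamp when I'm moved into the top X rows" rather than on every render.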

I hope this helps, Laurent.

Laurent
  • sorry I'm going to add an edit - I don't want to slam the database and I need consistency whilst a user is browsing therefore I will only be performing the "pseudo shuffle" on a nightly basis (once a day) not on every display of the directories – Rob Sep 17 '12 at 12:25
  • I'm not sure I understand your "slam the database" comment. This approach only involves updating the top elements, not the whole database. – Laurent Sep 17 '12 at 12:28
  • This solution could still be used even if you only perform an update once per day. Just change the "update timestamp when I'm rendered on the front page" to "update timestamp when I'm moved into the top X rows of the database". – Kevin Sep 17 '12 at 12:32
  • But this will not overcome the situation of business A always being above business B - whatever the default display order is, that is how they will always be displayed. This is the main problem and a real issue for clients. Paying clients don't do alphabetical order. – Rob Sep 17 '12 at 12:36
  • You would order them by timestamp, not by alphabetical order. This is just a round-robin algorithm to ensure that all records are rotated on the front page. Random shuffling could be used to mix the records when updating the timestamp... – Laurent Sep 17 '12 at 13:06
  • @Laurent - Unfortunately that's the problem: random shuffling isn't ideal because there is no guarantee that a single listing won't be randomly shuffled into the last 50% of records indefinitely. I'm basically trying to achieve a round-robin with a weighted shuffle, to take some of the randomness out of the equation using some form of weighting over time (see my edited rules above) – Rob Sep 17 '12 at 13:23
  • In my last comment I meant to say: "Random shuffling could be used to mix the records when updating the timestamp *only to the records being updated*" – Laurent Sep 17 '12 at 13:39
  • So this would basically end up with you shuffling the results within a given page set, i.e. if there are 10 items per page then these items will be shuffled amongst themselves, but would all drop off and go to the back of the queue as part of the round robin? – Rob Sep 17 '12 at 14:00
  • Yes. But depending on the number of items being rotated, and time you want the items to take at most to rotate from last page to front page, you would obviously have to apply this process of "drop off and rotate to the back of the queue" on more than just the first page. Let's say you want to guarantee a full rotation in 5 days, then I would apply this process on the first 20% of the items. – Laurent Sep 17 '12 at 14:14

For a database of X companies, create an X by X grid and populate each cell with a company name. Any given company name should appear exactly once in each row and column. For example, for a database of ten companies, each with a one-character name, one such grid would look like this:

ABCDEFGHIJ
BCDEFGHIJA
CDEFGHIJAB
DEFGHIJABC
EFGHIJABCD
FGHIJABCDE
GHIJABCDEF
HIJABCDEFG
IJABCDEFGH
JABCDEFGHI

The company in the xth row and yth column will appear y units from the top of the list on the xth day. In other words, each day you refer to a different row for the ordering of your company names. This scheme satisfies two of your criteria: each element is in the #1 slot at least once every X days, and no particular element stagnates in the same position for long. But there is still the problem that company B always appears directly below company A, so some additional work is needed.
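A sketch of this construction (Python; the function name is illustrative): taking row x as day x's ordering, each row is the full company list rotated left by x, which yields exactly this kind of cyclic Latin square.

```python
def latin_square(companies):
    """Row x = ordering for day x: the company list rotated left by x."""
    n = len(companies)
    return [[companies[(x + y) % n] for y in range(n)] for x in range(n)]
```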

Choose two columns at random and swap them around. Repeat this process until the columns are sufficiently randomized (See Fisher-Yates Shuffle for a linear-time way to do this). One such result might look like this:

HIDEJBGCAF
IJEFACHDBG
JAFGBDIECH
ABGHCEJFDI
BCHIDFAGEJ
CDIJEGBHFA
DEJAFHCIGB
EFABGIDJHC
FGBCHJEAID
GHCDIAFBJE

Now, on average A will be in front of B 50% of the time. The actual percentage will vary, but it will fall on a bell curve centered around 50%, and only rarely will it reach a very uneven proportion.

Company B might complain that it always appears in the #1 slot exactly one day after company A appears at #1. If this is a problem, then perform a shuffle on rows too:

GCDHBIFEAJ
HDEICJGFBA
BHICGDAJFE
JFGAEBIHDC
IEFJDAHGCB
AGHBFCJIED
CIJDHEBAGF
EABFJGDCIH
FBCGAHEDJI
DJAEIFCBHG
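The two shuffles can be sketched like this (Python, where `random.shuffle` is a Fisher-Yates implementation): the column shuffle applies one random permutation of positions to every day, and the row shuffle then permutes the days themselves.

```python
import random

def shuffle_columns(grid):
    """Apply one random permutation of positions (columns) to every row."""
    perm = list(range(len(grid[0])))
    random.shuffle(perm)  # Fisher-Yates in linear time
    return [[row[c] for c in perm] for row in grid]

def shuffle_rows(grid):
    """Randomly reorder the days (rows) themselves."""
    rows = list(grid)
    random.shuffle(rows)
    return rows
```

Because both operations only permute whole rows or whole columns, every company still appears exactly once per row and per column afterwards, which is why each company remains guaranteed the #1 slot once per cycle.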

Now you have an ordering scheme with the following properties:

Pros

  • over the course of an X-day-long cycle, all X companies appear in the #1 slot.
  • over the course of an X-day-long cycle, no company will ever stagnate in the same slot. Once it occupies slot K, it will not return to that slot for the rest of the cycle. (In the worst case, it still might "hover" around the same area for a while, but eventually it will go everywhere on the list)
  • no one company will appear over another much more than 50% of the time. The more companies you have, the closer to 50% it approaches.
  • which company will appear in the #1 slot is unpredictable, so no one can legitimately claim a bias based on when a company is spotlighted.

Cons

  • for a database of N companies, generating a grid takes O(N^2) time and memory. You only need to generate one every N days, and you can do it ahead of time, so you can amortize the cost down to O(N) time.
  • Companies do not "bubble up" over time. I believe this constraint conflicts with the "no company should appear above another too much" constraint; if all companies bubble upwards with approximately equal velocity, then the ones that started higher will usually be above the ones that started lower. The method I've given is the result of discarding one requirement in order to satisfy another mutually exclusive requirement.
  • for any X-day-long time period, there is a 1/N chance that a company will appear in the #1 slot two days in a row. This occurs, for example, when company A is in the #1 slot on the last day of the cycle, and when you generate a new grid, company A is in the #1 slot on the first day of the cycle. If this is undesirable, you can perform another shuffle until A isn't in the first slot.
Kevin
  • thanks for this comprehensive answer. Just wondering, in relation to this statement - "Choose two columns at random and swap them around. Repeat this process until the columns are sufficiently randomized" - couldn't this result in a company not being in the #1 slot at all? Would it be better to shuffle everything but the first column? – Rob Sep 18 '12 at 11:38
  • Nope, every company always makes it into the top slot. This is because every column contains every company exactly once. There's no way to swap columns and end up with, say, two As or zero As. This is true even if you swap columns _and_ swap rows. That's what distinguishes this answer from just randomizing the list every day - it ensures a degree of fairness over X days. – Kevin Sep 18 '12 at 11:51