1

Background Story
I have an MYSQL table that stores application data for an annual race. Among other things, the table holds the Racer's ID and the year for that race application.

This year, we want to make the random selection process for applications weighted, so that people who've applied for X amount of continuous years have a greater chance of being selected.

In my PHP script, I have the MYSQL object returning 1 record for every year they've applied, select one at random and unassign it from the DB result array, assign it to the winners array, and ignore all other winners with the same ID for this year.

Actual Problem
The problem with this method is that my current MYSQL query returns all records for all people who've applied for this year and a past race. Is there any way (preferably using MYSQL) that I can have it return only continuous applications from 2014 and before?

An example result set from this SQL FIddle would be "1, 1, 1, 2, 2, 2, 2, 3".

My current query is:

SELECT a.racer_id FROM applications a

WHERE a.racer_id IN
    (
        SELECT ab.racer_id FROM applications ab

        WHERE ab.racer_id = a.racer_id AND ab.race_year=2014
    )

ORDER BY RAND();

Which doesn't work because it grabs all past years, not just the continuous records from 2014 and before.

ITS Alaska
  • 699
  • 1
  • 7
  • 19
  • Why not add a column that is the number of continuous years that they have applied for each application? Then you only have to get one row for each applicant. – Schleis Aug 07 '13 at 21:21
  • Honestly, that thought hadn't occurred to me. It may be the best solution for this particular project because there are additional factors for this specific race that determine application validity, but I'm also wondering if it's possible for any future projects. – ITS Alaska Aug 07 '13 at 21:24
  • id 2 and 4 shouldn't be first ? and You have mistake, should be `ab.race_year=2014` – jaczes Aug 07 '13 at 21:26
  • 1
    This question looks related to what you are looking for: http://stackoverflow.com/questions/3139323/finding-continuous-ranges-in-a-set-of-numbers – Schleis Aug 07 '13 at 21:27
  • 4 didn't apply for the 2014 race, and 2 shouldn't necessarily be first because the order is random; 2 should have the most records returned so they have a higher chance of being first. And I fixed the `race_year` typo. @jaczes – ITS Alaska Aug 07 '13 at 21:27
  • Thanks for the link. I had to think about what specific numbers I would need to replicate that concept in this system, and the numbers I'll need are the number of previous applications and race year. I'll play with MYSQL a little and see if I can have it pull the number of previous applications dynamically so I won't need to make a new table column. @Schleis – ITS Alaska Aug 07 '13 at 21:38

1 Answers1

0

Another way to phrase this question is to find the most recent year where a racer did not compete. Then, the current year minus that year provides the number of continuous years.

You can get the most recent year by doing a cross join between the years and racers, and the left joining to the applications. Where there is no match is a year where that racer did not compete.

You do have to do some additional bookkeeping to handle racers who competed in all years. The final query is:

select racers.racer_id,
       maxyear - max(case when a.race_year is null then years.race_year
                          else const.minyear - 1
                     end)
from (select distinct race_year
      from applications
     ) years cross join
     (select distinct racer_id
      from applications
     ) racers cross join
     (select max(race_year) as maxyear, min(race_year) as minyear
      from applications
     ) const left outer join 
     applications a
     on a.race_year = years.race_year and
        a.racer_id = racers.racer_id
group by racers.racer_id;

I'm not sure how this relates to the data in SQL Fiddle. You have four racers there, but you have 8 suggested values in the question.

EDIT:

Now I get it. That's pretty reasonable. The following query basically extends the idea above by joining back to application to get any years after the maximum year with no application. The subquery is a simplified from the above:

select a.racer_id
from (select racers.racer_id,
             max(case when a.race_year is null then years.race_year
                 end) as maxyear
      from (select distinct race_year
            from applications
           ) years cross join
           (select distinct racer_id
            from applications
           ) racers left outer join 
           applications a
           on a.race_year = years.race_year and
              a.racer_id = racers.racer_id
      group by racers.racer_id
     ) ry join
     applications a
     on a.racer_id = ry.racer_id and
        (a.race_year > ry.maxyear or ry.maxyear is null);
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1 record needs to be returned for every application they've filed in a row before and including this year. For example, if they've applied for 3 consecutive years before this year and then again this year, they should have their ID returned 4 times. I know it's not pretty or efficient, but it's the only way I can think of to accomplish truly random weighted selection. – ITS Alaska Aug 07 '13 at 23:04