3

I have a set of MySQL tables containing information from a football management game.

The tables are:

  • Players - playerID (PK), playerName
  • Matches - matchID (PK), matchSeason, matchRound, matchType
  • PlayersMatch - playerID, matchID (comp PK), matchRating, playerForm, playerAge, position (can be null)

Data stored in these tables are related to the performance of a player. A Player plays in a match and has a rated performance (matchRating). There is a record in PlayersMatch for each match a player participates in, recording the players current form, match performance, their age at the time of the match (for historical purposes) and the position they played in.

Now, currently, I'm using the following query to list the top 10 players from the whole season (best performance in the season overall as opposed to best performance per round):

SELECT playerID, matchID, playerForm, playerAge, MAX(matchRating)
FROM PlayersMatch
INNER JOIN Matches ON PlayersMatch.matchID = Matches.matchID
WHERE Matches.matchSeason = 35
AND Matches.matchType = 'L'
AND PlayersMatch.position IS NOT NULL
GROUP BY PlayersMatch.playerID
ORDER BY MAX(matchRating) DESC, playerForm ASC

The problem I'm getting is that while I'm getting the right playerID and player matchRating, I'm getting the wrong matchID, form, age and other info (ie they're from other records).

I tried adding the matchID to the group by, and while I got the correct information, I had duplicates as it produced duplicate records for the players (because playerID and matchID make up the PK in PlayersMatch).

Your assistance in this is greatly appreciated.

Edit: After some further reading, I must be writing the SQL incorrectly and that a group by is only going to return the correct information if I just have playerID and max(matchRating) - at the very least to be ANSI SQL correct.

In that case, how do I get the corresponding match details for that performance if I'm using a max/group by?

Edit 2: Looks like I've got a working query:

SELECT * FROM PlayersMatch
INNER JOIN 
   (SELECT playerID, MAX(matchRating)
    FROM PlayersMatch p2
    JOIN Matches
      ON p2.matchID = Matches.matchID
    WHERE matchSeason = 35
    AND matchType = 'L'
    AND p2.position IS NOT NULL
    GROUP BY p2.playerID) AS p1
  ON PlayersMatch.playerID = p1.playerID
  AND PlayersMatch.matchRating = p1.matchRating
JOIN Matches m2
  ON PlayersMatch.matchID = m2.matchID
WHERE m2.matchSeason = 35
AND m2.matchType = 'L'
AND PlayersMatch.position IS NOT NULL
ORDER BY matchRating DESC

The only problem now is that it takes 21 seconds to run. Does this query look correct?

Cyntech
  • 5,362
  • 6
  • 33
  • 47
  • I don't agree using MAX(matchRating) for overall top-performer. Should not it be using AVG ? An example, player A played two matches, 1st rated 10, 2nd rated 2 (Avg 6), Player B played two matches, rated 8,8 (Avg 8). But your calculation will make the player A to be the top performer. – ajreal Dec 03 '11 at 13:28
  • @ajreal, thanks for the suggestion, but an average rating is not what I'm trying to display. I might add one for average, but at this point I just need max. Still, doesn't solve my problem though. :P – Cyntech Dec 03 '11 at 13:47
  • Not entirely: you are joining on playerId and matchRating, but that isn't guaranteed to be unique (should be matchId). Have you created the proper indices (matchRating, possibly matchSeason, matchType and position.) Use `explain` to see why it is taking so long. – Inca Dec 04 '11 at 11:18
  • @Inca, I realise that matchRating and playerID isn't guaranteed to be unique, but I get an 'Unknown column: p1.matchID' as it's not in the group by and I can't add it to the group by because then I get duplicate rows for the player. The explain plan shows most rows on the derived table (assuming the group by sub query). I have indices on matchRating, playerID and positionType individually (and obviously the composite PK of playerID and matchID). Should I have an index with all those columns in the one index? – Cyntech Dec 04 '11 at 12:15

2 Answers2

1

Add a second index on PlayersMatch based on MatchID only for your preliminary qualification join to Matches. Add an index to your Matches table on matchSeason and type.

From your edit and posted data samples, I think this resolves to get the first "match" that qualifies the spanned multiple instances under a same "Rank". So, again, the inner-most gets the best MatchRating as your "MAX()" apparently is looking for the HIGHEST Rating. From that, it will immediately re-join to player matches and get the FIRST Match ID for that person with the same Rating. Finally, to close it out, we can directly join to the person for name info, and to the match based on the first match ID found, so no duplicates should be returned... That final result gets sorted per the match ranking..

SELECT STRAIGHT_JOIN
      Players.PlayerName,
      M2.*,
      PM.MatchRating,
      PM.PlayerForm,
      PM.PlayerAge,
      PM.Position
   FROM 
      ( select PreMatch.PlayerID,
               PreMatch.MaxMatch,
               MIN( P3.MatchID ) as FirstMatch
           FROM
               ( SELECT 
                    p2.playerID, 
                    MAX(p2.matchRating) MaxMatch
                 FROM 
                    Matches
                       JOIN PlayersMatch P2
                          ON Matches.MatchID = p2.matchID
                         AND P2.Position is not null
                 WHERE 
                        Matches.MatchSeason = 35
                    AND Matches.MatchType = 'L'
                 GROUP BY
                    p2.playerID ) PreMatch

               JOIN PlayersMatch P3
                   ON PreMatch.PlayerID = P3.PlayerID
                  AND PreMatch.MaxMatch = P3.MatchRating
                  AND P3.Position is not null

                  JOIN Matches M2
                     on P3.MatchID = M2.MatchID
                    AND M2.MatchSeason = 35
                    AND M2.MatchType = 'L' 
          GROUP BY
             PreMatch.PlayerID,
             PreMatch.MaxMatch
      ) AS p1

      JOIN Players
         on P1.PlayerID = Players.PlayerID

      JOIN PlayersMatch PM
          on p1.FirstMatch = PM.MatchID 

   ORDER BY 
      p1.MaxMatch DESC
DRapp
  • 47,638
  • 12
  • 72
  • 142
  • Thanks for the answer, it looks really good but I'm getting all player match records for each player. Can a limit be put on a join to restrict it to just the best matchRating for each player? – Cyntech Dec 05 '11 at 13:32
  • @Cyntech, Can you do a dump of the data for 1 or 2 players from each respective table this is happening for as I can't see how. The inner most query gets ONE record per player and their maximum MatchRating ONLY for season 35, type L. Then joins to player on their ID (which should only be 1), then re-joins back to player match on their ID and MaxMatch to the rating for the same season 35, type L. So, unless a person has multiple entries for the same match season, type and matchrating, it shouldn't. – DRapp Dec 05 '11 at 14:26
  • DRapp, that's exactly the reason. If a player has the same match rating (a common occurrance) then you get a row each time they get that rating. So we're back to the original problem where you can't add the matchID to the join because it would affect the group by. Example data can be found here: http://pastebin.com/2ijwPwCc – Cyntech Dec 07 '11 at 13:26
  • @cyntech, updated answer per your data samples. This version should get what you are looking for -- ONE instance per person. – DRapp Dec 07 '11 at 15:04
  • DRapp, the sub query obtaining the minimum matchID isn't getting it for each player. It's returning the minimum matchID for all the records in the group by sub query (e.g. 1 record). Thus when the rest of the query uses that part of it, I get 180,000 records of the same player. I'm starting the think that this data structure isn't going to allow this sort of query and that I should split it into two queries: the group by and then the min matchID. Your thoughts? – Cyntech Dec 12 '11 at 01:55
  • @Cyntech, sorry, I didn't see sooner, but I think the adjustment just made might help. I didn't add the inner "GROUP BY" by player / match... Try it – DRapp Dec 12 '11 at 03:22
  • DRapp, no go, duplicates still appearing. Thank you for your time, patience and effort in this, but time constraints mean that I cannot spend more time on this. I'll mark your answer correct as a thank you for the effort you've contributed. – Cyntech Dec 12 '11 at 22:37
0

An aggregate only works on the actual column it is applied. It is not record-based. It does select the max-value for the rating, but it does not determine how the other columns are aggregated.

So when you have records:

player 1 | match 1 | 10 
player 1 | match 2 | 5

and you group them by player, it needs to pick only one value for the 'match'-field, but which one is not defined: it doesn't depend on aggregates in other columns.

For what you want to do, you need a subquery:

SELECT p1.playerID, p1.matchID, p1.playerAge, MAX(p1.matchRating)
FROM PlayersMatch P1
JOIN PlayersMatch p2 on p1.id = 
   (SELECT id 
    FROM PlayerMatch p2 
    WHERE p2.playerId = p1.playerId 
    ORDER BY MAX(p2.matchRating) DESC 
    LIMIT 1)
GROUP BY playerID

Note that I've introduced a generated primary key for the playersmatch-table (as I'm not sure on the join-syntax for composite keys and general favor single-field artificial keys.) You can still have a unique constraint on (playerID, matchId).

Inca
  • 1,891
  • 14
  • 15
  • I tried your suggestion, and substituted a generated key (playerID + matchID) to achieve it. Unfortunately, it takes forever to run - i.e. more than 20 minutes when I killed it. – Cyntech Dec 04 '11 at 04:17