
I was about to ask the MySQL mailing list about this, then remembered SO.

I'm running MySQL 5.0.85 and need a few queries to be as efficient as possible. I would appreciate a quick review.

I collect rows in the millions and need the top 50 groups by one field, along with the percentage of all rows that each of those top 50 accounts for.

Here is what I have come up with. Two questions:

1) I have a feeling this could be more efficient, perhaps with a join.
2) How can I get the percentage with precision in the hundredths, i.e. multiplied by 100.00 so that .07 becomes 7.00? I get SQL errors if I use (percentage * 100).

SELECT user_agent_parsed, user_agent_original, COUNT( user_agent_parsed ) AS thecount, 
    COUNT( * ) / ( SELECT COUNT( * ) FROM agents ) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC LIMIT 50;

Second issue: once a day I need to archive the result of the above. Any suggestions on how best to do that? I can schedule it with cron (or, in my case, launchd), unless someone has a better suggestion.

Do you think a simple 'SELECT (the above) INTO foo' would suffice?

Stephan Muller
user170579

2 Answers

9

First Issue:

-- Compute the total row count once and keep it in a user variable.
SELECT COUNT(*) INTO @AgentCount FROM agents;

-- Reuse the stored total to compute each group's share.
SELECT user_agent_parsed
     , user_agent_original
     , COUNT( user_agent_parsed ) AS thecount
     , COUNT( * ) / @AgentCount   AS percentage
  FROM agents
 GROUP BY user_agent_parsed
 ORDER BY thecount DESC LIMIT 50;
lexu
  • How is that faster? It is still two queries, and you may even slow it down, since you are now literally storing a variable. Milliseconds, sure, but can you elaborate? – user170579 Oct 16 '09 at 07:29
  • Your nested query is potentially run once per grouped element. Mine runs once. Granted, this might be caught by the optimizer. – lexu Oct 16 '09 at 07:38
  • No need for the dual selects; the MySQL optimizer, at least in 5.x, takes care of it. – user170579 Oct 20 '09 at 07:31
0

I don't fully understand your question, so I'll first answer the part about how to get the percentage, using your present query.

 SELECT user_agent_parsed, user_agent_original, COUNT( user_agent_parsed ) AS thecount, 
    ((COUNT( * ) / ( SELECT COUNT( * ) FROM agents)) * 100 ) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC LIMIT 50;
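
For the hundredths precision asked about in the question, one option is to wrap the expression in MySQL's ROUND(X, D) with D = 2. The following is only a sketch against the same agents table, not a tested drop-in; it multiplies by 100 before dividing so that the rounding is applied to the final percentage rather than to the intermediate fraction:

```sql
-- Sketch: percentage rounded to two decimal places (a 7% share should read as 7.00).
-- Assumes the agents table and columns from the question.
SELECT user_agent_parsed, user_agent_original, COUNT( user_agent_parsed ) AS thecount,
    ROUND( 100 * COUNT( * ) / ( SELECT COUNT( * ) FROM agents ), 2 ) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC LIMIT 50;
```

Note that user_agent_original is not part of the GROUP BY, so MySQL is free to return any one of the original strings in each group.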

To help you further, I think I need you to elaborate ;-)

junmats
  • Misplaced paren, thanks! The second issue is that I want to take the result of the above query and save a snapshot of it over time. I am storing hits in a user agent log, so I can find that Safari gets 100 uses a day, IE gets 65 uses a day, etc. (simplified). This of course changes from day to day, and I want to chart the growth/decline over a year, so I need to store the result of the above query for long-term stats. I am considering selecting the result into a new table, unless that is a bad idea and there is a more elegant one. – user170579 Oct 16 '09 at 20:23
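
On the archiving idea in the comment above: MySQL does not support the 'SELECT ... INTO some_table' form (its SELECT ... INTO targets variables or files), so the usual substitute is INSERT INTO ... SELECT. Below is a minimal sketch of a daily snapshot. The table name agent_stats_daily, its column types, and the snapshot_date column are assumptions made for illustration, not something from the thread:

```sql
-- Hypothetical archive table; the name and column types are assumptions.
CREATE TABLE IF NOT EXISTS agent_stats_daily (
    snapshot_date     DATE         NOT NULL,  -- day the snapshot was taken
    user_agent_parsed VARCHAR(255),           -- assumed type; match the agents table
    thecount          INT UNSIGNED NOT NULL,
    percentage        DECIMAL(5,2) NOT NULL,
    INDEX (snapshot_date)
);

-- Run once a day (e.g. from cron or launchd) to store that day's top 50.
INSERT INTO agent_stats_daily (snapshot_date, user_agent_parsed, thecount, percentage)
SELECT CURDATE(),
       user_agent_parsed,
       COUNT( user_agent_parsed ) AS thecount,
       ROUND( 100 * COUNT( * ) / ( SELECT COUNT( * ) FROM agents ), 2 )
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC LIMIT 50;
```

Stamping each row with a date keeps every day's result in one table, which should make charting growth or decline over a year a matter of filtering or grouping on snapshot_date.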