3

I'm struggling with a problem I hope some of you might be able to help me with.

I got the following table structure for a table which contains lots of actions. I then would like to make a php script which generates some statistics.

+---------------+-------------+------+-----+-------------------+----------------+
| Field         | Type        | Null | Key | Default           | Extra          |
+---------------+-------------+------+-----+-------------------+----------------+
| id            | int(11)     | NO   | PRI | NULL              | auto_increment |
| MemberID      | int(11)     | NO   |     | NULL              |                |
| MemberNumber  | int(11)     | NO   |     | NULL              |                |
| Status        | varchar(20) | NO   |     |                   |                |
| ClubID        | int(11)     | NO   |     | NULL              |                |
| Clubtype      | varchar(5)  | NO   |     | NULL              |                |
| Time          | timestamp   | NO   |     | CURRENT_TIMESTAMP |                |
| Gender        | varchar(10) | NO   |     |                   |                |
+---------------+-------------+------+-----+-------------------+----------------+

I'm wondering if it's best to let php or mysql manipulate the data. I would like to get some stats based on Time e.g year,month,week and some stats based on Clubtype e.g FK,UK and finally some stats based on Gender e.g Male,Woman. So is it best to work with count() in the mysql queries or is it better to get all the data and the let php generate the stats.

I hope this makes sense. Thx for the help :)

EDIT: To explain my plan further. I would like to generate for every year grouped by month and for each month I want to get the

count(Status) total

count(Status) where Clubtype = $value

count(Status) where Gender = $value

furthermore for each month i want to get the stats grouped by week and finally I want to get the stats for each week grouped by day. These stats makes use of same count as above.

I hope this gives an better idea of what i want to do.

nickifrandsen
  • 423
  • 1
  • 7
  • 14
  • 1
    You should absolutely do this at the database level. The SQL queries are straightforward (GROUP BY MONTH(Time), GROUP BY YEAR(Time), GROUP BY Clubtype, etc). The database is built for these types of calculations. Trnasferring all the rows to PHP to do the calculations there is much slower and wastes a ton of memory. If your table is large, you might not even have enough free memory to make a second copy in RAM to send to PHP for processing. Do not take Shaun's advice. – Dan Grossman Jan 24 '11 at 21:54
  • My advice has changed since the question was clarified to indicate that the goal was to group and count records rather than take a set of records and attempt to perform statistical analysis on the results at the database layer. For simple grouping and counting, the database is the proper place. – Shaun Jan 25 '11 at 03:45

1 Answers1

5

Use MySQL. It will be able to calculate and organize things much faster due to indexing. You will have to make heavy use of GROUP BY and the math functions.

dqhendricks
  • 19,030
  • 11
  • 50
  • 83
  • While I agree that he should certainly isolate and group his dataset as much as possible in the database layer by utilizing indexes, I think the actual statistics calculation should happen on the application layer to avoid needless tying up of database resources. In addition, if there is need to perform grouping or ordering in such a way that the alternative to doing it in the application is generating temporary tables on the database server, it actually ends up being slower to do it at the database level. – Shaun Jan 24 '11 at 21:22
  • Shaun, you're giving very, very poor advice. This is what the database is for, and the queries are going to use much less resources than you would waste in your application. The calculations have to happen somewhere, do it where they cost the least. It will absolutely not be slower doing it at the database level. I run one of the top 10 website analytics services in the world; my site runs over 1000 queries per second to produce reports and record new data; what are you basing your recommendations on? – Dan Grossman Jan 24 '11 at 21:52
  • I strongly prefer MySql because of the reasons you mention above. But any of you got some good advices for the scripts methodology. I'm currently thinking of making separate functions for `year`, `month`, `week` for each of these I'm thinking of making 3 functions to get `overall`, by `type` and by `gender`. – nickifrandsen Jan 24 '11 at 23:56
  • @nickifrandsen make one function/method with two arguments ($timespan, $grouping) – dqhendricks Jan 25 '11 at 00:04
  • Given that the OP has edited his question to clarify that all he wants to do is group by date and count, then I agree with the database assessment. As the question was worded originally, I got the impression he wanted to do statistical analysis on the data that was stored within the database rather than just simple counts. – Shaun Jan 25 '11 at 03:34
  • @Shaun I will be doing that too. But most of the calculations can be performed based on simple counts. – nickifrandsen Jan 25 '11 at 10:09