2

I'm trying to understand which of the following is a better option:

  1. Data calculation using Python from the output of a MySQL query.
  2. Perform the calculations in the query itself.

For example, the query returns 20 rows with 10 columns. In Python, I compute the difference or division of some of the columns.

Is it a better thing to do this in the query or in Python ?

Ravi
  • 2,472
  • 3
  • 20
  • 26

2 Answers2

1

It is probably a matter of taste but...

... to give you an exact opposite answer as the one by Alma Do Mundo, for (not so) simple calculation made on the SELECT ... clause, I generally push toward using the DB "as a calculator".

Calculations (in the SELECT ... clause) are performed as the last step while executing the query. Only the relevant data are used at this point. All the "big job" has already been done (processing JOIN, where clauses, aggregates, sort).

At this point, the extra load of performing some arithmetic operations on the data is really small. And that will reduce the network traffic between your application and the DB server.

It is probably a matter of taste thought...

Sylvain Leroux
  • 50,096
  • 7
  • 103
  • 125
  • How does it reduce the network traffic ? The data transferred will actually increase from the DB to the application server because the calculations performed in the query means extra number of columns and hence extra data. – Ravi Aug 16 '13 at 10:08
  • @Ravi Maybe I missed the point, but, say, to calculate the sum of two column, I do `SELECT a+b` instead of `SELECT a,b`, performing the addition on the host language, I reduced the network traffic by almost 50%. – Sylvain Leroux Aug 16 '13 at 10:10
  • Agreed. My answer based on common case (such nasty things as `ORDER BY RAND()` - for example) – Alma Do Aug 16 '13 at 10:14
1

If you are doing basic arithmetic operation on calculations in a row, then do it in SQL. This gives you the option of encapsulating the results in a view or stored procedure. In many databases, it also gives the possibility of parallel execution of the statements (although performance is not an issue with so few rows of data).

If you are doing operations between rows in MySQL (such as getting the max for the column), then the balance is more even. Most databases support simple functions to these calculations, but MySQL does not. The added complexity to the query gives some weight to doing these calculations on the client-side.

In my opinion, the most important consideration is maintainability of the code. By using a database, you are necessary incorporating business rules in the database itself (what entities are related to which other entities, for instance). A major problem with maintaining code is having business logic spread through various systems. I much prefer to have an approach where such logic is as condensed as possible, creating very clear APIs between different layers.

For such an approach, "read" access into the database would be through views. The logic that you are talking about would go into the views and be available to any user of the database -- ensuring consistency across different functions using the database. "write" access would be through stored procedures, ensuring that business rules are checked consistently and that operations are logged appropriately.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I'm not sure if the benefit of encapsulating the results in a view goes with my case. The tables are frequently updated and from what I've understood so far views are not particularly useful in such a scenario. It might be my misunderstanding of your statement but what do you mean by MySQL does not support simple functions for MAX etc... MySQL does support these operations right ? – Ravi Aug 16 '13 at 11:08
  • @Ravi Even in MySQL support *aggregate functions*, it lacks [window functions](http://www.postgresql.org/docs/9.1/static/tutorial-window.html). You could "easily" emulate them, but at increased complexity and execution time cost. – Sylvain Leroux Aug 16 '13 at 13:07