
I'm the webmaster for a major US university. Our website handles a great deal of traffic, and I've built and been in charge of it for the last 7 years or so. I've been building ever-more-complex features into the site, and it's always been my practice to put as much of the processing burden as possible on our multi-processor Microsoft SQL Server - using stored procedures, views, etc. - and fill in what can't be done there with PHP, ASP, or Perl on the IIS web server. Both servers are very powerful and capable machines. Since I've been doing this alone for so long, without anyone else to brainstorm with, I'm curious whether my approach is ideal for the even higher-load situations we'll face in the future.

My question is: Is it better practice to place more of the load on the SQL server using nested SELECT statements, views, stored procedures, and aggregate functions, or should I be running multiple simpler queries and processing the results in a server-side scripting language like PHP? Keep on keepin' on or come up with a better way?

I've recently become more interested in performance after I ran some load traces and learned just how much I've been putting on the shoulders of the SQL server. Both the web server and SQL server are fast and responsive throughout the day, almost regardless of how much I throw at them, but I'd like to be ready - having trained myself and upgraded my existing code with optimized best practices in mind - by the time it becomes important.

Thanks for your advice and input.

Brak

4 Answers


Put each layer of your stack to use in the domain where it fits best.

There is no use in having your database server send 1000 rows and filtering them in PHP if a WHERE or GROUP BY clause would suffice. Likewise, it's not optimal to call the database just to add two integers (SELECT 5+9 works fine, but PHP can do it itself, and you save the round trip).

You will probably also want to look into scalability: which parts of your application can be divided among multiple processes? If you're still using just two layers (script & db), there is a lot of room for scaling there. But always tackle the bottleneck first.

Some examples: host static content on a CDN, cache your pages, read about nginx and memcached, look at NoSQL (MongoDB), consider sharding, consider replication.

Konerak
  • Thanks Konerak. Considering that I don't even know what many of those things are, I've got some real research work ahead of me. So far, I haven't really needed to be concerned with load - everything runs pretty smoothly - because of the importance I place on efficiency in my code, but getting into the habit now and learning what I _should_ be doing from here on out is a nice luxury I can afford right now. Thanks for the tips. I'll put them to good use. – Brak Mar 05 '11 at 15:12

My opinion is that it's generally best to favor letting the web servers do the processing. Two points:

First is scalability. Once your application gets enough usage, you'll need to start worrying about load balancing, and it's a lot easier to drop in a couple of extra web servers pointing at a common database than it is to set up a distributed database cluster. So take as much strain off the database as you can and keep it on a single machine for as long as possible.

The second point I'd like to make is about optimizing the queries. This will depend a lot on the queries you are using and on the database backend. When I first started working with databases, I fell into the trap of writing elaborate SQL queries with multiple JOINs that fetched exactly the data I wanted, even if it came from four or five different tables. I reasoned, "That's what the database is there for - let's get it to do the hard work."

I quickly found that these queries took far too long to execute, and often ended up blocking the database for other requests. While it may seem inefficient to split your query into multiple requests (for example, in a loop), you'll often find that executing several small queries against fast indexes makes your application run far more smoothly than trying to hand all the hard work to the database.

Chris Baxter
  • Points well taken. That makes a lot of sense. Your former mind-set is almost precisely what I was doing, and it paints a nice picture of what direction I should be going. Thank you for your input. – Brak Mar 05 '11 at 14:22
  • Really? Small, simple queries - even in a loop of somewhat small size - can be faster than a big join to get the same data? Man, I've been jumping through hoops to create joins for fear that queries in a loop are A Bad Thing (TM). Of course, I could (should!) test it all myself with some decent benchmarking, but I never undertook the comparison on the grounds that a single join query is "self-evidently" going to be faster than the alternative. But I guess not, huh? Don't want to hijack the thread/answer, but further comments on this point would be welcome. – David Weinraub Mar 05 '11 at 18:34
  • @David Weinraub: Not really for single joins (unless there's something REALLY wrong with your database design). I meant when you're plucking data from all over the place, resist the urge to write one monolithic statement to grab it all at once. For an example of best practices, check out the various development frameworks, eg, CakePHP, Ruby on Rails etc, and see how they handle fetching distantly related data. – Chris Baxter Mar 06 '11 at 09:38
  • I think my design is ok, pretty well-normalized. But I do have monster queries, pulling from all over. Armed with your input, I will review and reconsider. I see some profiling in my future. Thanks! – David Weinraub Mar 06 '11 at 12:30

Firstly, you might want to check whether there is any load which can be removed entirely by client-side caching (.js, .css, static HTML and images), and by technologies such as AJAX for partial updates of screens - this will remove load from both the web and SQL servers.

Secondly, see if there is SQL load which can be reduced by web-server caching, e.g. for static or low-refresh data. If you have a lot of 'content' pages on your systems, have a look at common CMS caching techniques, which scale to allow many more users to view the same data without rebuilding the page or hitting the database.

StuartLC
  • Thanks nonnb. I've actually started doing more AJAX and JSON database calls over the past year, which has both improved speed and reduced the burden on both of my servers, so that's very good advice. Client-side processing is far less costly than server-side processing. I've been rewriting a lot of my old full-page Perl code into JavaScript, AJAX/JSON, and PHP commands, which really feel snappy. Still, some of my database calls from PHP request complex SQL queries, EXECs that run nested queries, etc. I've been interested in caching, so I'll certainly have to look more into it. – Brak Mar 05 '11 at 14:48

I tend to do as much as possible outside the db, viewing db calls as expensive/time-intensive.

For example, when performing a SELECT on a user table with fields name_given and name_family, I could fatten the query to return a column called full_name built by concatenation. But that kind of thing is easily done in a model in your server-side scripting language (PHP, Ruby, etc.).

Of course, there are cases where the db is the more "natural" place to perform an operation. But in general I lean towards putting the load on the web server and optimizing there, using many of the techniques noted in the other answers.

David Weinraub
  • Heh, I've done exactly that in many cases on my server, but only if there's a SQL reason to do so, like having to sort or group by some concatenated thing. One thing I do along those lines is select `user_name_first, user_name_last, user_alias, COALESCE(user_alias, user_name_first) AS userNameFirst`. That way I get a record back that already has the user's preferred name. – Brak Mar 05 '11 at 14:53
  • If you're sorting or filtering by a concatenated field, it might be best to add an extra field and do the concatenation when you insert the row. This way you can index the new field and drastically reduce the query cost, since the database won't have to perform the concatenation on EVERY row each time you run the query. The benefit of this will depend on a few things (eg, how many rows, how often rows are updated etc), but play around with EXPLAINing the queries and see if you can save cost that way (see the sketch after these comments). – Chris Baxter Mar 05 '11 at 15:07