25

I'm writing a business plan, and I need to estimate the cost when my website is reached by 500,000 unique visitors a day.

  • visitors: 500,000
  • pageviews: 1,500,000
  • spider pageviews: 500,000
  • total pageviews: 2,000,000

Each page runs roughly 50 queries, which works out to:

  • queries per day: 100 Million
  • per hour: 4 Million
  • per minute: 70,000
  • per second: 1,200
  • peak: 3,000
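
Spelled out as a quick, runnable sketch (the 50 queries per page and the ~2.5x peak multiplier are my own rough estimates, not measurements):

```python
# Back-of-envelope capacity math; 50 queries/page and the ~2.5x
# peak multiplier are assumptions, not measurements.
pageviews_per_day = 2_000_000
queries_per_page = 50

per_day = pageviews_per_day * queries_per_page  # 100,000,000
per_hour = per_day / 24                         # ~4.2 million
per_minute = per_hour / 60                      # ~69,000
per_second = per_minute / 60                    # ~1,160
peak = per_second * 2.5                         # ~2,900, rounded up to 3,000

print(f"average: {per_second:,.0f} q/s, peak: ~{peak:,.0f} q/s")
```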

Based on this calculation, I would need to handle about 3,000 queries per second at peak... what kind of server can handle that?

The problem is that my site currently gets 2,000 visits a day and runs at roughly 150-200 queries per second... extrapolating from that starting point, I'd expect around 50,000 queries per second.

How many servers would I need, in a cluster or with replication, to manage this load?

Teymour
  • 103
  • 3
  • 5
    What sort of site does 8k+ queries a visit? – Ignacio Vazquez-Abrams Jul 28 '10 at 19:23
  • 5
    You need a system design review right away. – Chopper3 Jul 28 '10 at 21:14
  • 1
    Nowhere near enough information, because you've told us nothing about what really matters - the queries themselves. Nor have you told us about the machine you're running. Is this a 486? The latest and greatest supercomputer? Or something in between? All those numbers you've listed are irrelevant to the question. Please provide RELEVANT information. – John Gardeniers Jul 28 '10 at 21:38
  • >What sort of site does 8k+ queries a visit? I receive 2,000 unique visitors, but each visitor opens many pages, plus I have a lot of spiders. 2,000 unique users generate 6,000 unique IPs, opening more than 120,000 pages daily. Thanks. –  Jul 29 '10 at 12:52

7 Answers

31

I used to work for an e-commerce company whose website had several million page hits per day. We had a single Dell PE 1750 with two single-core CPUs and 2 GB of RAM, and a database of approx. 4 GB. At peak times this server handled 50k+ queries per second.

Having said this: the database was well structured, all queries were finely tuned (we had weekly sessions analysing the slow query logs and fixing up queries and indexes), and the server setup was also fine-tuned. Caching is definitely a good idea, but MySQL does some of that anyway; you just have to analyse the performance and then fine-tune how your memory is used (query cache vs. other options).

From that experience I can tell you that the highest impact is caused by missing indexes, wrong indexes and bad database design (e.g. long string fields as primary keys and similar nonsense).
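
As a toy illustration of the index point (using SQLite rather than MySQL, with an invented `users` table), the same single-row lookup drops from a full table scan to an index seek:

```python
# Toy demo of a missing index: the same one-row lookup is a full
# table scan without the index and a B-tree seek with it.
# SQLite stands in for MySQL here; table and column names are invented.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 ((i, f"user{i}@example.com") for i in range(1_000_000)))

def lookup_ms():
    t0 = time.perf_counter()
    conn.execute("SELECT id FROM users WHERE email = ?",
                 ("user999999@example.com",)).fetchone()
    return (time.perf_counter() - t0) * 1000

print(f"without index: {lookup_ms():8.3f} ms")  # scans all 1M rows
conn.execute("CREATE INDEX idx_users_email ON users(email)")
print(f"with index:    {lookup_ms():8.3f} ms")  # seeks directly
```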

wolfgangsz
  • 8,847
  • 3
  • 30
  • 34
9

It all depends on how complex the queries are, how much memory the servers have, and how fast the disks are.

If the queries are very simple, or very well tuned, then a single large database server can handle that. If, however, the queries are very complex (or simple but poorly tuned), then you'll need several servers.

mrdenny
  • 27,174
  • 4
  • 41
  • 69
  • Or some serious schema changes and reindexing... – Massimo Jul 28 '10 at 19:22
  • 3
    Tuning is ALWAYS preferred over adding more hardware. Adding more hardware just masks the problem until such a time as the problem is much harder to solve. – mrdenny Jul 29 '10 at 02:16
  • Thanks for the answer, so I think that 2 servers in parallel + 1 passive for redundancy should be OK, right? I'm talking about 2x quad-core servers with 32 GB of RAM and fast drives. Am I right? Remember that I need performance! –  Jul 29 '10 at 12:20
  • 1
    Everything is well tuned and indexed; I have 1 or 2 slow queries per week (and the slow-query time threshold is just 2 seconds). Anyway, I'm writing a business plan, and I'd like to know what kind of server pool can manage 12,000,000 pages opened daily, generating 8,000 queries per second. –  Jul 29 '10 at 12:55
  • 8000 queries a second isn't all that much. A single 16-core server will probably do the trick. 64 GB of RAM (or more or less, depending on how big the database is and how much data needs to be kept in cache at any one time) should do the trick. My DB (granted, it's SQL Server) is 1 TB on a 16-core, 64 GB server with 40-50k users hitting it daily, up to several times per minute each, throughout the day. – mrdenny Jul 29 '10 at 21:25
  • @mrdenny The *contrary* is true. Tuning is almost never preferred over adding more hardware. Hardware is cheap. Programmers are expensive. Source [SO founder]: https://blog.codinghorror.com/hardware-is-cheap-programmers-are-expensive/ – Adam Sibik Aug 27 '18 at 12:15
  • We have a client that needed to run 1M batches per second on their SQL Server. Everything had to be tuned just right; if it wasn't, that number wouldn't have been possible, even on the biggest servers available. Basically, it depends. Sometimes you should throw money at hardware; sometimes more hardware isn't a viable option and you have to actually fix things. – mrdenny Oct 25 '18 at 03:45
3

This really can't be estimated without knowing anything about the specific queries you're running, the database schema, and its size.

A simple SELECT on an indexed column is quite a different beast from a couple of JOINs based on non-indexed ones... and of course things change a lot if the involved tables contain 1K records or 1M.

Also:

  • What is your current hardware configuration?
  • How much of its power (CPU, RAM, Disk I/O) is your server using under the current load?
Massimo
  • 70,200
  • 57
  • 200
  • 323
  • Actually I have a server with 2x quad-core CPUs and 8 GB of RAM. I'm using all the RAM and 100% of one processor (it seems I can use up to 800%, see here: ) cpu: http://img834.imageshack.us/img834/3483/downloadv.png ram: http://img442.imageshack.us/i/download2p.png/ disk: http://img213.imageshack.us/i/download1x.png/ thanks –  Jul 29 '10 at 12:28
  • Based on those graphs, you're only using one (or at most two) of your CPU cores, so your application is definitely not CPU-bound... or it is, but it's incapable of taking advantage of multiple CPUs. Also, all of that memory used for "cache" is not actually **needed** by anyone; it's just the O.S. taking advantage of it because "it's there". – Massimo Jul 29 '10 at 14:01
  • How can I find information on using all the CPU cores? I'm using LAMP... –  Jul 30 '10 at 11:12
  • First of all, you should check whether you're not using them because there's simply no need for them (= low load), because your operations can't be properly parallelized, or because MySQL and/or Apache are not configured to use them. And, since those two programs usually are multithreaded by default, I'd have a look at your server load and at your SQL queries... – Massimo Jul 30 '10 at 11:23
3

As Ignacio remarked, you may want to look into caching, either in the CMS or perhaps even in front of the stack. 50+ queries for every (every!) page truly is a lot.
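
To make the idea concrete, here is a minimal sketch of a per-fragment cache with a short TTL (the function names and the 1-second TTL are illustrative, not taken from the site in question):

```python
# Minimal fragment cache with a short TTL: even a page that changes
# every second can serve all hits within that second from memory.
# render_sidebar() is a hypothetical stand-in for an expensive fragment.
import time

_cache = {}  # key -> (expiry timestamp, cached value)

def cached(key, ttl, compute):
    now = time.time()
    entry = _cache.get(key)
    if entry and entry[0] > now:
        return entry[1]              # still fresh: no queries at all
    value = compute()                # expired or missing: recompute once
    _cache[key] = (now + ttl, value)
    return value

def render_sidebar():
    return "<div>latest posts...</div>"  # imagine 10+ queries here

html = cached("sidebar", ttl=1.0, compute=render_sidebar)
```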

Joris
  • 5,969
  • 1
  • 16
  • 13
  • 1
    Yes, this is a complex website; it's a community, and I can't cache anything, since it changes every second. I tried to cache pages, but the cache hit rate was nearly 0: every time I cache a page, it may never be read again, or it changes before it's opened again. Thanks. –  Jul 29 '10 at 12:30
  • 4
    There are very few uncachable sites; if it only changes every second you can still cache for a whole second, like 10 pageviews ;-) Have you considered not caching pages entirely, but rather blocks or specific values etc.? You could cache outside the database, in shared memory segments, on the filesystem, or in memcached. Also, ESI is typically useful in such a situation. – Joris Jul 30 '10 at 18:03
2

There are too many things that can affect your queries per second; please do not trust my data without testing it yourself. I post my speed test results here to help estimate QPS on a current (2018-09) MySQL database and machine. In my test the data size was smaller than the server's memory (which dramatically reduces I/O and greatly improves performance).

I used a GCP Cloud SQL for MySQL instance with 1 CPU, 3.75 GB of memory and a 100 GB SSD, and got:

  • 1 client, one single-row read per query: 799 queries/second.
  • 50 clients, one single-row read per query: 6,403 queries/second.
  • 50 clients, one single-row write per query: 4,341 queries/second.
  • 1 client, 30k rows written per query: 92,109 rows written/second.
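
For context, here is a minimal sketch of the sort of multi-client read test that produces numbers like these (this is not the exact harness used above; the table name, credentials, and the mysql-connector-python dependency are all assumptions):

```python
# Multi-client single-row read benchmark sketch. Assumes a reachable
# MySQL instance, a table `t` with an indexed `id` column, and the
# mysql-connector-python package; all connection details are illustrative.
import threading
import time
import mysql.connector

CLIENTS = 50
SECONDS = 10
counts = [0] * CLIENTS

def worker(i):
    conn = mysql.connector.connect(host="localhost", user="bench",
                                   password="secret", database="benchdb")
    cur = conn.cursor()
    deadline = time.time() + SECONDS
    while time.time() < deadline:
        cur.execute("SELECT * FROM t WHERE id = 1")  # one-row read per query
        cur.fetchall()
        counts[i] += 1
    conn.close()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{sum(counts) / SECONDS:,.0f} queries/second across {CLIENTS} clients")
```
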
wazoox
  • 6,918
  • 4
  • 31
  • 63
bronze man
  • 121
  • 3
  • 1
    Write QPS test result (2018-11): GCP MySQL, 2 CPUs, 7.5 GB memory, 150 GB SSD, serialized writes from 10 threads, 30k rows written per query, 7.0566 GB table, key length 45 bytes and value length 9 bytes; got 154K rows written per second at 97.1% CPU, with a write QPS of 1,406/s in the GCP console. – bronze man Nov 20 '18 at 06:58
0

Judging by your comments, the biggest factor will be your data set size, or at least the size of the "hot" data set. 3,000 qps or even 8,000 qps on a 16-core server is not a problem at all, as long as the server rarely has to go to disk to satisfy a query. Once the active data set exceeds the amount of memory InnoDB is using to cache it, your performance will drop off rapidly.
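
One way to watch for that cliff is the buffer pool hit rate. A minimal sketch, assuming the mysql-connector-python package and illustrative credentials:

```python
# Check how often InnoDB serves reads from the buffer pool vs. disk.
# Innodb_buffer_pool_read_requests counts logical reads;
# Innodb_buffer_pool_reads counts the ones that had to hit disk.
# Connection details are illustrative.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root", password="secret")
cur = conn.cursor()
cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'")
status = {name: int(value) for name, value in cur.fetchall()}

requests = status["Innodb_buffer_pool_read_requests"]
misses = status["Innodb_buffer_pool_reads"]
hit_rate = 100.0 * (1 - misses / requests) if requests else 0.0
print(f"buffer pool hit rate: {hit_rate:.2f}% "
      f"({misses:,} disk reads out of {requests:,} requests)")
conn.close()
```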

Elliott
  • 328
  • 2
  • 4
0

For large "hot" datasets, it is probably worth the investment in time to convert to a "big data" scheme, it is what they are for. For example, if you have a vast quantity of data to retrieve, but you never rewrite, but only append new data, look at Apache Hive. Browse around, their is usually a flavor you can interface easily enough to existing code, that will also prevent the heartburn of running out of cache space.