This is a controversial issue for sure, since many will want to recommend the simple things right away: Split up the tables! Split reads and writes into a master/slave configuration! Amp up the server RAM! And so forth... but let me explain the issue first:
I have a somewhat powerful server: 8GHz, 160GB storage, 8GB RAM (16GB Flexi RAM), RAID 10, 16GB Flexi-SSD, running MySQL, PHP, Apache, and Debian.
My current database consists of about 16 tables; one in particular holds 1.7GB of data in 23 million rows (indexed).
I run a service that requires daily, sometimes hourly, scans of data I receive from a third party, producing between 100 and (rarely) up to 5,000 new rows per minute. The data is fetched from an API by crawlers that run on a schedule and sometimes ad hoc, so they are write-heavy against the master.
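For context, the crawler writes boil down to plain multi-row inserts, roughly like this (table and column names are made up for illustration; the real schema differs):

    -- hypothetical sketch of the crawlers' write pattern, not the real schema
    INSERT INTO datapoints (user_id, metric, value, fetched_at)
    VALUES
      (42, 'followers', 1289, NOW()),
      (42, 'likes',      530, NOW()),
      (43, 'followers',   87, NOW());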
When people use the site, refreshed queries show them their latest analytics data, so when a lot of people are logged in this becomes extremely read-heavy (I have worked through the slow queries and added indexes where I could). I produce these analytics on the fly from the DB (they are at most 24 hours old), and a single user's view can sum up to 5 million records. I don't think it would make sense to pre-render these queries, since I'd somehow have to account for all the slicing and filtering in the pre-rendered HTML files as well... right? Or do people do this?
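To make that concrete, the on-the-fly analytics look roughly like this per user (a simplified, hypothetical query; the real one has more filters and slicing options):

    -- simplified, hypothetical version of the per-user analytics aggregation
    SELECT metric, DATE(fetched_at) AS day, SUM(value) AS total
    FROM datapoints
    WHERE user_id = 42
      AND fetched_at >= NOW() - INTERVAL 24 HOUR
    GROUP BY metric, DATE(fetched_at);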
Now, sometimes I get warnings on my phone, log in to the server, and find that MySQL is down. I run a mysqlcheck and repair, which takes 2 hours or longer and finally finishes with a working database. I start everything up and all is happy again. I never find out why this happens; mostly it happens when a blog writes about the site and people go nuts and hammer it with sign-ups. But there is no detailed log showing where or why it crashed.
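For reference, the recovery I run after a crash is roughly this (exact options vary):

    # roughly what I run after a crash to get the tables back
    mysqlcheck --all-databases --auto-repair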
Other than rate-limiting the sign-up process (a waiting line), is there anything I can do to make sure that, whatever happens, MySQL does NOT crash? Can I run some sort of auto-repair and optimize on a live instance hourly or so? I assume this would block all access to the tables, which would be terrible?
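Something like this in cron is what I have in mind (hypothetical schedule and options), if it wouldn't lock everything up:

    # hourly check/repair/optimize I'm considering; would this lock the tables?
    0 * * * * mysqlcheck --all-databases --auto-repair --optimize --silent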
I am really overwhelmed by this. I have split reads and writes and could theoretically move all read-only users to slave servers on EC2 instances. But then I have the problem of usage spikes going up and down dramatically, and every time I need a new EC2 instance I have to transfer up to 2GB of data to sync the slave database... which never works via the mysql-bin log if I shut down and boot up an EC2 instance with a pause of several days.
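Re-seeding a fresh slave currently means something like the following (hostnames, user, and binlog coordinates are placeholders), and redoing this for every new EC2 instance is exactly what I'd like to avoid:

    # simplified sketch of re-seeding a new slave; placeholder host/user/coordinates
    mysqldump --all-databases --master-data=2 > dump.sql     # on the master
    scp dump.sql new-slave:                                  # the ~2GB transfer
    mysql < dump.sql                                         # on the new slave
    mysql -e "CHANGE MASTER TO MASTER_HOST='master-host', MASTER_USER='repl', MASTER_PASSWORD='***', MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=4567; START SLAVE;"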
I have been able to keep up well until now, but even with EC2 and other technology at hand, I am now at the limit of my understanding and technical ability.
I'd love to share ALL the information required to make this a useful thread / document for later. Since not every website is a YouTube/YouPorn/Instagram/Tumblr type of environment, I feel there's too little information out there for my type of site (heavy reads and writes, 500 to 5M records per user, 3,000-10,000 users).
Thanks everyone; ask away and I will provide more information. I would love to hear your best practices.