-1

First, I hope this fits here better than it fits at stackoverflow, I was torn on which site to put it on. Stack focuses more on programming. Mods, if this is the incorrect location please relocate.

Greetings everyone! I am currently working on a project where I am consuming vast amounts of both structured and unstructured data. The data is initially going into Splunk which serves as both a parsing and alerting engine. However as this project has grown it is becoming apparent that Splunk is not the location where we need to keep data. It also doesn't allow some of the features which a traditional RDBMS provides. So, we would like to attach a load balanced/HA MySql cluster to our current server complex.

What I am looking for is some sort of formula or resource page that can explain how many servers to purchase to be able to handle approximately >50 million records a day, assuming each record is ~1kb in size (this is a high estimate). I haven't been able to find any clear explanation of server scaling/capacity for MySql. That gives me a conversation point to start discussing licensing with MySql. For instance in splunk per 100GB of data a day we need 2 servers (estimating that we want each server at a max of 50% load). I haven't been able to find anything similar for MySql.

Sadly I'm not very experienced in setting up a traditional RDBMS, and most of the books I've found so far on MySql are from more of a user standpoint than an administrative and capacity planning enterprise perspective.

Any advice would really be appreciated, thank you!

Matthew
  • 2,737
  • 8
  • 35
  • 51

1 Answers1

1

MySQL Server processing capacity is generally based on the number and type of transactions being run, not size of the tables. The size of the tables just determines your disk space requirements, which you are probably going to want to put on some shared storage anyway.

As far as the actual processing capacity needed per transaction: this depends so heavily on the type of transaction that there is no golden formula. Your only real choice is to make a best guess based upon the current usage of similar databases in your organization, and be prepared to add capacity if (when) you're wrong.

For planning the storage capacity, your bottleneck will more than likely be IOPS, not actual space. You can usually approach any storage vendor with an estimated IOPS load and space requirement and they'll be able to whip something up for you.

Hyppy
  • 15,608
  • 1
  • 38
  • 59