First, I hope this fits here better than it fits at stackoverflow, I was torn on which site to put it on. Stack focuses more on programming. Mods, if this is the incorrect location please relocate.
Greetings everyone! I am currently working on a project where I am consuming vast amounts of both structured and unstructured data. The data is initially going into Splunk which serves as both a parsing and alerting engine. However as this project has grown it is becoming apparent that Splunk is not the location where we need to keep data. It also doesn't allow some of the features which a traditional RDBMS provides. So, we would like to attach a load balanced/HA MySql cluster to our current server complex.
What I am looking for is some sort of formula or resource page that can explain how many servers to purchase to be able to handle approximately >50 million records a day, assuming each record is ~1kb in size (this is a high estimate). I haven't been able to find any clear explanation of server scaling/capacity for MySql. That gives me a conversation point to start discussing licensing with MySql. For instance in splunk per 100GB of data a day we need 2 servers (estimating that we want each server at a max of 50% load). I haven't been able to find anything similar for MySql.
Sadly I'm not very experienced in setting up a traditional RDBMS, and most of the books I've found so far on MySql are from more of a user standpoint than an administrative and capacity planning enterprise perspective.
Any advice would really be appreciated, thank you!